Pronominal object clitics in preverbal position are a hard nut to crack for Google Translate

Some Romance languages allow the movement of pronominal object clitics to the preverbal position (Hanson & Carlson, 2014; Labotka et al., 2023). That is, instead of saying La maestra lo ha detto (Italian) ‘The teacher has said it’, it is possible to say Lo ha detto la maestra, literally ‘It has said the teacher’. The latter is a marked phrasing that draws attention to the subject of the sentence. Furthermore, when the clitic is in preverbal position, the degree of focus on the subject also depends on the context. For instance, the focus is light in Lo ha detto la maestra, whereas it is stronger in Lo ha detto la maestra, non l’assistente ‘It’s the teacher that’s said it, not the assistant’. Whereas the light case can safely lose the subject focus in translation (‘The teacher has said it’), the marked cases require one of the following: a marked pronunciation in speech; italics in writing (‘The teacher has said it’, with the subject italicised); the idiomatic form ‘The teacher said’; or a marked syntax, for instance an it-cleft, as in ‘It’s the teacher that’s said it, not the assistant’, or the addition of ‘oneself’, as in ‘The teacher said it herself’.

Now, how does Google Translate (GT) deal with these translations in May 2023? For this specific case, GT opts for ‘The teacher said’, which is a good, idiomatic option. When translating into Spanish, GT returns ‘El profesor dijo’, which is the direct equivalent of the English translation. This option is valid in some varieties of Spanish in America. Nonetheless, a more direct translation of the Italian form, keeping the clitic before the verb and the subject after it, would have worked very well.

GT has greater trouble when the content of the sentence is slightly less frequent. For instance, Lo cerca la maestra, literally ‘Him is seeking the teacher’, is stripped of its markedness in GT’s English rendering, ‘The teacher is looking for him’. Preserving the subject focus (e.g., ‘It’s the teacher that’s looking for him’) would require some syntactic liberties, and hence entail some risks. So, playing it safe is understandable. Our next step is to check the translation into Romance languages that allow the same movement to preverbal position as the original Italian sentence Lo cerca la maestra. Setting GT aside, the sentence could be translated well into Romanian as Îl caută dăscăliţa, or into Spanish as Lo busca la profesora. In contrast, for both translations, GT returns the equivalents of the English translation, namely Profesorul îl caută and El profesor lo busca. This again discards the focus on the subject, unnecessarily in these cases, given the overlap in the grammars.[1]

Suggesting a better translation in Google Translate

Submitting a better translation is always an option.

In fairness, machine translation is an absolute feat overall, provided enough caution is practised. With the expansion of language models, machine translation is only going to improve. So, how much of a piece of cake will it be for GT to crack some of these syntactic details in time, and to preserve syntactic forms across languages when the systems match?

References

Hanson, A. E. S., & Carlson, M. T. (2014). The roles of first language and proficiency in L2 processing of Spanish clitics: Global effects. Language Learning, 64(2), 310–342. https://doi.org/10.1111/lang.12050

Labotka, D., Sabo, E., Bonais, R., Gelman, S. A., & Baptista, M. (2023). Testing the effects of congruence in adult multilingual acquisition with implications for creole genesis. Cognition, 235, 105387. https://doi.org/10.1016/j.cognition.2023.105387


  1. There are also errors of gender in the translations of la maestra, which GT renders with masculine forms (El profesor, Profesorul).
