Our Polish language – the one that seems to be an insurmountable barrier for many people – turned out to be… the easiest to understand for artificial intelligence. According to a study conducted by the University of Maryland and Microsoft, AI models respond most accurately to commands in Polish. English, usually considered (though incorrectly) to be the native language of machines, only landed in sixth place.
Researchers from the University of Maryland and Microsoft checked how well leading language models understood commands in 26 world languages. The result surprised even their creators: Polish, not English, turned out to be the most “readable” for AI.
Polish achieved an average score of 88% in test tasks, which included text generation, interpretation of long commands and logical reasoning. For comparison, English scored 83.9% and Chinese was only 4th from the bottom. That’s saying something. It’s hard not to smile: humans can get tripped up by inflections and exceptions, but the AI handles them better than plain English. Perhaps that is because Polish is incredibly precise, even though it operates at a level of complexity beyond the reach of speakers of many more widely used languages?
Why Polish?
From a scientific point of view, Polish has something that many languages lack: high structural precision. Inflection by cases, genders and numbers – all this makes the meaning of sentences extremely unambiguous. For a person learning a language it is torture, but for a model learning statistical dependencies – absolute gold.
As a result, AI can better understand what the user wants to achieve, because each word carries information about the relationships in the sentence. In English, for example, meaning often depends on context or word order. In Polish, it is encoded in the word form itself. It’s a bit like the difference between a sketch and a technical plan: the former gives a general picture, the latter – instructions without any margin for error.
Data doesn’t play the main role
Yet Polish has no quantitative advantage: there is far less Polish text on the Internet than text in the dominant languages. English dominates, Chinese has billions of users, and yet the models performed worse in those languages. This means that the amount of training data does not always translate into the quality of reasoning.
Researchers suggest that the structure of the language, not its popularity, may be key. Models trained on multiple languages can draw conclusions from more complex grammatical systems – such as Polish – and then better understand complex commands regardless of the language. Perhaps languages previously considered “difficult” will become the basis for the development of future models?
What does the ranking look like?
The ten best understood languages include:
- Polish – 88%
- French – 87%
- Italian – 86%
- Spanish – 85%
- Russian – 84%
- English – 83.9%
- Ukrainian – 83.5%
- Portuguese – 82%
- German – 81%
- Dutch – 80%
Some results seem downright… perverse. After all, Chinese has one of the largest bodies of text on the Internet, yet it ranks fourth from the bottom. Simply “stuffing” models with huge amounts of data is not enough – what matters is how this data is organized and how the language allows the machine to understand the relationships between words.
What does this mean for the future of AI?
The conclusions of the study are simple, but also surprising. Since Polish works better as a command language, it may become the preferred language for interacting with AI in certain applications – from education to content creation and systems management.
This is also good news for Polish users. You can add the claim that AI “understands English/Chinese/any other language better” to your list of myths. AI speaks Polish very well. In other words, we can be seriously proud of our language: certainly not easy, but beautiful and – most importantly – precise.
