Scientists from the Far East will teach a neural network the Russian language

The School of Digital Economy at Far Eastern Federal University plans to begin work on creating a digital corpus of the Russian language. It is needed for training neural networks and developing a synthetic personality based on artificial intelligence.

The developers say that similar corpora already exist for French and English. The core of the work is to collect an audio corpus and then store it in a machine-readable form.

Linguists, volunteers from FEFU and experts in computational linguistics will take part in the project. They will annotate the audio material: marking segment boundaries and pauses, word stress, the division into dialogues and monologues, and so on.
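To make the annotation task concrete, here is a minimal sketch of how one annotated stretch of audio might be recorded. It is an illustration only: the project's actual format has not been described, and the `Segment` class, its field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One annotated stretch of audio (hypothetical layout, not the project's format)."""
    start: float                 # segment start, in seconds
    end: float                   # segment end, in seconds
    text: str                    # transcript of the segment
    stressed_offsets: List[int] = field(default_factory=list)  # character offsets of stressed vowels
    speech_type: str = "monologue"   # "monologue" or "dialogue"
    pause_after: float = 0.0     # silence following the segment, in seconds

    def duration(self) -> float:
        """Length of the spoken segment in seconds."""
        return self.end - self.start


def total_speech(segments: List[Segment]) -> float:
    """Sum of spoken time across segments, excluding pauses."""
    return sum(s.duration() for s in segments)


# Made-up sample data for illustration.
segments = [
    Segment(0.0, 1.5, "Привет, как дела?", [4, 9, 15], "dialogue", 0.5),
    Segment(2.0, 3.0, "Хорошо.", [5], "dialogue"),
]
```

Keeping pauses, stress and speech type as separate fields would let each be checked or corrected by a different annotator, though a real corpus would more likely rely on an established annotation format.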

One of the project's developers justifies it by the need for languages to keep developing. Just as unwritten languages gradually died out and only those with a writing system survived, languages that machines (printers, microwaves and the like) do not know and cannot speak are also likely to face extinction as technology develops rapidly. This, the developer argues, is why language needs to be digitised and turned into a model on which a neural network can be trained. Machine translation is becoming an extremely important task these days.