First generative AI model in Algerian dialects – Le Jeune Indépendant


A press conference was held for the official launch of “Hadretna”, an online translation algorithm for the various dialects spoken in Algeria. It was developed by the start-up Fentech, operating under the name Tamatech, in partnership with the eminent scientist and researcher Professor Merouane Debbah.

Professor Debbah, scientific director of “Hadretna”, opened his speech by saying he was delighted to present “the first generative artificial intelligence model in Algerian dialects, to promote the way we speak from east to west and from south to north of the country”.

The “Hadretna” database, whose name means “our speech” in Darija, aims to help preserve the country’s linguistic and cultural diversity while promoting the development of services accessible to all 45 million Algerians, Professor Debbah added. The conference was held on Tuesday at the Marriott Hotel in Algiers under the aegis of the Ministry of the Knowledge Economy, Start-ups and Microenterprises.

The scientific director also said that this online translation algorithm is “the fruit of six months of work with engineers from the start-up Fentech, with which I am associated”. He explained that, “at this stage”, the work is powered by two gigatokens (two billion tokens) of data collected online in three alphabets: Arabic, Latin and Tifinagh. He stressed that the algorithm presented that day is the first essential step towards building a generative artificial intelligence model.

As for the goal of this artificial intelligence, the researcher explained that it is meant to create a linguistic bridge between the different dialects spoken in Algeria, so that all of the country’s inhabitants can access any information in their native dialect. “We hope that this artificial intelligence will contribute to the digital inclusion of all Algerians,” he said. According to the professor, to be effective, an “Algerian Large Language Model (LLM) using the dialects widespread in Algeria requires training on a large quantity of textual resources”. He noted that English today accounts for 45% of the content available on the internet. The best-known LLMs (ChatGPT from OpenAI, Llama from Meta, Gemini from Google) were therefore built mostly on that language, “so it does not work in our language”.

Indeed, building an LLM for dialects with limited written documentation is a major technical challenge. An LLM, the kind of model behind ChatGPT, is a language model with a very large number of parameters. Having been trained on enough data, much of it written text, such a program can generate original content: text and, in multimodal systems, images and video.

To that end, the Hadretna teams are working to build the largest corpus of texts in the various dialects used in Algeria. “Our ambition is to build the first LLM in Algerian dialects,” said Moussab Djerrab, scientific director of Fentech. “To achieve this objective, we launched the website www.hadretna.ai. Anyone who wishes can take part in the project by providing Hadretna with their translations and annotations.”

Once built, this new database will make it possible to train version 2 of Hadretna. The AI models developed on this data will be released as open source on a future platform. Founded by three brothers in 2018, Fentech is a start-up that develops an artificial intelligence platform allowing companies in all sectors to optimize their decisions in real time: reducing energy consumption, forecasting demand for a product, setting prices and managing inventory. It has offices in Algiers and Paris.
