Contributions made on the dedicated website are already digital. They are entrusted to the polling company OpinionWay and its artificial intelligence partner, Qwam. Digital contributions include, on the one hand, answers to multiple-choice questions, which are fairly easy to analyze, and, on the other hand, texts, which are processed by extraction engines that pick out words and concepts based on a predefined term repository. These are then classified into several categories, such as "taxes", "hospitals" and "blank vote".
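To make the idea of a predefined term repository more concrete, here is a minimal sketch of dictionary-based classification. It is not the method used by OpinionWay or Qwam, whose tools are not described here; the repository contents, the `classify` function and the matching rule (whole-word matches on lowercased text) are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical term repository: category -> trigger terms (illustrative only).
TERM_REPOSITORY = {
    "taxes": ["tax", "taxes", "vat"],
    "hospitals": ["hospital", "hospitals", "emergency room"],
    "blank vote": ["blank vote", "blank votes"],
}


def classify(text):
    """Count, for each category, how many of its terms occur in the text."""
    # Lowercase and keep only letter sequences, so punctuation does not matter.
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    hits = Counter()
    for category, terms in TERM_REPOSITORY.items():
        for term in terms:
            # \b anchors ensure whole-word matches ("tax" does not match "taxes").
            pattern = r"\b" + re.escape(term) + r"\b"
            matches = len(re.findall(pattern, normalized))
            if matches:
                hits[category] += matches
    return hits


if __name__ == "__main__":
    sample = "Lower taxes, recognise the blank vote, and give hospitals more staff."
    print(dict(classify(sample)))
```

A text that uses none of the listed terms ends up in no category at all, which is why the choice of terms in the repository matters so much.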
What risks can such processes involve?
Programs that recognize and analyze text are not new and generally work quite well. Nevertheless, they do involve a risk of confusion and misunderstanding. For example, when two words look very similar, such as the French words "fracture" (i.e. "divide") and "facture" (i.e. "invoice"), which one should be recognized in a register of complaints? Should we classify a nuanced text such as "I rather agree, but actually" as being "for" or "against" the carbon tax? And how can we ensure that the AI system takes emotions, anger or irony into account?
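The "fracture" / "facture" example can be quantified with an edit distance, which counts the single-character insertions, deletions and substitutions needed to turn one word into another. The sketch below is a standard Levenshtein distance written for illustration; the function name `edit_distance` is an assumption, and nothing here is taken from the tools actually used to process the contributions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    # previous[j] holds the distance between the first i-1 chars of a and first j chars of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("fracture", "facture"))
```

The two words are a single deletion apart, so one misread or mistyped letter is enough to turn a remark about social divides into a remark about invoices.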
Moreover, any classification involves choices: how many categories should there be, and which terms should count as keywords? For example, should the terms "family allowance" and "family quotient" (the number of family units used to calculate income tax) be in the same category? Is "public service" different from "public services"? Should a category be created for the tax credit for home support services, or should it be included in the "tax exemption" category?
Are there any issues of representativeness, and if so, how should we deal with them?
There are indeed several issues at stake. First, as in any such public and open process, some may try to push an idea by running very active communication campaigns on social networks, WhatsApp channels or group emails, so that many people publish the same text defending a specific idea. For example, it is quite obvious that a group has mobilized to abolish the 80 km/h speed limit on departmental roads: the very same text focusing on this topic has been shared by thousands of contributors. Fortunately, artificial intelligence can identify identical texts very easily (although it is unable to identify different texts defending the same idea), and can thus shed light on such strategies of influence. Yet we still need to decide what should be done when faced with such situations.
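Spotting identical texts does not even require sophisticated AI. As a rough sketch, one can normalize each contribution (lowercase, ignore punctuation and spacing), hash it, and group contributions that share the same hash. The function names, the normalization rule and the sample texts below are assumptions made for illustration, not a description of the actual pipeline.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash of the text after lowercasing and dropping punctuation and extra spaces."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_copied_texts(contributions, min_copies=2):
    """Return lists of indices of contributions that share the same fingerprint."""
    groups = defaultdict(list)
    for index, text in enumerate(contributions):
        groups[fingerprint(text)].append(index)
    return [indices for indices in groups.values() if len(indices) >= min_copies]


if __name__ == "__main__":
    # Made-up sample contributions.
    sample = [
        "Abolish the 80 km/h limit!",
        "abolish the 80 km/h  limit",
        "Please scrap the lower speed limit on departmental roads.",
    ]
    print(find_copied_texts(sample))
```

The third sample text defends the same idea in different words and is not grouped with the first two, which is exactly the limitation mentioned above.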
Then there is a second, deeper issue of representativeness. These contributions have been made by about 1.5 million people. That's a lot of course, but it's only a fraction of the overall French population. For the contributions made on the website, we do not have data to classify their authors by age, gender, income level, etc. The only information required to make a contribution is the postal code. So one of the things we can observe is that Paris and the big cities are slightly better represented than other areas.