Tech's Hidden Discrimination is also a Matter of Language

Powerful tech companies are criticised for being worryingly ill-equipped to deal with hate speech, misinformation and illegal content in languages other than English. This is a language gap that has proven to have serious consequences.

Danish is a minority language or, in the view of tech companies, a “low-resource” language. This means that far less data is available for it than for languages spoken by a larger proportion of the world’s internet users. Several researchers point to this problem, which has been dubbed “the language gap”. Even within the walls of the US companies behind the platforms, there are employees who have long pointed to the problem. Meta’s chief technology officer, Mike Schroepfer, admitted at a 2018 conference in California that the systems built by the company aren’t good enough when it comes to minority languages.

France and Germany are among the countries that have legislated to combat the spread of illegal and harmful content via YouTube, among others. Germany passed a law in 2018 obliging social media sites to remove illegal content within 24 hours of receiving a written request from a user. If companies fail to comply with the law, they can face hefty fines. France has adopted similar legislation. As a result of the legislation, YouTube has subsequently hired more people to supervise its algorithms in identifying and removing extreme content and started work on translating it into German and French, as YouTube’s former CEO Susan Wojcicki pointed out to New York Times journalist Kevin Roose in 2020. The changes also came in the wake of revelations by the BBC and The Times that YouTube’s system for ensuring that child abuse does not occur in comment sections was not working optimally and had not been in place for over a year.

Language inequality is one of the tech giants’ biggest problems
On 16 May, US whistleblower Frances Haugen visited Denmark for the LIBRE award ceremony. She was asked what she would highlight as the most pressing problem right now at Meta (formerly Facebook). She mentioned “language inequality” and pointed to language as a grey area and a serious problem. “Which country is getting the help it needs? If we want to create equality in terms of language, i.e. ensure that you as a user are protected regardless of what language you speak, then we need to take Facebook’s own ‘product safety information’ and apply it to all languages.” This is not the case at present, which is why users living in non-priority countries have proven to be particularly vulnerable. This also applies to Denmark.

According to Haugen, only in a crisis situation does attention turn to translating a – for a company like Facebook – “new language”. But the question is, how and when is a situation assessed as a crisis? As an example, it may be surprising that Facebook and Instagram only chose to take action in Ethiopia in 2021, when the company embarked on expanding its security system based on “risk of offline violence.” That announcement came at a time when the country had been in a civil war for nearly a year.

The internal documents collected by Frances Haugen form the basis of The Wall Street Journal’s investigation, The Facebook Files. The investigation reveals, among other things, that there is a big difference in how and to what extent harmful and dangerous content and illegal use of the platform are identified. Problems arise in countries where the company has too few or no employees who understand the language. The internal papers also show that Facebook employees have long pointed out the seriousness of the problem.

In 2021 Meta announced that the previous year it had successfully removed 97% of hate speech content on the Facebook platform via an automated censorship system. It sounded amazing – and perhaps a little too amazing. Indeed, it later emerged via internal documents examined by Wall Street Journal reporters that the figures were more like 3-5%, and even lower depending on the country of residence and language of the user. According to the company itself, there are 1.84 billion daily active users, and 90% reside outside the United States and Canada.

The majority of Facebook’s users therefore live outside the United States. There are more than 8000 languages in the world, so choosing a language like English, which is now spoken in large parts of the world, is an understandable priority. But that’s not the point here. Part of the problem arises when the systems – both automated and human monitoring and oversight – discriminate against platform users.

Social media language gap – serious translation problems
There is a need to end once and for all the discrimination that leaves minority language users both exposed and defenseless. An online conference last May discussed the human rights responsibilities of private companies. In this context, a research center at Irvine University published the article “Lost in Translation: How the Facebook Oversight Board’s Limited Language Capabilities Undermine Human Rights.” In the article, Alice Doyle, J.D. goes so far as to say that human rights are at risk of being undermined. She reviews a number of recommendations to address the language gap and offers a critique of the existing model that companies like Meta use for oversight and other purposes. For example, how do companies address the limited language capabilities that hinder the ability to understand and assess the full cultural context of the content being reviewed? This is both a complex and structural problem, which therefore also needs to be addressed structurally.

Regardless of where and how responsibilities are placed in terms of regulation and moderation, there is systematic discrimination that cannot be ignored. There is a huge amount of work to be done in examining the knock-on effects of the language gap and the consequences that tech companies have caused in launching the digital revolution under the motto “Move fast and break things”. And right now there is reason to fear that it is all happening again.

Instead of stopping and attending to everyone and everything that broke along the way, the industry is racing ahead in a manic hype towards a new virtual space, repeating all the old mistakes in the latest tech craze: the metaverse.

With all the billions of dollars tech companies have to develop new commercial virtual spaces, you’d think there would be enough money in the coffers to fix the flaws, to guarantee a decent quality of analysis and to ensure equality across cultural and linguistic divides.

Photo: Hannah Wright, unsplash.com
