Molly Crabapple, a U.S.-based author and artist, reportedly woke up one day in October to find that her Instagram account had been severely restricted. “Your account can’t be shown to non-followers,” read a message when she opened the app.
Instagram offered no further explanation for the restriction, which mystified her. She guessed it might have been triggered when she reposted a story by Democracy Now! on the state-sanctioned militarization of Israel’s settlers in the West Bank.
In a similar case, on October 12, Instagram suspended the account of Motaz Azaiza, a twenty-four-year-old Gaza-based journalist, who had documented devastating scenes of mass destruction and death. After an uproar from his followers, the account was restored. Meta, Instagram’s parent company, provided no explanation for the suspension.
These are just two of many similar stories circulating on social media since the beginning of the war in Gaza last October. While Instagram does not release data, whether granular or general, on the number and type of restrictions it places on accounts, since October such restrictions have seemed to disproportionately affect Arabic-speaking users and public figures, including journalists engaged in war-related speech. Many complained they were “shadowbanned,” meaning that their accounts and posts were made less visible to other users and followers (shadowbanning is notoriously difficult to prove). Some said that certain hashtags, such as #IStandWithPalestine, were made unsearchable. Others shared content takedown notices that provided no explanation or redress mechanism. Human rights and research groups called on tech companies to address what they said was rampant anti-Palestinian racism on their platforms and discriminatory content moderation against Palestinians and their allies.
For many who follow how content moderation tools and their politics have evolved, these apparent trends in social media restrictions were hardly a surprise. Since social media companies began to crack down on online extremism in the last decade, counter-extremism policies that were, in theory, designed to prevent violent and inciting speech have repeatedly and unfairly discriminated against Arabic speakers, Palestinians, and their allies. These flaws result from biases in technical systems, the imperfect tools of content moderation, and the anti-Arab bias of counterterrorism regimes.
Regulatory authorities in the United States and Europe have little immediate recourse, because existing laws don’t empower them to correct biases in content moderation. In fact, some of the existing laws, like the EU’s new Digital Services Act, could enable biased moderation.
The result is that social media companies are on the verge of a deep loss of user trust because their automated moderation systems continue to discriminate despite repeated pledges to self-improve. To be good free speech actors and to retain their credibility, these companies must muster the will to withstand the political price of balancing rights protections and harm prevention—especially when the rights in question are those of historically marginalized communities.
The War on Terror and Platform Policies
Over the last few years, platforms have tightened their systems against “terrorism” and “violent extremism” in response to heightened pressure to crack down on extremist groups that skillfully used these platforms to recruit and spread propaganda. The rise of the Islamic State and the March 2019 mosque attack in Christchurch, New Zealand, are usually cited as the two events that catalyzed and paved the way for the systems currently in place.
Moderating extremist content across platforms crystallized around two basic policy approaches: targeting either categories of speakers or specific speech terms. Targeting speakers relies on lists of entities that are considered dangerous and whose presence on platforms is not allowed. Such targeting is supposed to bar accounts owned or managed, or perceived to be owned or managed, by actors on these lists. The lists, which platforms have never published (although some have been leaked), usually rely heavily on terrorism designations made by the United States, the European Union, or the UN. The second approach targets specific speech terms, or combinations of them, and aims to block content, en masse, that is perceived to support or promote extremist entities, events, or ideologies, regardless of who owns the accounts. On Meta’s platforms Facebook and Instagram, the policy prohibits “praise, substantive support, or representation.”
Counter-extremism policies that were supposed to prevent violent and inciting speech have repeatedly and unfairly discriminated against Arabic speakers, Palestinians, and their allies.
These policies are the offspring of the two-decade-old War on Terror, which watered down human rights protections in the West, especially the enshrined right to due process. In particular, the War on Terror established new policing and regulatory practices that made the concept of a credible threat very broad and opaque. War on Terror policing has unfairly targeted Muslims, sweeping up innocent communities and individuals. Other marginalized communities have also suffered from these practices, while credible threats from more empowered communities have gone undetected.
One way this legacy manifests is in the different ways companies moderate far-right extremism versus Islamist extremism. Although both are cited as targets of these policies, they are not equally moderated across platforms, especially Meta’s, suggesting embedded racialization in these systems. Meta’s long-secret list used in its “Dangerous Individuals and Organizations” policy, which was leaked in 2021, reveals these biases. It classifies entities through a three-tier system, with each tier implying different speech restrictions and penalties for violations. Tier 1, which is subject to the most stringent restrictions and penalties, is overwhelmingly populated by Muslims, Arabs, and South Asians. In contrast, far-right extremist entities are largely listed under tier 3, which allows “praise” of groups, including neo-Nazi groups.
In theory, a tier system is a judicious, incremental approach to moderating a wide range of designated entities, each posing its own risks and harms. Yet in practice, it appears to perpetuate biases that treat extremism from “Western” groups as less dangerous than extremism with a connection to Islam.
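The tiered logic described above can be illustrated with a short sketch. The tier-to-restriction mapping below is a hypothetical reconstruction based on reporting about the leaked list, not Meta’s actual (unpublished) implementation:

```python
# Hypothetical sketch of tier-based enforcement: stricter tiers restrict
# more categories of speech about a designated entity. This mapping is
# illustrative, reconstructed from reporting on the leaked list; it is
# not Meta's actual policy code.

RESTRICTED_SPEECH = {
    1: {"praise", "substantive_support", "representation"},  # most stringent
    2: {"substantive_support", "representation"},
    3: {"representation"},  # "praise" of tier-3 entities is allowed
}

def should_remove(entity_tier: int, speech_type: str) -> bool:
    """Return True if this kind of speech about an entity in this tier is removed."""
    return speech_type in RESTRICTED_SPEECH[entity_tier]

# Praise of a tier-1 entity is removed; praise of a tier-3 entity is not.
print(should_remove(1, "praise"))  # True
print(should_remove(3, "praise"))  # False
```

The disparity the leaked list reveals lies not in this mechanism itself but in who gets assigned to which tier.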
Data published by Tech Against Terrorism, a UN-backed partnership between governments and companies that alerts over 150 participating tech companies to violating content, reveals the same unequal moderation patterns. According to the organization’s 2021–22 transparency report, experts alerted companies to almost 19,000 URLs containing terrorist content. About 95 percent of these alerts were for Islamist terrorist content, while only 5 percent were for far-right content. Even if the report’s explanation is true—that this massive discrepancy is due to the difference in these groups’ “propaganda dissemination techniques”—platforms’ removal rates varied widely. Only 61 percent of far-right terrorist content was removed, compared to 84 percent of Islamist terrorist content. Although the report does not say why the removal rates varied so widely, tier systems like Meta’s that are more lenient with far-right extremism could well explain the disparity. After all, removing content of an extremist U.S. politician has a much higher political cost than removing content by a Palestinian journalist.
The Palestinian journalist Mariam al-Bargouthi posted a story that read, “What you cannot not be able to understand [sic] is that Palestinians want no revenge, only to avenge their displacement by return, not by becomin [sic] the violence used against them.” The post was taken down for breaking Meta’s counter-extremism policy, “Dangerous Individuals and Organizations.” Source: author’s screenshot from Instagram.
Tools for Mass Enforcement
Since social media platforms like Meta began cracking down on extremist content, extremist groups have evolved their propaganda methods. In response, platforms have developed new counterstrategies that rely on tools like mass automated enforcement. But these tools are informed by flawed War on Terror approaches. As a result, they have simply amplified the biases in content moderation.
Enforcement of content moderation policies is a tricky business, especially when you are after groups that skillfully outmaneuver every new restriction on their presence. In response to platforms’ crackdowns, designated groups migrated to new platforms, including encrypted messaging apps such as Telegram, and decentralized the dispersal of their content. Video- and image-hosting websites grew to be of critical importance as well. But content produced by extremist groups eventually found its way back to social media platforms by way of well-intended users unaffiliated with these groups or through accounts unofficially affiliated with them.
To counter this development, initiatives that share information on extremist content across the technology sector emerged as a way to improve platforms’ readiness. One way the platforms and their collaborative initiatives address the decentralized dispersal of extremist content is a practice called “hashing,” in which such content is “fingerprinted” so that automated systems can conduct preventive content moderation en masse. The Global Internet Forum to Counter Terrorism, an industry-wide collaborative institution, mainstreamed the use of hashing for automated moderation by creating and maintaining a “hash-sharing database” of URLs, images, videos, PDFs, and other formats linked to extremist entities. A direct consequence of this practice is the indiscriminate removal of content that features these visuals, even in non-extremist uses such as satire and news reporting.
This kind of collaboration is not new. It emerged as child sexual abuse material found its way to social media, creating an urgent need for effective, preemptive, cross-sector policing of such content. Such collaborative technology is very effective at preventing the distribution of certain content en masse, but it draws little distinction regarding the context of distribution. As such, it should really only be used to block content when there is broad consensus that such content should not be allowed online under any circumstances. An image depicting child sexual abuse should obviously never be distributed. On the other hand, an image showing a victim of a bombing should not be used to inflame passions for an extremist cause, but it might have news value. Hashing can’t easily distinguish between these use cases. At the moment, neither platforms nor these collaborative institutions have published data on how extending hashing to extremist content has affected users’ freedom of speech.
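A minimal sketch can show how hash-based blocking works, and why it is context-blind. Production systems use perceptual hashes that survive re-encoding of images and video; the cryptographic hash here is a simplification that only matches byte-identical copies, and the database and function names are invented for illustration:

```python
# Minimal sketch of hash-sharing: content is "fingerprinted" once, and
# every later upload with a matching fingerprint is blocked, regardless
# of who posts it or why. SHA-256 stands in for the perceptual hashing
# that real systems use.
import hashlib

shared_hash_db: set[str] = set()

def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def register_flagged_content(data: bytes) -> None:
    """Add a fingerprint to the cross-sector shared database."""
    shared_hash_db.add(fingerprint(data))

def should_block(upload: bytes) -> bool:
    # Note what is missing: no check of context, so a news report or a
    # satirical post reusing the same image matches just like propaganda.
    return fingerprint(upload) in shared_hash_db

register_flagged_content(b"<propaganda image bytes>")
print(should_block(b"<propaganda image bytes>"))  # True, whatever the context
print(should_block(b"<unrelated image bytes>"))   # False
```

The design choice at issue is visible in `should_block`: the match is made on the bytes alone, so the satire and news-reporting cases the text describes are indistinguishable from the propaganda case.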
Removing content of an extremist U.S. politician has a much higher political cost than removing content by a Palestinian journalist.
Automated systems also use content filters to flag content that is only possibly violating. Certain key terms, or combinations of terms, in any piece of content trigger an algorithm-powered review process. If this automated review judges, with “high confidence,” that the content violates policy, it is removed without human review. If the review judges the content violating with less confidence, it is referred to a human reviewer. Yet mass layoffs of safety workers last year have reportedly increased the dependence on automated systems, which, the companies themselves have said, by definition means increased errors.
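The two-threshold routing this paragraph describes can be sketched as follows. The threshold values and the scoring function are hypothetical placeholders, since platforms do not publish them:

```python
# Sketch of confidence-threshold routing for a content filter. A trained
# classifier assigns each flagged post a violation score between 0 and 1;
# the thresholds below are invented for illustration.

AUTO_REMOVE_THRESHOLD = 0.95   # "high confidence": removed with no human review
HUMAN_REVIEW_THRESHOLD = 0.60  # lower confidence: referred to a human reviewer

def route(violation_score: float) -> str:
    if violation_score >= AUTO_REMOVE_THRESHOLD:
        return "auto_remove"
    if violation_score >= HUMAN_REVIEW_THRESHOLD:
        return "human_review"
    return "allow"

print(route(0.98))  # auto_remove
print(route(0.75))  # human_review
print(route(0.20))  # allow
```

When the human review queue shrinks, more of that middle band must be decided automatically, which, as the companies themselves concede, means more errors.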
Whether they depend on collaborative hashing or internal proprietary content filters, the design and enforcement of these automated systems have been documented to lead to over-enforcement against users from the same backgrounds unfairly affected by counterterrorism policies. While there has been no publicly available auditing of these platforms’ proprietary systems, independent studies of automated speech detection tools have shown systemic bias along racialized lines. Further, a Meta memo leaked in 2021 stated that 77 percent of the content removed under its Arabic counterterrorism “classifiers” turned out, on review, not to violate Meta’s policies. In response to these findings and to recommendations from the Oversight Board, a Meta-founded but independent body, Meta initiated a policy improvement process.
However, even as platforms recognized issues with their systems, they started to deploy new tactics that limit the outreach of accounts and certain posts in a similarly problematic way. Instead of banning a user or locking them out for a few days, platforms use shadowbanning, which makes certain accounts not searchable or “down-ranked” on feed pages that recommend content. Several Palestinians and other social media users posting pro-Palestinian content have told me that engagement with their posts has declined significantly since the start of the Gaza war, and they could only assume that their accounts had been shadowbanned. Some received notifications from Instagram stating that their “account and content won’t appear in places like Explore, Search, Suggested Users, Reels, and Feed Recommendations” because their “account activities” might not follow policies, with no explanation of what content or policies triggered this.
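Mechanically, down-ranking of the kind the Instagram notice describes amounts to scaling an account’s recommendation score and excluding it from discovery surfaces. The sketch below is a hypothetical illustration; the penalty factor is invented, and the surface names are taken from the notification quoted above:

```python
# Hypothetical sketch of shadowbanning/down-ranking: the content stays
# up, but a flag on the account suppresses its reach. The penalty factor
# is invented; platforms do not disclose theirs.

DOWNRANK_FACTOR = 0.1

def rank_score(base_score: float, account_flagged: bool) -> float:
    """Score used to order posts on recommendation surfaces."""
    return base_score * DOWNRANK_FACTOR if account_flagged else base_score

def eligible_surfaces(account_flagged: bool) -> list[str]:
    """Where an account's posts may appear. A flagged account keeps only
    the feed of its existing followers."""
    surfaces = ["following_feed"]
    if not account_flagged:
        surfaces += ["explore", "search", "suggested_users",
                     "reels", "feed_recommendations"]
    return surfaces

print(rank_score(1.0, account_flagged=True))  # 0.1
print(eligible_surfaces(True))                # ['following_feed']
```

This structure is also why shadowbanning is so hard to prove: nothing is removed, and from the user’s side the only observable signal is declining engagement.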
Other cases of apparent discrimination against Palestinian speech are even stranger to parse but seem to reveal that bias is still baked into algorithms and procedures. In one instance in October, Instagram’s automated translation systems converted a combination of the Palestinian flag emoji, the word “Palestinian” written in English, and the Arabic phrase “thank God” into the English phrase “Praise be to God, Palestinian terrorists are fighting for their freedom.” While the company apologized immediately and fixed the “bug,” it did not disclose how the error was generated.
It is clear that Meta and other social media companies continue to get their enforcement systems wrong. And there is little political or economic pressure for them to do better.
Ideally, social media companies’ moderation systems should be designed and implemented in a manner that limits harmful content while also protecting the human rights of their users. In practice, however, removing harmful content en masse restricts the rights of users, especially minorities and underprivileged populations. These restrictions are largely due to the trade-offs embedded in these systems: timely, judicious removals can prevent real-life harm, but hasty removals can undermine the documentation of mass atrocities and the human rights of targeted vulnerable populations.
However, when it comes to Arabic content, embedded biases and technical limitations disincentivize striking the right balance in this trade-off, and moderation systems trend toward over-enforcement. Earlier this year, Meta said its current approach for moderating the Arabic word shaheed (meaning “martyr”) and its derivatives has been responsible for “more content removals than any other single word or phrase” on Meta’s platforms. In the case of Israel–Palestine, a Meta-commissioned external evaluation two years ago found disparities between enforcement in Hebrew and Arabic. Automated Arabic flagging systems indiscriminately removed non-violating content. In contrast, there was almost no flagging of Hebrew content, despite the well-documented rise of anti-Palestinian and racist content in that language.
Political pressure also encourages platforms to over-enforce their policies without regard to human rights or political nuances. While platforms, for example, are not required under U.S. law to take down terrorist content, these companies adopt U.S. terrorist designations at face value. It is much easier for these companies to voluntarily over-enforce than to take a risk protecting the speech of individuals.
The EU’s Digital Services Act, which entered into force in summer 2023, established a new “notice and action” mechanism that allows member countries’ authorities to directly order social media companies to take down allegedly illegal content. Digital rights groups like the Electronic Frontier Foundation (EFF) have criticized the new rule for overreach: “Taking down user communication is a highly intrusive act that interferes with the right to privacy and threatens the foundation of a democratic society,” EFF stated, adding that it was particularly troubling that “non-independent authorities” now have this power. Under the new mechanism, platforms are required to take down material that EU states’ authorities identify as “illegal terrorist” content within twenty-four hours of their reporting. More dangerously, the law also allows nongovernment organizations and even individuals whom member states identify as so-called trusted flaggers to report illegal content to companies, which must then remove that content.
The new EU mechanism is designed to hold companies responsible and preempt the usual argument that firms have used to defer accountability for problematic content: that they were not aware of it. But as with other content moderation efforts, an arguably noble intent can have problematic outcomes, especially when added to systems that already have embedded bias.
Admittedly, correcting the regulatory framework for content moderation is not straightforward and will take time. In the United States, the Federal Communications Commission and the Federal Trade Commission currently have little authority to regulate content, so meaningful change might require legislating new authority. And as the EU laws showcase, the presence of regulations on their own is not enough to safeguard against replicating biases embedded in mainstream counterterrorism practices. New regulations must intentionally be designed with these concerns in mind.
But activists, researchers, and concerned citizens all over the world must sustain their strong push for accountability and change. And social media companies must revise their policies much more swiftly. They can start with more transparency, for instance by publishing comprehensive error rates of automated enforcement by language and region. They could also commission independent audits of their practices, increase the budgets of oversight boards, and follow through on their public commitments to fairness and human rights. These actions would serve the public interest and improve trust in platforms.
Sadly, the marginalization of certain communities’ voices is nothing new, even in countries like the United States, where freedom of speech is a cherished ideal. But we are now in a new era of robotic censorship, carried out for impenetrable reasons by largely unaccountable corporations. It is particularly troubling that these companies seem to be systematically discriminating, intentionally or not, against Arabic-speaking users at a time when many of those users are trying to advocate against real-life violence against Palestinians.
Social media giants need to devote more resources to making their content moderation fair, and to be more transparent about these efforts. And if these companies fail to do so voluntarily, government regulations should compel them.
Header Image: Over a thousand people participate in a silent march and protest in midtown Manhattan against the deaths of Palestinians in Gaza on December 28, 2023 in New York City. The action, which was partly organized by elderly Jewish groups, included the carrying of hundreds of small effigies representing some of the thousands of children killed in Gaza as a result of the ongoing conflict with Israel. Source: Spencer Platt/Getty Images