Large language models (LLMs) can identify social media users even when they post under pseudonyms.
- LLMs allow for the de-anonymization of social media users with a high level of accuracy.
- In experiments, the precision of identification reached 90%, while recall was 68%.
- Researchers emphasize the risks to privacy and security in online communications.
AI Experiments: How De-anonymization Works
A group of scientists from the Swiss Federal Institute of Technology Zurich (ETH Zurich) and the company Anthropic found that modern LLMs can uncover the identities of users hiding behind pseudonyms on social media. The results indicate that these approaches scale to large datasets and can link accounts across different platforms.
Analysts believe this calls into question the role of pseudonymity as a fundamental mechanism for protecting privacy online. The scientific paper highlights that LLMs are capable of matching accounts and user messages by analyzing free text, as well as detecting indirect signs characteristic of communication styles.
In the experiments, researchers achieved a recall of 68%, meaning the proportion of pseudonymous users who were successfully de-anonymized, while the precision of the resulting matches reached 90%.
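The two figures measure different things: precision is the share of claimed identifications that are correct, while recall is the share of all pseudonymous users the system managed to identify. A minimal sketch of how these are computed (the counts below are illustrative, chosen only to roughly reproduce the reported 90%/68% figures, and are not from the study):

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Standard precision/recall from confusion-matrix counts.

    precision: of the identifications the system made, how many were right.
    recall:    of all users who could have been identified, how many were found.
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts: 68 users correctly identified, 8 wrong matches,
# 32 users the system failed to identify at all.
p, r = precision_recall(68, 8, 32)
```

With these toy counts, recall is exactly 0.68 and precision is roughly 0.89, close to the figures the article reports.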
For testing, the researchers used several public datasets. One experiment matched user profiles from Hacker News and LinkedIn via known cross-platform links. Before analysis, all direct identifiers were removed from the messages; the LLMs then identified individuals from writing style and other indirect signals alone.
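The core idea behind style-based matching can be illustrated with a much simpler classical technique than an LLM: treat each author's character n-gram frequencies as a stylometric fingerprint and link the anonymous text to the most similar candidate profile. This is a minimal sketch of the principle, not the researchers' actual pipeline, and all names and texts are invented:

```python
from collections import Counter
import math

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram counts: a crude but surprisingly robust style fingerprint."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(anon_text: str, candidates: dict) -> str:
    """Return the candidate profile whose writing style is closest to the anonymous text."""
    anon = char_ngrams(anon_text)
    return max(candidates, key=lambda name: cosine(anon, char_ngrams(candidates[name])))

# Hypothetical profiles with distinctive styles.
profiles = {
    "alice": "I reckon y'all gonna love this here gadget, reckon so.",
    "bob": "The methodology herein demonstrates statistically significant results.",
}
```

An LLM does something analogous but far more powerful: instead of raw n-gram overlap, it picks up on vocabulary, topics, phrasing habits, and factual hints spread across many messages.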
Another method involved analyzing data similar to the Netflix Prize dataset: user preferences and activity history. Even without explicit names, this information allowed for accurate identification of individuals.
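The classic result behind this kind of attack is that a handful of preferences, especially for obscure items, narrows a candidate pool dramatically. A minimal sketch of that scoring idea, with rarer shared items weighted more heavily (the weighting scheme, names, and data here are illustrative assumptions, not the study's method):

```python
import math

def match_score(anon_ratings: dict, candidate_ratings: dict, item_popularity: dict) -> float:
    """Score how well a candidate's ratings explain an 'anonymous' rating set.

    A shared rating on a niche item is strong evidence; a shared rating on a
    blockbuster is weak evidence, so each hit is down-weighted by log-popularity.
    """
    score = 0.0
    for item, rating in anon_ratings.items():
        if item in candidate_ratings and abs(candidate_ratings[item] - rating) <= 1:
            score += 1.0 / math.log(1 + item_popularity.get(item, 1))
    return score

def deanonymize(anon_ratings: dict, candidates: dict, item_popularity: dict) -> str:
    """Return the candidate whose rating history best matches the anonymous one."""
    return max(candidates, key=lambda u: match_score(anon_ratings, candidates[u], item_popularity))
```

Even two or three ratings of little-known titles can single out one user among millions, which is why "no names" is not the same as "anonymous."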
In some tests, scientists worked with user activity on Reddit. For example, analyzing discussions about movies in various themed communities allowed for the identification of some users with very high accuracy. If a user discussed more than ten movies, the probability of correct identification increased to 90% for nearly half of the accounts and to 99% for about 17% of users.
New Privacy Risks and Researchers’ Recommendations
One of the study's authors, Simon Lehrman, emphasizes that what sets modern models apart is their ability to gradually assemble a complete portrait of a person from nothing but fragments of free text. In the past, this required complex algorithms and structured databases.
Scientists warn that such technologies could make mass de-anonymization quick and accessible, contributing to threats of doxxing, harassment, and the creation of highly detailed marketing profiles of users.
Researchers recommend that social media platforms limit mass access to user data through APIs and monitor automated information collection. They believe AI developers should implement mechanisms that prevent the use of models for targeted de-anonymization.
Researchers caution that without appropriate restrictions, such tools could become weapons for governments to identify online critics, while companies could use them for highly precise advertising. Malicious actors could also apply these technologies for large-scale fraud schemes.