Is there a place for investigative journalists in the world of AI?

By Oleksandra Yaroshenko

Journalism has always been a guardian of the public interest, holding governments accountable. This is not only a professional duty but also the foundation of journalism’s democratic mission, and it is especially true of investigative journalism, which has always been at the forefront of the fight for transparency and justice. But today, power is no longer limited to governments. Giants like Meta, Google, and Apple don’t just control the digital space – they shape its infrastructure. That is why journalism is increasingly focused on critical analysis of digital platforms. Big tech companies have become new centers of influence, comparable to states, with their own rules, governance systems, and social responsibilities. Just as journalists once monitored the actions of governments, today they have the tools and experience to monitor the power of digital platforms. And this is perhaps the most important challenge facing modern journalism.

Traditional methods of journalism often cannot cope with the technological complexity and opacity behind which big tech companies hide. These companies are the real “black boxes” of the modern world: from government and corporate secrets to opaque algorithms that require specialist technical knowledge to understand. Unlike people, algorithms cannot be invited for an interview. And here a serious problem arises: journalists face a huge asymmetry of power and information. Large platforms, with their vast financial, technological, and legal resources, almost completely control the information space, making it extremely difficult to penetrate the veil of corporate secrecy.

That is why the critical analysis of big tech companies’ algorithms falls on the shoulders of specialized investigative organizations such as ProPublica, The Markup, and Lighthouse Reports. In addition, non-profit organizations such as AlgorithmWatch, Systemic Justice, and The Center for Countering Digital Hate continue to explore the impact of these powerful technologies on society. Researchers are increasingly discussing how AI can change investigative journalism by helping to uncover hidden stories and work with huge amounts of data. This approach has already been described as “digital watchdogs” and is seen as a tool for holding those in power accountable.

Chatbot training data and TikTok’s recommendation algorithm

The research of Joris Weerbeek, from the Department of Media and Cultural Studies at Utrecht University (the Netherlands), involves the direct application of AI in journalistic investigations. As part of his PhD project, he analyzed two cases in cooperation with the Dutch weekly De Groene Amsterdammer, a small newsroom of 15 journalists that in 2020 launched the Data and Debate initiative to analyze online debates in partnership with Dutch universities.

The research methodology involved close cooperation between journalists and academics. Journalists were responsible for finding sources, conducting interviews, and analyzing documents, while researchers developed data analysis methods and worked with algorithms. The workflow included weekly formal meetings, joint data labeling, and informal discussions. The final materials were written by journalists, but with input from researchers. This collaboration made it possible not only to test AI tools in journalism, but also to evaluate their effectiveness in real editorial conditions.

One of the key research questions was: which sources allow large language models to achieve a high level of proficiency in Dutch? The researchers studied data from Google’s Colossal Clean Crawled Corpus (C4), in particular its multilingual version mC4, which contains more than 95 million Dutch-language web pages. By analyzing the filtered data used to train GPT-3, the team found that the content included in the model was heavily shaped by selection algorithms that favored English-language texts. A number of problematic content types were also identified, including personal data, copyrighted material, and disinformation sites, which illustrates the risks of insufficiently controlled training-data collection for AI.
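To make this kind of audit concrete, here is a minimal sketch of how one might start sampling the Dutch portion of mC4 and counting which domains dominate it. It assumes the publicly released “allenai/c4” dataset on Hugging Face with the “nl” configuration, and the flagged-domain list is purely illustrative; this is not the team’s actual pipeline.

```python
# Sketch: counting which domains dominate the Dutch portion of mC4.
# Assumptions: the public "allenai/c4" dataset with the "nl" config on
# Hugging Face, and a purely illustrative list of flagged domains.
from collections import Counter
from urllib.parse import urlparse

from datasets import load_dataset  # pip install datasets

FLAGGED_DOMAINS = {"example-disinfo.nl", "example-piracy.nl"}  # hypothetical

def audit_mc4_nl(sample_size: int = 100_000) -> None:
    stream = load_dataset("allenai/c4", "nl", split="train", streaming=True)
    domain_counts: Counter[str] = Counter()
    flagged = 0
    for i, record in enumerate(stream):
        if i >= sample_size:
            break
        domain = urlparse(record["url"]).netloc.lower()
        domain_counts[domain] += 1
        if domain in FLAGGED_DOMAINS:
            flagged += 1
    print("Most frequent domains:", domain_counts.most_common(20))
    print(f"Records from flagged domains: {flagged} of {sample_size}")

if __name__ == "__main__":
    audit_mc4_nl()
```

Even a sample of this kind can show which publishers, forums, or scraped archives carry the most weight in the training data before any manual review begins.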

The second case study examined how TikTok detects user interest in content related to eating disorders and how quickly it starts recommending similar videos. Using automated accounts on physical smartphones, the researchers tracked how long it took the algorithm to fill the feed with videos on this sensitive topic. It turned out that TikTok could adapt its recommendations in just a few minutes, demonstrating its ability to quickly identify users’ hidden interests. To analyze the video content, the team used the CLIP model, which allowed them to evaluate the relationship between images and text in the videos.
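The article does not detail the exact pipeline, but a minimal sketch of how CLIP can score individual video frames against textual topic descriptions might look like this. It assumes the openly available openai/clip-vit-base-patch32 checkpoint via the transformers library, and the label texts are placeholders rather than the researchers’ actual prompts.

```python
# Sketch: scoring a video frame against candidate topic descriptions with CLIP.
# Assumptions: the public openai/clip-vit-base-patch32 checkpoint and
# placeholder label texts, not the prompts used in the actual study.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor  # pip install transformers

LABELS = [
    "a video about the sensitive topic under study",  # placeholder wording
    "an unrelated everyday video",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_frame(frame_path: str) -> dict:
    """Return a probability per label for one extracted video frame."""
    image = Image.open(frame_path)
    inputs = processor(text=LABELS, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, len(LABELS))
    probs = logits.softmax(dim=-1).squeeze(0).tolist()
    return dict(zip(LABELS, probs))

# Example: rank extracted frames by how strongly they match the topic label.
# scores = score_frame("frame_0001.jpg")
```

Because the comparison is between images and free-text descriptions, this kind of scoring can flag relevant videos even when they carry no telltale hashtags or captions.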

Speed, personalization, reproducibility

Based on the two journalistic cases, AI’s impact on journalists’ ability to scrutinize large digital platforms falls into three main categories: speed and scale, personalization, and reproducibility. Speed and scale allow journalists to work efficiently with huge amounts of data, automatically categorizing millions of records in a short time. This is especially important in investigations involving training datasets for chatbots, where AI helps to organize content, making it available for further analysis. Although such automation does not replace human labor, it makes tasks feasible that would previously have been impossible because of their scale.
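As an illustration of this kind of bulk categorization, here is a hedged sketch that labels scraped text records with a zero-shot classifier. The facebook/bart-large-mnli checkpoint and the category names are assumptions made for the example, not the taxonomy or model the newsroom actually used.

```python
# Sketch: bulk-labelling scraped text records with a zero-shot classifier.
# Assumptions: the public facebook/bart-large-mnli checkpoint and illustrative
# category names; this is not the newsroom's actual taxonomy or model.
from transformers import pipeline  # pip install transformers

CATEGORIES = ["news", "personal data", "copyrighted material", "disinformation"]

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def categorise(texts: list) -> list:
    """Assign the best-matching category to each text record."""
    results = classifier(texts, candidate_labels=CATEGORIES)
    if isinstance(results, dict):  # a single input returns a dict, not a list
        results = [results]
    return [r["labels"][0] for r in results]  # top label per record

# Example: categorise(["Some scraped paragraph ..."]) -> ["news"]
```

Run over millions of records in batches, a pipeline like this turns an unreadably large crawl into something a reporter can sort, sample, and verify by hand.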

Personalization, in turn, allows AI to mimic user behavior on a platform, which helps journalists better understand how recommendation algorithms work. This was particularly evident in the TikTok study, where AI-driven bots tracked interactions with content related to eating disorders. Unlike traditional data collection through hashtags, the use of AI made it possible to identify unlabeled videos with relevant visual content. This approach not only makes algorithmic systems more understandable to a wider audience, but also allows technical processes to be covered more thoroughly in journalistic investigations.
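Schematically, such a “sock puppet” could work like the sketch below: it dwells longer on videos that a frame classifier flags as on-topic, which is the signal a recommender typically learns from. The feed driver and its methods are hypothetical placeholders; TikTok offers no such public scripting interface, and the actual study ran automated accounts on physical smartphones.

```python
# Schematic sketch of a "sock-puppet" loop that signals interest by dwelling
# longer on matching videos. The `feed` driver and its methods are hypothetical
# placeholders, not a real TikTok API.
import time

DWELL_MATCH_SECONDS = 30   # linger on videos that match the topic
DWELL_SKIP_SECONDS = 2     # scroll quickly past everything else

def run_session(feed, frame_scorer, topic_label: str, n_videos: int = 100) -> list:
    """Watch n_videos, dwelling on matches, and record which ones matched."""
    matches = []
    for _ in range(n_videos):
        video = feed.next_video()                    # hypothetical driver call
        scores = frame_scorer(video.sample_frame())  # e.g. CLIP scores per label
        is_match = max(scores, key=scores.get) == topic_label
        time.sleep(DWELL_MATCH_SECONDS if is_match else DWELL_SKIP_SECONDS)
        feed.swipe_next()                            # hypothetical driver call
        matches.append(is_match)
    return matches  # the share of matches over time shows how fast the feed adapts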

Reproducibility is an important aspect of journalistic methodology when it comes to researching digital platforms. Automating data collection and analysis allows journalists to repeat experiments with the same parameters, which increases the reliability of their conclusions. For example, the analysis of TikTok’s algorithms can be repeated at different times to track changes in the recommendation system. This is especially important because platforms can deny individual cases or dismiss them as coincidences.
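In practice, reproducibility largely means pinning every parameter of a run and archiving it with the results, so the same audit can be repeated later under identical conditions. The sketch below shows one way to do that; the field names are illustrative and not taken from the published methodology.

```python
# Sketch: pinning experiment parameters so a platform audit can be re-run
# identically at different times. Field names are illustrative only.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditConfig:
    topic_label: str = "sensitive-topic"   # placeholder
    n_accounts: int = 3
    videos_per_session: int = 100
    dwell_match_seconds: int = 30
    random_seed: int = 42

def run_audit(config: AuditConfig) -> dict:
    # ... run the data collection with exactly these parameters ...
    return {"matched_share_over_time": []}  # placeholder result

if __name__ == "__main__":
    config = AuditConfig()
    record = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "config": asdict(config),
        "result": run_audit(config),
    }
    print(json.dumps(record, indent=2))     # archive alongside the story
```

A dated record of the configuration and its results gives the newsroom something concrete to point to if a platform later disputes the findings.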

Although large tech companies are increasingly locking down their algorithms and tightening control over the information space, journalists can use AI to analyze huge amounts of data, identify algorithmic biases, and investigate the impact of digital technologies on society. In the new technological landscape, therefore, investigative journalists not only have a place but also play a critical role in ensuring the transparency of digital giants.

This article, Is there a place for investigative journalists in the world of AI?, was first published by European Journalism Observatory on March 26, 2025.