The Hidden Truth About AI Sneaking Into Crowdsourced Workforces
- Jun 19, 2023

Recent research from the Swiss university EPFL suggests that artificial intelligence has made substantial inroads into Amazon's Mechanical Turk service, a widely used resource for developers who need human input. The growing presence of AI within such services raises important questions about quality control and the future of crowd work ecosystems. This article discusses the significance of these findings, the method the researchers used to detect AI-generated content, and the implications for data scientists and product managers who rely on human-generated datasets.
Amazon's Mechanical Turk provides a platform for delegating tasks to human workers, typically tasks that are too ambiguous or complex for computers to handle effectively. Developers trust these human-generated outcomes to be more accurate and nuanced than AI-generated results. However, the EPFL study estimates that between 33% and 46% of the crowd workers who took part in its task used tools like ChatGPT to complete the work instead of doing it themselves. As AI becomes more accessible and affordable, this trend has the potential to poison the well of human-generated data that many developers depend on.
To measure how far generative AI tools such as ChatGPT have spread into crowdsourced workforces, the researchers developed a method for distinguishing human-written text from machine-generated text. The study's participants were asked to condense research abstracts from the New England Journal of Medicine into 100-word summaries, a task that AI models like ChatGPT are well suited to perform. The results illustrate how difficult it is, for both human reviewers and automated tools, to tell AI-generated content apart from human-written content.
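The article does not describe the researchers' detector in detail, so the following is not their method. It is only a minimal sketch of the general idea: train a classifier on summaries whose origin is known, then score new submissions. It uses a small multinomial naive Bayes model written with the Python standard library. The class name, the labels "human" and "synthetic", and the four training sentences are all hypothetical and made up for illustration. A usable detector would need far more labeled data and careful validation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # The vocabulary is every word seen in any training text.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        """Return the unnormalized log-probability of each label."""
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label in self.labels:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical, hand-written training examples (not data from the study).
train_texts = [
    "we gave the drug to 200 patients and honestly the results were kinda mixed",
    "study looked at heart patients, found the new pill worked a bit better",
    "this study demonstrates a significant improvement in patient outcomes, "
    "highlighting the potential of the intervention",
    "overall, the findings underscore the importance of further research "
    "into this promising treatment",
]
train_labels = ["human", "human", "synthetic", "synthetic"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)

print(clf.predict("the findings underscore a significant improvement in outcomes"))
print(clf.predict("honestly the new pill worked a bit better"))
```

With this toy training set, the first sentence is labeled "synthetic" and the second "human", simply because their words overlap with one class more than the other. That also shows the weakness of the approach: a worker who lightly edits machine output, or a person who happens to write formally, can easily end up on the wrong side of the boundary.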
The growing occurrence of AI-generated content within crowdsourced platforms poses significant risks for data scientists and product managers who rely on accurate and reliable human-generated data. As the lines between human and machine-generated content blur, it becomes increasingly challenging to ensure the quality and authenticity of datasets used for training machine learning models. This development threatens to undermine the very advantages that human-generated data is believed to possess over machine-generated alternatives, such as greater accuracy, intuition, and understanding of nuance.
In conclusion, the infiltration of AI into the crowdsourced workforce warrants serious attention from developers, product managers, and data scientists alike. Widely available tools like ChatGPT put the quality and trustworthiness of human-generated data at risk, potentially negating its perceived advantages over machine-generated content. As the technology landscape continues to evolve, stakeholders in the AI and crowdsourcing industries will need to develop methods for verifying the quality and authenticity of the data they rely on.