case study
SaaS
Studocu: Ensuring content integrity in the age of generative AI
We partnered with Studocu to build a custom detection system that differentiates between student-authored notes and AI-generated content, protecting the authenticity of their global knowledge-sharing platform.
Dale Wesdorp
February 27, 2026

The problem behind the brief
Studocu’s business model relies on the authentic exchange of study materials between students. However, the rise of large language models created a significant risk: a surge in AI-generated uploads that threatened to dilute the quality and reliability of their library.
To maintain user trust, Studocu needed a way to accurately identify the origin of every document. Existing off-the-shelf tools weren't precise enough for that purpose. Studocu required a custom, high-performance solution that could scale with its massive volume of daily uploads.

How we built the right thing, and built it right
Strategy
Our focus was on technical de-risking and data quality. We knew that for a model to be effective, it needed to understand the nuances of student-specific writing versus machine-generated patterns. We defined a strategy centered on high-fidelity data generation to "teach" the system what to look for before these challenges became mainstream.
Design & Data
We designed a robust data structure to generate vast amounts of AI content in bulk. This wasn't about building a "cool feature," but about creating a rigorous training environment. By building this comprehensive dataset, we could conduct precise comparisons between human and machine output.
Development
We fine-tuned the RoBERTa language model, optimizing it specifically for detection within an academic context. We handled the full technical implementation, ensuring the model could be integrated into Studocu’s existing pipeline to assess documents in real time. The resulting architecture was built for performance and accuracy, and it outperformed industry-standard alternatives.
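To make the integration step concrete, here is a minimal sketch of how an upload pipeline might assess a full document with a chunk-level classifier. It is an illustration under stated assumptions, not the production implementation: the function names, the 200-word chunk size, the 0.5 threshold, and the averaging rule are all hypothetical. The one fixed constraint it reflects is that RoBERTa-style models accept a limited input length (512 tokens for the standard checkpoints), so long documents have to be scored in pieces. The `score_chunk` callable stands in for the fine-tuned model.

```python
from typing import Callable, Dict, List


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split a document into consecutive word windows.

    The 200-word default is a hypothetical value chosen to stay well
    under a 512-token model input limit.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def assess_document(
    text: str,
    score_chunk: Callable[[str], float],
    threshold: float = 0.5,
    max_words: int = 200,
) -> Dict[str, object]:
    """Score every chunk and aggregate into one document-level verdict.

    score_chunk is a placeholder for the fine-tuned classifier: it takes a
    chunk of text and returns the estimated probability that it is AI-generated.
    Averaging chunk scores is an assumed aggregation rule.
    """
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        # Nothing to score, so no verdict is issued.
        return {"ai_probability": None, "label": "UNKNOWN", "chunks": 0}
    scores = [score_chunk(chunk) for chunk in chunks]
    average = sum(scores) / len(scores)
    label = "AIGC" if average >= threshold else "UGC"
    return {"ai_probability": average, "label": label, "chunks": len(chunks)}
```

In a real deployment, `score_chunk` would wrap the model's inference call, and the threshold would be tuned on held-out data rather than fixed at 0.5.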



Miyagami helped us develop a high-performing AI model. Seeing the final product work so effectively has given us the perfect head start.
Marnix Broer
CEO

Results that scale
A custom-built detection engine doubling industry-standard accuracy.
Performance
2x flagging rate vs industry standard
Our custom-tuned model outperformed existing industry options, detecting up to 98% of both AI-generated content (AIGC) and user-generated content (UGC) in documents, and around 80% of both in question answers.
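The case study does not define how these percentages were computed. One plausible reading is a per-class detection rate (recall): of all items that truly belong to a class, the share the model labeled correctly. The sketch below shows that calculation under this assumption. The function name and the toy data are hypothetical and are not Studocu's evaluation results.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def per_class_detection_rate(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return, for each true label, the fraction of items predicted correctly.

    pairs holds (true_label, predicted_label) tuples, e.g. ("AIGC", "UGC")
    for an AI-generated item that the model mistook for student-written.
    """
    totals: Counter = Counter()
    correct: Counter = Counter()
    for true_label, predicted_label in pairs:
        totals[true_label] += 1
        if predicted_label == true_label:
            correct[true_label] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Toy data for illustration only: 4 AI-generated items, 2 student-written items.
toy_pairs = [
    ("AIGC", "AIGC"),
    ("AIGC", "AIGC"),
    ("AIGC", "AIGC"),
    ("AIGC", "UGC"),   # one AI-generated item missed
    ("UGC", "UGC"),
    ("UGC", "UGC"),
]
toy_rates = per_class_detection_rate(toy_pairs)  # {"AIGC": 0.75, "UGC": 1.0}
```

Reporting the rate separately for each class matters here because a detector that flags everything as AI-generated would catch all AIGC while misclassifying every genuine student upload.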
Trust
Verified content authenticity
By identifying the origin of documents, Studocu can now maintain a library of genuine, student-generated resources, protecting the core value of their platform.
Capability
R&D & strategic edge
Built before the public surge of generative AI, the system gave Studocu an R&D and strategic edge, allowing them to scale their operations without compromising on content integrity.




