27.05.2026

Meltem Aksoy presented an ICST paper co-authored with Igli Begolli and Daniel Neider.

Photo: Meltem Aksoy

Before software reaches users, it usually passes through many pairs of eyes. Developers check each other’s code, ask questions, suggest improvements, and look for hidden problems. This process – known as code review – is one of the quiet but essential routines behind reliable software.
But code review is also demanding. It takes time, experience, and a good understanding of both the programming language and the project context. With large language models now entering software development workflows, one question is becoming increasingly relevant: Can AI help review code more efficiently – without replacing human judgment?
This question was at the center of a paper presented by Meltem Aksoy, postdoctoral researcher at the Research Center Trustworthy Data Science and Security (RC Trust), at the IEEE International Conference on Software Testing, Verification and Validation (ICST 2026) in Daejeon, South Korea.
The paper, titled "When Less Is More: Monolingual Fine-Tuning of Language Models for Industrial C# Code Review", was co-authored by Igli Begolli from Lovion GmbH / TU Dortmund University and Prof. Daniel Neider, who holds the Chair of Verification and Formal Guarantees of Machine Learning at the Department of Computer Science at TU Dortmund University.

Testing language models in real industrial software

The study focuses on a very practical setting: industrial code written in C#. While many studies on AI-assisted programming focus on widely used benchmark languages such as Python or Java, C# plays an important role in many real-world software environments – especially in industry.
Together with the industrial partner Lovion GmbH, the researchers investigated how open-source language models perform when they are fine-tuned specifically on C# code from real industrial repositories. In simple terms, fine-tuning means adapting a model more closely to a particular language, domain, or type of task.
The team evaluated the models across several code-review tasks, including estimating the quality of code changes, generating review comments, and suggesting refinements. To assess the results, they combined automated metrics with expert-based human evaluation.

When less can be more

The title of the paper already points to one of its key insights: more general training data is not always better. In some cases, models that are more narrowly adapted to one programming language and one industrial context can provide stronger support than more broadly trained alternatives.
The results show that monolingual fine-tuning can improve review quality and efficiency. For routine or recurring tasks, language models can help identify patterns, suggest comments, and support developers in navigating large amounts of code.
At the same time, the study also makes clear that these models still have limits. Code review is not only about syntax or surface-level patterns. It often requires context, judgment, and a deeper understanding of why a change matters. This is where human reviewers remain essential.

Assistive tools, not replacements

The central takeaway is therefore deliberately balanced: fine-tuned language models are promising tools for industrial code review, but they are not substitutes for human expertise.
Instead, the research points toward hybrid workflows. In such workflows, AI systems could take over more routine or repetitive review tasks, while human experts focus on complex, critical, and context-sensitive decisions.
This perspective fits closely with the mission of RC Trust. The question is not simply whether AI can automate a task, but how it can be integrated into real-world processes in a way that is useful, reliable, and responsible.

Reliable AI for real-world development

For Meltem Aksoy, presenting the work at ICST 2026 placed the study in exactly the right scientific context. The conference brings together researchers and practitioners working on software testing, verification, and validation – fields that are central to building reliable software systems.
The paper also connects directly to broader questions in trustworthy AI: How should language models be evaluated before they are used in practical settings? Where do they provide real value? And where do their limitations require careful human oversight?
By studying these questions in cooperation with an industrial partner and with real C# repositories, the work provides a grounded perspective on AI-assisted software development. It shows that trustworthy AI is not only about building more powerful models. It is also about understanding where they help, where they fall short, and how humans and machines can work together more effectively.

Category

  • Publication
  • Network

Author

Patrick Wilking
