Code of Silence – The Hidden Risks of AI-Generated Code

Generative AI has a wide range of possible applications across industries, but it is being applied more widely in some areas than others. A late 2024 Economist Impact Report global survey of 1,100 executives and technologists showed that IT, marketing and customer service are the three most popular areas for its deployment. The survey notes that only 37% of executives believe that generative AI applications are ready for production use, a figure that drops to 29% amongst actual AI practitioners. It is no surprise that IT is the leading area of adoption: software developers are expensive. The average salary of a US software developer is $115,000, and it is higher in Silicon Valley; at top firms such as Google and Apple it is higher still, with Apple’s average developer salary at $175,000, while OpenAI’s average overall salary is $1.56 million. It is hardly surprising that businesses are exploring ways to improve productivity and the return on their investment in these expensive engineers. Sensationalist articles abound suggesting, or at least entertaining the possibility, that programming is finished as a career, but is this really true? How can we measure the actual effect of generative AI on software development?

The DevOps Research and Assessment (Dora) report has been produced each year since 2013, tracking software engineering practices and performance based on a very large survey of over 39,000 software developers. Dora was originally an independent research firm; it was acquired by Google in December 2018 but continues its work. Its latest report focuses on the adoption of AI for coding, and has some fascinating insights. As might be expected, the use of AI for coding is high, with 81% of respondents reporting an increase in the use of AI in their applications. In particular, AI is used for writing code (75% of respondents), summarising information, explaining code, optimising code (61%), documenting code, writing tests, debugging and data analysis (45%).

The survey respondents were asked for their own perceptions of the effect of AI on productivity, with three-quarters perceiving at least some gain. However, overall software delivery actually worsened, with an estimated 1.5% reduction for every 25% increase in AI adoption. Worryingly, respondents also reported an estimated 7.2% reduction in code stability for every 25% increase in AI adoption. Moreover, 39.2% of respondents reported having little or no trust in AI. So, paradoxically, enterprises are aggressively adopting AI for coding, yet this adoption is resulting in slightly worse delivery and significantly less stable code. These findings, incidentally, echo those of the previous Dora survey a year earlier.

This finding is not an outlier, and in any case the Dora survey is the largest of its kind in the industry. A Stanford University study found that developers who had access to an AI assistant wrote significantly less secure code than those without access. Intriguingly, participants with access to an AI assistant were also more likely to believe that they had written secure code than those without it, contrary to reality. Furthermore, participants who trusted the AI less and engaged more with the language and format of their code produced code with fewer security vulnerabilities. A Wuhan University study found similar results: 29% of AI-generated code snippets contained security weaknesses when tested against 89 known types of weakness.
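
To make the kind of weakness these studies report more concrete, consider a minimal, hypothetical Python sketch (not drawn from either study). The first function builds an SQL query by string interpolation, a classic injection weakness of the sort frequently flagged in generated snippets; the second uses a parameterised query instead.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Builds the query by string interpolation: any quote characters in
    # `username` become part of the SQL, allowing SQL injection (CWE-89).
    cursor = conn.execute(f"SELECT id, name FROM users WHERE name = '{username}'")
    return cursor.fetchone()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Uses a parameterised query, so the database driver treats `username`
    # purely as data and injection is not possible through this path.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cursor.fetchone()
```

A reviewer who engages with the generated code, as the Stanford study describes, is far more likely to notice the difference than one who accepts the suggestion as-is.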

These results are not entirely surprising. Generative AI works by making decisions based on its training data, and large language models (LLMs) have been trained on huge swathes of code of variable quality. Moreover, technology moves quickly, and an LLM whose training data was gathered a couple of years ago may produce code that relies on out-of-date libraries with known security vulnerabilities. Developers who write their own code understand what it does, or at least what they intended it to do, which is not necessarily true of AI-generated code. “Vibe coding”, where novices use AI to write code for tasks such as building websites, is fraught with issues. Even if we ignore hallucinations, which occur in AI-generated code just as in any generated content, there are many other problems. Debugging code that you did not write yourself is clearly harder than debugging something whose logic you devised and understood, so even though you may be able to generate code faster than you could write it, testing and debugging may take longer. AI-generated code may also inadvertently reproduce copyrighted material from the model’s training data, and generative AI tends to do better where there is plentiful training data than in novel situations or complex projects.
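
As an illustration of the out-of-date pattern problem, here is a small, hypothetical sketch, assuming Python and the PyYAML library: loading YAML with the original full loader was a common pattern in older code and tutorials of the kind an LLM may have been trained on, but it can construct arbitrary Python objects from untrusted input; current practice is to use yaml.safe_load.

```python
import yaml

untrusted_text = "server: example.com\nport: 8080"  # imagine this arrived from a user upload

# Older pattern, still widespread in historical code and tutorials: the full
# Loader can construct arbitrary Python objects from specially crafted input,
# a classic deserialisation weakness.
config_risky = yaml.load(untrusted_text, Loader=yaml.Loader)

# Current recommended practice: safe_load only builds plain data types
# (dicts, lists, strings, numbers), which is all a configuration file needs.
config_safe = yaml.safe_load(untrusted_text)

print(config_risky, config_safe)
```

Both calls return the same result for benign input, which is precisely why the risky variant persists in training data and can resurface in generated code.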

Although coding is one of the most popular use cases for generative AI, early experience suggests that it is no silver bullet. Wild claims that programmers are an endangered species seem overblown based on the survey data referenced in this article. The latest Dora study shows that 39% of developers using AI have little or no trust in the code it generates, and that both software delivery and code stability worsen as AI use increases, compared to hand-crafted code. This suggests that enterprises need to think carefully about their processes and how best to take advantage of this latest advance in technology, rather than being overly gung-ho. Generative AI is still quite new and doubtless advances will be made, but at present enterprises need to be aware of the risks and issues of AI-generated code, and not just its touted benefits.