
AI coding assistants introduce too many security flaws, and that should be a wake-up call for the industry, security researchers from Veracode warn. Nearly half of the time, the code will be functional but insecure.
The researchers tested the coding skills of 100 large language models (LLMs), and the results are disappointing.
“Across all models and all tasks, only 55% of generation tasks result in secure code. In other words, in 45% of the tasks, the model introduces a known security flaw into the code,” the report concludes.
What’s significant is that newer, larger, and otherwise better models do not appear to generate significantly more secure code. All LLMs fared poorly on security, and this hasn’t changed over the past two years.
“The state of AI-generated code security in 2025 is worse than you think. What we found should be a wake-up call for developers, security leaders, and anyone relying on AI to move faster.”
How were the tests performed?
To test the security of AI-generated code, the researchers gave the same 80 coding tasks to 100 LLMs that varied widely in size, vendor, and target application.
The coding tasks spanned four programming languages: Java, JavaScript, C#, and Python. They targeted four common vulnerabilities: SQL injection, cross-site scripting, log injection, and the use of insecure cryptographic algorithms.
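To make "functional but insecure" concrete, here is a minimal Python sketch of the first vulnerability class, SQL injection. It is not taken from the Veracode report: the `users` table, the function names, and the payload are all hypothetical and chosen only for illustration. Both lookups work for ordinary input, but only the parameterized one treats the input as data.

```python
import sqlite3

def make_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def find_user_insecure(conn, name):
    # Functional but insecure: user input is spliced into the SQL text.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_secure(conn, name):
    # Parameterized query: the driver binds `name` as a value, not as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()

conn = make_db()
payload = "nobody' OR '1'='1"
print(len(find_user_insecure(conn, payload)))  # 2: every row leaks
print(len(find_user_secure(conn, payload)))    # 0: no user has that name
```

A test that only checks whether `alice` can be looked up would pass for both versions, which is how code can be functional and still carry the flaw.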
LLMs demonstrated the worst performance in the Java programming language, achieving only a 28.5% security pass rate. The least insecure code was in Python (61.69% pass rate), followed by JavaScript (57%) and C# (55%). Researchers believe that these results reflect the nature of the training data.
“Java has a long history as a server-side implementation language, and it predates the recognition of SQL injection as a vulnerability. Our hypothesis, therefore, is that the Java training data contains many more examples that have security vulnerabilities than the other languages,” the report reads.
The data suggests that LLMs are critically incapable of avoiding cross-site scripting and log injection vulnerabilities, only achieving a 12-13% pass rate. They fared much better when dodging the use of broken or risky cryptographic algorithms or SQL injection flaws (80-85% pass rate).
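For readers unfamiliar with cross-site scripting, the following Python sketch shows the kind of output encoding these tasks call for. It is an illustration under assumptions, not code from the report: the function names and the payload are hypothetical, and real applications would normally rely on a templating engine with automatic escaping rather than hand-built strings.

```python
import html

def render_greeting_insecure(name: str) -> str:
    # User input is placed into the page as-is, so markup in it is executed.
    return f"<p>Hello, {name}!</p>"

def render_greeting_secure(name: str) -> str:
    # html.escape converts <, >, & and quotes into HTML entities.
    return f"<p>Hello, {html.escape(name)}!</p>"

payload = "<script>alert(1)</script>"
print(render_greeting_insecure(payload))
print(render_greeting_secure(payload))
# The second line prints:
# <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```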
The researchers noted that, without broader context, AI coding models rarely sanitize the data they handle.
Large LLMs with over 100 billion parameters achieved an average pass rate of 50.87%, almost the same as the smallest models, those with fewer than 20 billion parameters (50.65%).
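Log injection is a simple case of this missing sanitization. The sketch below is a hypothetical Python example, not the report's test code: the `neutralize` and `capture_log` helpers, the logger names, and the username are assumptions made for illustration. It shows how a newline in user input forges an extra log entry, and how escaping carriage returns and line feeds keeps the entry on one line.

```python
import io
import logging

def neutralize(value: str) -> str:
    """Escape CR and LF so a single value cannot start a new log line."""
    return value.replace("\r", "\\r").replace("\n", "\\n")

def capture_log(message_arg: str, logger_name: str) -> list[str]:
    """Log one failed-login message and return the lines that were written."""
    stream = io.StringIO()
    logger = logging.getLogger(logger_name)
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.info("login failed for %s", message_arg)
    return stream.getvalue().splitlines()

# Attacker-controlled username containing a forged second entry.
username = "mallory\nINFO login succeeded for admin"

raw_lines = capture_log(username, "demo_raw")
safe_lines = capture_log(neutralize(username), "demo_safe")

print(len(raw_lines))   # 2: the forged "login succeeded" line appears
print(len(safe_lines))  # 1: the newline is logged as a literal \n
```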
“No matter how we slice the data, security performance has hardly improved in the last two years,” Veracode researchers conclude.
“While the models got better at writing functional or syntactically correct code, they were no better at writing secure code.”
The likely reason security has not improved is the training data: code samples scraped from the internet contain many vulnerabilities, some of them intentional, as in the WebGoat project used for teaching.
Companies using AI are at higher risk of breaches
The researchers warn that many organizations are likely already using AI-generated code, which can be introduced by open-source software maintainers, third-party vendors, and low-code or no-code platforms, as well as by their own teams.
This increases the risk of costly data breaches, which lead to serious reputational, legal, and financial harm.
Steve Krouse, CEO and co-founder of Val Town, compares vibe code to legacy code in a new blog post: it’s unfamiliar, introduces new bugs with new features, and, therefore, takes a lot of time to debug.
“When you vibe code, you are incurring tech debt as fast as the LLM can spit it out,” Krouse said.
While this is perfect for prototypes and throwaway projects, Krouse warns that non-programmers who are “vibe coding their billion-dollar app idea today” are unlikely to get where they want to go.