Like many of you, our engineering teams are navigating a challenging squeeze: tighter budgets, 10×-level performance mandates, and rising expectations of AI-enhanced productivity. We're exploring every avenue: augmenting workflows with GitHub Copilot, adopting test-generation tools, and piloting (limited) automated code reviews in an effort to make good on the acceleration hype.
AI isn’t magic
We are seeing real momentum in some areas.
For example:
- Generating boilerplate code in small repositories
- Improving test coverage
- Accelerating refactoring
- Generating documentation
But the truth is that AI isn’t a magic wand. Its power often depends on the environment: it thrives in small, consistent, well-documented codebases, but struggles—and even hallucinates—in tangled, legacy code monoliths.
Why size and consistency matter
Large, multi-generational codebases are often saddled with:
- Conflicting styles and architecture, which confuse AI and can lead to incorrect suggestions or hallucinated code.
- Context gaps, where AI hasn’t been trained or primed on your unique patterns and APIs.
Academic research illustrates this concretely. Without consistent structure and context, LLMs tend to generate API calls or code that don’t actually exist in the project. There are techniques that can reduce such errors and significantly improve accuracy, but they must be applied thoughtfully and deliberately. (cline, arXiv).
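One such technique, sketched here as a hypothetical example (the helper name and snippet are illustrative, not from any cited paper), is to ground suggestions against the project itself: before accepting an AI-suggested snippet, statically check that every attribute it calls on a known module actually exists there.

```python
import ast
import json  # stands in for one of your project's real modules


def undefined_calls(snippet: str, module) -> set[str]:
    """Return names the snippet calls on `module` that the module lacks."""
    tree = ast.parse(snippet)
    missing = set()
    for node in ast.walk(tree):
        # Match calls of the form module.some_function(...)
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == module.__name__):
            if not hasattr(module, node.func.attr):
                missing.add(node.func.attr)
    return missing


# An AI-suggested snippet that hallucinates json.parse (a JavaScript-ism)
suggestion = "data = json.parse(text)\nout = json.dumps(data)"
print(undefined_calls(suggestion, json))  # → {'parse'}
```

This is only a narrow slice of real grounding (retrieval over your codebase, symbol indexes, type checking), but it shows the principle: validate AI output against project-specific context rather than trusting it on sight.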
Successful AI requires robust engineering hygiene
Here’s what’s working at the intersection of AI and software quality, much of which focuses on the quality of documentation and context engineering:
- Create clear repository documentation: AI tools rely on precise documentation for context engineering and attention. Write thorough READMEs and onboarding docs (that a human would love). Describe the system's purpose (goals), inputs/outputs (system boundaries), expected coding style, lint rules, and testing processes. (benhouston3d: example doc for AI)
- Document AI-generated code and enforce CI quality gates: Best practices include tagging AI-generated commits, storing prompt/code pairs so prompts can be reviewed alongside the code they produced, and incorporating static analysis in CI (continuous integration) pipelines (cogentuniversity.com). Many teams find that without these guardrails, AI can accelerate tech debt: hallucinations, bug rates, and even duplicated code can all proliferate, especially when AI output is trusted more than it's tested (askflux.ai).
- Provide clear user stories and acceptance criteria: Well-defined user stories guide AI to better outputs, just as they do for human product and engineering teams. Those stories must include explicit acceptance criteria so the AI's outputs can be checked against the story's needs.
- Prioritize context engineering over prompt magic: Good user outcomes aren’t just about clever prompts; they are about providing the right information, in the right structure, at the right time (Hacker News).
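As a minimal sketch of one such guardrail (the `AI-assisted:` and `Prompt-Ref:` commit trailers here are assumed conventions, not a standard): a CI step that flags AI-tagged commits lacking a stored prompt reference.

```python
# Hypothetical CI gate: commits marked with an "AI-assisted: yes" trailer
# must also carry a "Prompt-Ref:" trailer pointing at the stored prompt.

def check_commit(message: str) -> list[str]:
    """Return gate violations found in a single commit message."""
    trailers = {}
    for line in message.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            trailers[key.strip().lower()] = value.strip()
    problems = []
    if trailers.get("ai-assisted") == "yes" and "prompt-ref" not in trailers:
        problems.append("AI-assisted commit has no Prompt-Ref trailer")
    return problems


msg = """Fix pagination off-by-one

AI-assisted: yes
"""
print(check_commit(msg))  # → ['AI-assisted commit has no Prompt-Ref trailer']
```

In practice you would run a check like this over `git log` output in CI and fail the build on violations; the point is that the guardrail is cheap to automate once the tagging convention exists.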
Bottom line: the mid-2025 AI wave is a chance to reset for the better
Let’s use this era of AI excitement to:
- Set realistic expectations: Generative AI can boost productivity, especially for focused tasks—but it is not a universal accelerator.
- Raise the quality bar: We can use this movement to improve quality for both AI and humans. This includes improving READMEs, onboarding docs, coding standards, and process clarity that benefits every contributor.
- Design for collaboration with AI—but value human workflows and well-being too: The fastest, most reliable teams are those where technology delivery and developer experience reinforce each other (askflux.ai, en.wikipedia.org).
Our best results come from tools that deliver on their promises while also making the human work week richer, not shallower.
Let’s chase AI productivity—but let’s do it in a way that strengthens both our codebases and the humans who build them.
TL;DR summary and more links
| Insight | Source / reasoning |
| --- | --- |
| AI can be 2× faster, but only for cleaner code | McKinsey study: McKinsey & Company |
| Copilot increases task speed 55% | Controlled experimentation: arXiv |
| Huge codebases increase hallucinations | Project-specific grounding: arXiv |
| Context engineering is more than prompt tricks | Emerging best practices: Hacker News, qodo.ai |
| Quality fails when AI is rushed in | Engineering leaders report more bugs and debt: askflux.ai |
| Context clarity helps both AI and humans | Context engineering and docs best practices: meirg.co.il, blog.codacy.com |
| Developer experience matters! | DX research and tools trends: en.wikipedia.org |