Code review in an ideal world

With the remarkable benefits of agentic tools for software development comes a frustrating paradox: they also make good software development more difficult.

There is a bottleneck between producing code and integrating it safely into a code base. OpenAI Codex, Claude Code and other tools can generate code much faster than it can be reviewed.

And it needs to be reviewed. The agentic tools can be unpredictable and are not completely trustworthy. Any substantial project can be overwhelmed with pull requests. The software team struggles to maintain quality because of the amount of diligent work needed to validate updates.

One negative consequence of speedy software is the possibility of bad code. The agentic coding tools generally make a good product, so the process has a weakness: the team may mistakenly accept what they are given as correct. An additional gotcha is that they can get code that is correct but accompanied by comments that are wrong. This sets the stage for everyone to believe the non-code products, leaving both the developers and future coding agents unaware of the time bomb hiding there.

An alternate world could solve this problem and allow the bounty of code development tools to attain their full potential:

In our world, updates may hide subtle mistakes in the changed code. In this alternate reality, the coding tools would be deterministic. Once the programmer develops an update, the pull request would consist of a series of commands to software agents. The commands could be evaluated at face value because they would precisely describe what the update will do. An update request would be a concrete proposal. Those commands could also be evaluated with AI tools specialized in identifying their loopholes, which could keep the update’s bugs from going undetected.
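As a toy illustration of that idea (not how any existing agentic tool works), a change request could be a list of small deterministic commands that reviewers read instead of a diff. The class and function names below are hypothetical, and the regex rename is only a stand-in for a real, syntax-aware refactoring command.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameIdentifier:
    """Hypothetical command: rename one identifier everywhere in a source text."""
    old: str
    new: str

    def apply(self, source: str) -> str:
        # Whole-word match, so renaming "total" leaves "subtotal" alone.
        pattern = rf"\b{re.escape(self.old)}\b"
        return re.sub(pattern, lambda _match: self.new, source)


def apply_change_request(source: str, commands: list[RenameIdentifier]) -> str:
    """Apply the commands in order; same input and commands give the same output."""
    for command in commands:
        source = command.apply(source)
    return source


original = "def total(items):\n    subtotal = sum(items)\n    return subtotal\n"
change_request = [RenameIdentifier(old="total", new="order_total")]
updated = apply_change_request(original, change_request)
```

The reviewer's job in this sketch is to judge the one-line `change_request`, since the resulting text follows mechanically from it.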

Since AI development tools usually produce good results, their subtle errors might be accepted because, by reputation, the tools are usually correct. That is a critical flaw. If the request could be a description of the change rather than the change itself, the agentic tools could be even more powerful than they already are.

AI: Closer to Home

In early 2024, I purchased a new computer with advanced specs. My fairly limited goal was to take my journal’s text files and feed them into an LLM system.

I also had some dreams of building AI engines of my own based on that body of text. I would have no worries about my ownership rights to the text. It also would make any analysis and training more personalized. Doing it locally would also help with privacy concerns.

One resource that I found online is a set of videos by the researcher Andrej Karpathy. The YouTube playlist he has compiled is Neural Networks: Zero to Hero. (I haven’t watched them all.)

My new computer has some advanced specs, such as an upper-middle range Nvidia video card with 16GB of GPU memory. I built up the MSI motherboard to its maximum RAM of 192GB. The CPU is a 12-core/24-thread AMD processor. I’m glad I bought the system when I did because that much RAM is prohibitively expensive now.

I’ve done some experimentation using Ollama and found that I can run many of the open-source LLMs, including some with up to 80 billion parameters, at a reasonable performance level. Not in the same performance ballpark as a commercial service, but still workable. Many of the models I have were released by Alibaba in their Qwen series.
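For anyone wanting to script this kind of experiment, here is a minimal sketch of sending a prompt to a locally running Ollama server from Python. It assumes Ollama is listening on its default port (11434) and that the model you name has already been pulled; the helper function names are my own, not part of Ollama.

```python
import json
import urllib.request

# Default address of a local Ollama server (an assumption about your setup).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

A call would look something like `ask_local_model("qwen2.5:7b", "Summarize this journal entry: ...")`, where the model tag is only an example and should be replaced with whatever `ollama list` shows on your machine. The generous timeout is there because large local models can take a while to answer.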

By running the models locally, I don’t need to worry about the expense of using commercial servers to do my experiments. Instead, the biggest cost of my experiments is the amount of time it will take to iteratively apply my different configurations. That will require me to become more disciplined in how I proceed. My internet bandwidth becomes moot. In addition, I can combine different engines concurrently to play on each other’s strengths.

So far I’ve used the free version of Gemini to give me a lot of coaching and to write some simple processing scripts. I also use ChatGPT as a Python tutor. I need to experiment more with OpenAI’s Codex.