Code review in an ideal world

With the remarkable benefits of agentic tools for software development comes a frustrating paradox: they also make good software development more difficult.

There is a bottleneck between producing code and integrating it safely into a code base. OpenAI Codex, Claude Code, and other tools can generate code much faster than it can be reviewed.

[Image: a pot of soup cooking on a very hot stove, glowing red with danger]

And it needs to be reviewed. The agentic tools can be unpredictable and are not completely trustworthy. Any substantial project can be overwhelmed with pull requests. The software team struggles to maintain quality because of the amount of diligent work needed to validate updates.

One negative consequence of speedy software is the possibility of bad code. The agentic coding tools generally make a good product, and that creates a weakness in the process: the team may mistakenly accept what they are given as correct. An additional gotcha is that the tools can produce code that is correct but accompanied by comments that are wrong. This sets the stage for everyone to believe the non-code products, leaving both the developers and future coding agents unaware of the time bomb hiding there.

An alternate world could solve this problem and allow the bounty of code development tools to reach its full potential.

In our world, updates may hide subtle mistakes in the code changes. In this alternate reality, the coding tools would be deterministic. Once the programmer develops an update, the pull request would consist of a series of commands to software agents. The commands could be evaluated at face value because they would precisely describe what the update will do. An update request would be a concrete proposal. Those commands could also be evaluated with AI tools specialized to identify their loopholes. That could protect against the update’s bugs going undetected.
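To make the idea a little more concrete, here is a minimal Python sketch of what a command-based update request might look like. Everything in it is hypothetical: the `ReplaceText` command, the `apply_commands` function, and the dictionary standing in for a code base are illustrations I made up, not an existing tool. The only point is that the same commands applied to the same files always give the same result, so a reviewer can judge the commands themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplaceText:
    """Hypothetical command: replace one exact snippet in one file."""
    path: str
    old: str
    new: str


def apply_commands(files, commands):
    """Apply each command in order and return a new mapping of path -> text.

    The input mapping is not modified. A command fails loudly unless its
    target snippet appears exactly once, so nothing is changed by guesswork.
    """
    result = dict(files)
    for cmd in commands:
        text = result[cmd.path]
        if text.count(cmd.old) != 1:
            raise ValueError(
                f"{cmd.path}: expected exactly one match for {cmd.old!r}"
            )
        result[cmd.path] = text.replace(cmd.old, cmd.new)
    return result


# A toy "code base" and an update request expressed as commands.
code_base = {"tax.py": "RATE = 0.05\n"}
request = [ReplaceText("tax.py", "RATE = 0.05", "RATE = 0.07")]

updated = apply_commands(code_base, request)
print(updated["tax.py"])
```

Because a command is rejected when its target text is missing or ambiguous, a request that no longer matches the code base fails instead of silently doing something different from what was reviewed.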

Since AI development tools usually produce good results, their subtle errors might be accepted because, by reputation, they will usually be correct. That is a critical flaw. If the request could be a description of the change rather than the change itself, the agentic tools could be even more powerful than they already are.

AI: Closer to Home

In early 2024, I purchased a new computer with advanced specs. My fairly limited goal was to take my journal’s text files and feed them into an LLM system.

I also had some dreams of building AI engines of my own based on that body of text. I would have no worries about my ownership rights to the text. It also would make any analysis and training more personalized. Doing it locally would also help with privacy concerns.

One resource that I found online is a set of videos by the researcher Andrej Karpathy. The YouTube playlist he has compiled is Neural Networks: Zero to Hero. (I haven’t watched them all.)

My new computer has some advanced specs, such as an upper-middle-range Nvidia video card with 16GB of GPU memory. I built up the MSI motherboard to its maximum RAM of 192GB. The CPU is a 12-core/24-thread AMD processor. I’m glad I bought the system when I did because that much RAM is prohibitively expensive now.

I’ve done some experimentation using Ollama and found that I can run many of the open-source LLMs, including some with up to 80 billion parameters, at a reasonable performance level. They are not in the same performance ballpark as a commercial service, but they are still workable. Many of the models I have were released by Alibaba in its Qwen series.
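For readers who want to try something similar, here is a minimal Python sketch of sending one prompt to a locally running Ollama server through its REST API. It assumes Ollama is listening at its default address (port 11434) and that a model has already been pulled. The helper names `build_request` and `ask` are my own, and the model tag in the commented example is only a placeholder, not necessarily one of the models I run.

```python
import json
import urllib.request

# Default address of a local Ollama server (an assumption; adjust if yours differs).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build the HTTP request for a single, non-streaming generation."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt, timeout=600):
    """Send the prompt to the local server and return the generated text."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]


# Example call (needs a running Ollama server and a pulled model;
# the model tag below is a placeholder):
# print(ask("qwen2.5:7b", "Summarize this journal entry in two sentences: ..."))
```

The generous timeout is deliberate, since larger models on home hardware can take a while to answer.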

By running the models locally, I don’t need to worry about the expense of using commercial servers to do my experiments. Instead, the biggest cost of my experiments is the amount of time it will take to iteratively apply my different configurations. That will require me to become more disciplined in how I proceed. My internet bandwidth becomes moot. In addition, I can combine different engines concurrently to play on each other’s strengths.

So far I’ve used the free version of Gemini to give me a lot of coaching and to write some simple processing scripts. I also use ChatGPT as a Python tutor. I need to experiment more with OpenAI’s Codex.

GPT + Microsoft, Bard/Google and Beyond

The public conversation about AI tools is stuck focusing on ChatGPT, ignoring older uses of AI and other tools. ChatGPT is easy to use and flashy, but there is more to AI than a chat engine.

Microsoft has a couple of ways of interacting with a GPT-derived chat tool. In the title bar of their Edge browser, there’s a prominent ‘b’ logo that opens a side panel to start a conversation. One enhancement over ChatGPT is that the Edge browser will give links to its information sources. A conversation mode is also directly integrated with bing.com search results.

Google has an experimental chat service, Bard, at https://bard.google.com. My limited experience with Bard has been unsatisfactory so I have only used it a few times.

I have a specific use case for the interactive engines: helping me with programming language syntax and techniques. I’m not using them for the high-powered, app-generating miracles that I’ve seen described in the media. Instead, I’m using them to supplement conventional help resources. As I learn more, I can ask more complex questions and develop increasingly useful skills.

Bard answers my questions in a very stilted manner. When I ask a programming question, the code it generates can be stand-alone. The code includes fluff such as verbiage allowing me to copy and paste the suggested code and directly execute it. That is frustrating because my goal is to learn how a feature works, not generate sample code. Bard is evolving, but what I’ve seen so far isn’t compelling me to use it. It prefers to give a specific, narrowly focused answer rather than explain a concept. It doesn’t know the context of my questions and gives an example with minimal insight.

Bing’s search tool is much more useful to me. It remembers the context of my current questions. I don’t have to tell it “Python” or “JavaScript” every time I’m asking a new question. It presents example code that is relatively terse and succinct, helping me not get bogged down by unrelated details. I don’t expect the code to be stand-alone because I’m not looking for it to write code for me.

However, sometimes, a web search is more effective than using chat features. There are a few specialized sites such as https://stackoverflow.com that can answer questions. A search on Google, DuckDuckGo or Bing can go off base and include unneeded results, especially when the correct technical term has other generic uses. In one pleasing interaction, Bing’s top-line short answer was unrelated, but when I opened the chat, the chat answered the real software issue I was trying to understand.

It seems that ChatGPT is “sucking all of the oxygen out of the room.” All of the news and blog commentaries talk about its threats and promise. They forget that AI has more uses beyond general-purpose conversational tools.

I have been using less prominent and limited AI for quite a while. Edge and Microsoft 365 (Office) both have been giving me suggestions as auto-complete so that I can accept with a tab press. It is not flashy, but it can save time. The keyboard interface to my iPhone’s messaging app also tries to predict what I intended. They are using Artificial Intelligence algorithms for that service. It is helpful.

My realization is that AI is not a new tool. Amazon and others use it to identify potential sales based on its analysis of customer search and sales history. This Big Data application of artificial intelligence is old enough that it’s invisible now. It’s just called “the algorithm” and it is so ubiquitous that it’s often mundane.

If you don’t use the Edge browser and Bing search engine, you’re not going to see these additional ways of using the GPT engine. I find them very productive. AI has a public face in ChatGPT, but there are other ways AI technology is common.