Merchant Updates (or, how I shot myself in the foot with AI)

So... mistakes were made.

You may recall from last week's post that I was working on, and planning to finish, merchants so that the player could buy and sell all the phat lootz. Well, things didn't go quite to plan. First of all, I was delayed by some things outside the purview of Delvers Inc, and it was unavoidable. I thought for sure I could catch up and get things finished up enough by EOD Monday to put out the blog post. This, it turns out, was the purest of fantasy, as I have managed to shoot myself in the foot while simultaneously thinking myself very clever. A horrible combination.

First off, know that merchants exist, and they have inventories of items you can buy from or sell to them. So yay (insert applause). However, the real meat of this blog post isn't going to be about that. This is ending up more akin to a recent blog post from Garry Newman, whom many of you may know as the gentleman who gave us both Garry's Mod and Rust. That post, quite accurately titled In The Shit, can be read at your leisure by following that link.

How We Got Into This Pile Of Shit

You see, I'm Gen-X and Gen-X isn't always up to date on the latest buzzwords going around even when those buzzwords are in their own particular industry.

A few weeks back I heard a "new-to-me" one... 'Vibe Coding'. If you're like me and were wondering 'what on God's green earth is Vibe Coding?', allow me to summarize. Vibe Coding is, at its most essential, using an LLM by describing in normal human language the functionality of a system or application, but letting the LLM do all the work of actually BUILDING it.

Up to that point my exposure to AI/LLMs had been what could best be described as an "interesting hobby". I wasn't using the AI to do anything worthwhile, and certainly not anything I would try to maintain myself, so I figured I would give it a whirl and see what happened.

This came about gradually, you see. It started with an article in Code magazine (which my former employer is still, for some reason, paying for me to receive) in which the author used ChatGPT to remake Asteroids. It was an interesting read, as was the follow-up article in the next issue, where the author created a TTRPG encounter builder that used ChatGPT to build the encounter.

That led me to trying a few 'vibe coding' experiments over a weekend. First I/It made an AAX->M4A converter (for ahem... purely educational purposes I assure you), as well as a Todo app, since the Microsoft one annoys me and I refuse to spend money on a todo application. Honestly, a text document with [ ] characters has worked just fine for decades. Why mess with perfection?

It all went swimmingly.

I of course, in true vibe coding fashion, learned nothing about the code it was writing because honestly I didn't care. The stuff it was making was working, I was learning how to talk to the LLM to make it do what I asked, it was a fun weekend activity. Life was fine.

The First Mistake

Then one day a few weeks back I was working on Delvers Inc and ran into a bit of a problem. You see, at the time I had a single class handling everything visual for the delvers in scene. Nav setup, movement, attacking, healing, equipment visuals, etc... if it had to do with how Delvers looked on screen then it was in this single class.

As you can imagine this class started off small but as the game got bigger, so did this controller. More and more was getting added for convenience so that I could move on to the next feature and the class had grown out of control. It was huge, it had multiple jobs, it was desperately in need of a refactor.

When it got to the point that I spent more time scrolling the class looking for the place I needed to make a change than it took to actually make the change, I knew I had a problem. It needed a refactor badly, but I was avoiding it because it would be an all-day undertaking and I wanted to do the next cool thing.

So, here I was, high off the recent 'success' of my vibe coding experiments like a junkie looking for his next easy score. I decided to let Claude AI handle the refactor. After all, it had worked so well up to this point, and I firmly believed that Claude could get it done in a fraction of the time.

So I spent a good half hour or so crafting a detailed prompt. I laid out everything it would need for context, what I wanted it to do, even why I wanted it to do it. Then I sent Claude off to do its thing while I went and did some household chores. I had to come back and type 'Continue' a few times because Anthropic has some insanely draconian usage limitations, but eventually it spit the whole refactor out. I brought it all into my codebase, had to fix some syntax problems, and noticed a few immediate errors where Claude decided that some navmesh code I had written before wasn't necessary for some reason, but within about 2 hours I had the whole thing working again under the newly refactored classes. Success! 8-10 hours of work in only slightly more than 2! Honestly it wasn't that bad, and if I had left it at that, then we wouldn't be having this conversation.

But I didn't. I had to press my luck, and that leads us to the second mistake.

The Second Mistake

At the time that all this went down, I had been working on the Inventory panel. I was mostly done with it, but this panel, much like the delver controller class, had grown too big. It had to handle too much and it was getting difficult to get around the code.

Having just gotten my recent dopamine hit from the delver control refactor, I stupidly decided to let Claude also refactor the inventory panel code. I crafted another detailed prompt, but context was a problem because the Inventory panel was a bit spider-like. It had to touch several systems INCLUDING all the new delver control systems.

I added as much context for Claude as I reasonably could and let it do its thing. This time there were more errors and it took more effort to massage the refactored code into the codebase. In terms of time saved, I figured the refactor probably would have taken half a day or so and it was done in about 2ish hours. So once again, I had "saved" another 2 hours of time.

Claude had decided to use a command pattern for a whopping total of 3 commands when it really didn't need to, but I left it because I wanted to be done and move on to the next cool thing. Which is why I also didn't inspect the refactor too closely. This was mistake number three.
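For anyone who hasn't run into this particular flavor of overkill, here is a minimal Python sketch of the idea. It is not the game's actual code, and every name in it (Delver, EquipCommand, CommandInvoker) is made up for illustration. It just shows how much scaffolding a command pattern adds when all you really need is a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class Delver:
    # Hypothetical stand-in for a delver; only tracks equipped item names.
    equipped: list[str] = field(default_factory=list)

    def equip(self, item: str) -> None:
        self.equipped.append(item)


# The heavyweight version: a command object plus an invoker,
# all to end up calling a single method.
class EquipCommand:
    def __init__(self, delver: Delver, item: str) -> None:
        self.delver = delver
        self.item = item

    def execute(self) -> None:
        self.delver.equip(self.item)


class CommandInvoker:
    def __init__(self) -> None:
        self.history: list[EquipCommand] = []

    def run(self, command: EquipCommand) -> None:
        command.execute()
        self.history.append(command)


via_command = Delver()
CommandInvoker().run(EquipCommand(via_command, "Leather Armor"))

# The direct version: same end state, one line.
direct = Delver()
direct.equip("Leather Armor")
```

A command pattern can earn its keep when you need undo, queuing, or replay. With only a handful of commands and none of those needs, the direct call is easier to read and easier to debug.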

The Third Mistake

EVERY SINGLE LLM will tell you, right at the bottom of the screen next to the input box, that the AI can make mistakes and to check its work carefully.

I did not do this. I had gotten used to it spitting out mostly decent responses, and honestly...when I see those warnings I think more about LLM hallucination problems. Like the AI just making up information that doesn't exist. However, the mistakes that Claude made this time around weren't hallucinations. They happened because Claude, despite being called an Artificial Intelligence, has nothing whatsoever intelligent about it. It didn't understand my codebase. It couldn't make decisions based upon that understanding because it doesn't have one. It isn't intelligent. "AI" is just a marketing term for the media. These LLMs are, when you really get down to it, highly sophisticated guessing machines.

You could make a really fancy and sophisticated machine that throws darts at a dart board with high accuracy...that doesn't make it intelligent. I knew this, but during my experiments with 'vibe coding' I had deceived myself into believing it was "smart". It's similar to the parasocial connection some people get with their favorite YouTuber, who not only doesn't think of the fan as a friend but honestly doesn't know they exist. I had assigned intelligence to the LLM where there was none.

This was the third mistake but it hadn't hit me quite yet because on the surface everything was fine. Everything appeared to be working, and I had moved on to the next "cool thing".

Oh...Now You've Stepped In It

Fast forward some time. I was working rapidly, new features were getting added, systems were getting improved. Every once in a while I would give a side-eye over to the refactored code, knowing that I hadn't really inspected it carefully, but everything was working so it couldn't be all that bad, right?

I got merchants mostly in place and was doing some testing when it finally hit and the other shoe dropped. "Uh...why is that piece of armor I just bought from the merchant not showing up on my delver when I equip it?"

As I stepped through the code it started hitting me in waves.

Inventory is calling Equip in the wrong place.
The fact that there are multiple places to call Equip is disheartening all on its own.
Visuals are not getting updated.
Effects are getting added to the delver not once but THREE times.
The refactor had replaced 2 lines of code with 30 and an entire command pattern structure.
Code was partially missing in places (seriously, Claude had kept the code to handle updating weapon visuals but tossed the code that handled updating armor).
Duplicated code was all over the place.
Code doing the same thing but slightly differently was likewise all over the place.

It was/is a hot mess.

I couldn't just roll back. At this point I had weeks' worth of changes on top of that refactor. Also, those spots in code had still needed A refactor at the time (that's how we got here), so rolling back wasn't going to work. I have to refactor the refactor. Clean everything up and put it back into a working order that makes sense.
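As a rough illustration of what "a working order that makes sense" could look like for the equip problems listed above, here is a small Python sketch. It is not the game's code, and Item, Delver, and their fields are hypothetical names. The idea is a single equip entry point that updates visuals the same way for every slot and never stacks the same item's effects more than once.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    # Hypothetical item model, for illustration only.
    name: str
    slot: str                      # e.g. "weapon" or "armor"
    effects: tuple[str, ...] = ()


@dataclass
class Delver:
    equipped: dict[str, Item] = field(default_factory=dict)
    visuals: dict[str, str] = field(default_factory=dict)
    active_effects: list[str] = field(default_factory=list)

    def equip(self, item: Item) -> None:
        """The one and only place that changes equipment."""
        previous = self.equipped.get(item.slot)
        if previous == item:
            return  # already equipped, so do not apply effects again
        if previous is not None:
            # Remove the old item's effects before applying the new ones.
            for effect in previous.effects:
                self.active_effects.remove(effect)
        self.equipped[item.slot] = item
        self.active_effects.extend(item.effects)
        # Same visual update path for every slot, weapon or armor.
        self.visuals[item.slot] = item.name


delver = Delver()
chainmail = Item("Chainmail", "armor", ("+2 defense",))
for _ in range(3):                 # simulate three duplicate equip calls
    delver.equip(chainmail)

plate = Item("Plate", "armor", ("+4 defense",))
delver.equip(plate)                # swapping replaces effects and visuals
```

Funneling everything through one method means the inventory, the merchant, and anything else that equips an item all get the same behavior, and a bug like the missing armor visuals only has one place to hide.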

So What Are The Lessons Learned

Honestly, this is not a fault with the LLM. This is a ME problem.

I gave the LLM too much credit. I got too relaxed using it and forgot that it isn't intelligent. It doesn't think, despite what the vendors want you to believe when they throw marketing terms like thinking mode or reasoning mode at you. That's just marketing mumbo jumbo for "spending more time iterating over its fancy guessing algorithm".

I knew these things but I let myself be persuaded otherwise. This isn't to say LLMs don't have their uses. They do. They can assist and speed things up just like Intellisense did so many years ago, but I balk at the 10x-ing claims some people will make. That is short-term thinking.

That 6-8 hours I had 'saved' myself by letting Claude handle the refactor? Well, that is probably what it is going to take me to clean up this mess. Ultimately I saved myself zero time, and likely cost myself additional time when you add in debugging the problems.

You cannot replace software engineering with a guessing machine. Some executives who only think in the short term might want to believe that, but right now they are like me a couple weeks ago...they are in the 'fuck around' stage. Also like me they will eventually hit the 'find out' stage.

Now, if you will excuse me, I have a giant mess to clean up, so I'll talk to you all next week and as always... in the meantime, Like us on X and Facebook or sign up for our newsletter so that you don't miss out on a post!

-Arakiel