My exploratory AI workflow

I’ve been spending time doing some work with AI – and this blog is just a “check-in” journal on what has been working for me.

This blog will begin with “Idea Honing” (mini PRD) and then move to an iterative development workflow, followed by an example project.

Idea Honing

You need to spend some time honing your idea. There is a great blog on this subject which contains a great prompt to help step through your idea.

Ask me one question at a time so we can develop a thorough, step-by-step spec for this idea. Each question should build on my previous answers, and our end goal is to have a detailed specification I can hand off to a developer. Let’s do this iteratively and dig into every relevant detail. Remember, only one question at a time.

Here’s the idea:

<IDEA>

You should use the best LLM available to you for this, as you want clarity of words & thought. As of this writing (Sept 1st 2025), Gpt5 & Gemini 2.5 Pro work great for this. I also maintain a leaderboard for wordcel evals – using any of the top models should work great.

One tweak I have added to Harper’s blog is the last step, where he suggests to wrap up – make sure to add “Give me the final output as XML” – for whatever reason this works great!

Development Environment

Lately I have been using vanilla VS Code + Amp. It’s pretty simple to install the plugin and get started – but this is a paid tool! I find I spend somewhere around $5/hr when I use it, which seems fine in the context of my other hobbies.

What’s great about having an environment this simple is that it works great on Windows, Mac, & Linux, so I can seamlessly switch between them based on how I work.

Once you have the environment set up (should take you like 5 minutes or less), you can get started with prompt. Create a new folder for your project, open VS Code, and pass the prompt from the previous step into Amp.

Other Helpful Environment Add-ons

There are ton of niche tools you can add to your environment, so I am simply going to call out the ones I use regularly.

D uckDB for data manipulation.
uv for python environment management.
node+npm for javascript environment management.

These three tools (frameworks? libraries?) will get you very far.

Interacting with your LLM while coding

If you have a simple idea and a good prompt, most of the time your LLM can “one-shot” it by getting you a working prototype with a single prompt. Your goal when building this stuff should always be to decompose your problem into individual, runnable steps, like the Agile car analogy of yore. This is particularly important with LLMs as every time you can run the program, you can also test it. This is critical because you want to keep these steps small so you can commit your changes into source control once whatever your current set of tests is, passes. Then you can give the LLM a new prompt to continue forward from a stable base.

Once I get the first runnable prototype, I do my first commit and spin up the repo in Github. Make sure to do this! LLMs are not deterministic and can destroy your project at any time, as has been memed numerous times on twitter (and a few weeks later, LinkedIn).

Testing

There are a couple of ways that seem to work well for LLMs with testing. I will outline two approaches.

Option 1: Just use the software

This is my typical approach. I just use the thing and modify it as I see fit, and fix bugs as I go. Once it works like I want it to, only then do I add tests – basically to make sure that future changes do not break existing functionality. Of course, LLMs are great at adding tests!

Option 2: Test Driven Development (TDD)

For more complex codebases, it seems like TDD is the way to go. I find it to be overkill for the hobby project but you can indeed write all your tests first and then let the LLM “solve” the tests. For complex logic puzzles (i.e. NFL tiebreaker logic), TDD is a much better approach than brute force by using the software and fixing bugs as they come up.

A practical example

I did a small project over labor day weekend to create a chrome extension to copy a screenshot and turn it into a csv. The below part will be very long because it is going contain the full idea honing transcript and prompt.

Initial Idea Honing Transcript

User: Ask me one question at a time so we can develop a thorough, step-by-step spec for this idea. Each question should build on my previous answers, and our end goal is to have a detailed specification I can hand off to a developer. Let’s do this iteratively and dig into every relevant detail. Remember, only one question at a time.

Here’s the idea: a chrome extension called “csv everything” that lets you take a screenshot and turn it into a csv, which is downloaded by your browser. behind the scenes it makes an API call (using your own API key) to open router with the image, then returns the result as csv. If it cannot identify if its a table, it should tell the user “couldn’t translate to csv”. It should also work with charts – basically it will give you the data points that allow you to build a line / bar / scatter chart.

Gemini: Of course. This is a great way to build a detailed spec. Let’s start with the user’s first action.

After the user installs the extension, how do they initiate the screen capture? For example, do they click the extension icon in the toolbar, which then allows them to drag and select an area on the page?

User: initially it takes whatever is in clipboard, verifies if it is an image, and if so, uses it. If its not an image, it tells the user “copy an an image of the data first”

Gemini: Got it. The trigger is clicking the extension icon, which then checks the clipboard.

Now, let’s think about the necessary credentials. The extension needs the user’s OpenRouter API key to make the API call.

Where and when does the user input and save their API key? For instance, is there a settings page, or does a field pop up the very first time they use the extension?

User: in the extension settings

Gemini: Excellent, a standard settings page is a clear approach.

Let’s refine the user flow for someone who is just getting started. A new user installs the extension, copies an image of a table, and clicks the extension icon, but they haven’t entered their API key in the settings yet

What should the extension do at this point? For example, should it show a message like “API key not found, please add it in settings” with a button that takes them there?

User: it show a message like “API key not found, please add it in settings” with a button that takes them there

Gemini: Perfect, that’s a very user-friendly flow.

Now, let’s move to the core action. The extension has the image from the clipboard and the user’s API key. It’s ready to make the call to the OpenRouter API.

What specific instructions (i.e., the prompt) should the extension send along with the image? For example, should it be a simple instruction like, “Extract the data from this image into CSV format, including headers if present”?

User: we can hone the prompt later as part of the testing. but yes, for now lets use a place holder “turn the image into a csv”. maybe the user can tune the prompt in their settings?

Gemini: Excellent idea. Making the prompt customizable in the settings is a powerful feature for advanced users. We’ll add that to the spec for the settings page.

So, the API call has been sent with the image and the user-defined prompt. This process might take a few seconds.

What should the user see in the extension’s interface while it is actively processing the image and waiting for a response from the API? For example, should it display a loading animation or a message like “Generating your CSV…”?

User: it should have a little spinner like the download icon in chrome – a dotted line spins around the outside of the icon.

Gemini: Got it, a spinner animation that mimics the native Chrome download icon will provide clear visual feedback.

Now, let’s handle the two possible outcomes of the API call. We’ll start with the successful one

The spinner stops, and the extension has received valid CSV data from the API. What should happen next? Should it automatically trigger the browser’s download functionality for the `.csv` file? And what should the default filename be?

User: “data.csv” is the default name, it should trigger the “save file” pop over

Gemini: Understood. The extension will trigger the standard “Save As” browser dialog, pre-filling the filename as `data.csv`.

Now, let’s cover the other scenario. The API call finishes, but the model could not interpret the image as a table or chart.

What exactly should the extension show the user in this case? For example, does the popup window display the text “couldn’t translate to csv” with a “Try Again” button?

User: it needs to identify error state – if there is an issue with the API call (anything other than 200) it should tell the user the specific error. if the image failed to turn into the chart but the API call succeeded, it should say “could not convert this image to a csv” – I think the challenge here is that “try again” implies that the image could be converted to a csv but in many cases it probably cannot be, because users are pasting in bad images.

Gemini: That’s an excellent distinction between an API error and a conversion failure. We will add that specific logic to the spec.

So, when the popup displays either an API error (like a 401 or 500) or the “Could not convert this image to a csv” message, how does the user dismiss it to end the current attempt? Should there be a simple “OK” button, or can they just click away from the popup to close it?

User: Yeah just hit an “X” in the corner. “OK” doesn’t do anything for the user in this case, so its more an FYI

Gemini: Perfect, an “X” to close notifications is clean and simple.

Now, let’s circle back to a key feature you mentioned in the original idea: processing charts. So far, the logic we’ve built is perfect for images of tables.

Does this same process and prompt—”turn the image into a csv”—also apply to images of line charts, bar charts, or scatter plots? Or does the extension need to handle charts differently in some way?

User: yes, the prompt should be the same. if the user wants a more sepcific prompt, they can tune in the settings

Gemini: Excellent, that simplifies the core logic. The extension will use the same process for all images and rely on the model’s intelligence and the user’s custom prompt for specific needs.

We have now defined the entire end-to-end user flow. Let’s refine one last detail on the settings.

OpenRouter provides access to many different AI models (like GPT-4o, Claude 3, etc.), which have varying costs and performance. Should the extension use a single, hard-coded model for all users, or should there be a dropdown menu in the settings page allowing the user to choose which model they want to use?

User: it should be set by default to gemini 2.5 pro, but allow the user to pass a model string from open router if they want.

Gemini: Got it. We’ll set a sensible default model but give the user full control to override it. That’s a perfect balance.

I believe we have now covered the entire specification from the first click to the final output, including settings and error handling. I will now compile everything into a final summary that you can hand off

Editor’s Note: Removed the plain text spec for brevity, as its included in XML below

This completes the specification. Does this detailed summary accurately reflect your vision for the extension?

User: give me this spec a xml please

Gemini: Of course. Here is the complete specification formatted as XML.

<?xml version="1.0" encoding="UTF-8"?>
<specification project="csv everything">
    <description>A Chrome Extension to convert images of tables or charts into a downloadable CSV file using the OpenRouter API.</description>

    <coreFunctionality>
        <trigger>User clicks the extension icon in the Chrome toolbar.</trigger>
        <inputSource>
            <source>Image from the user's clipboard.</source>
            <logic>
                <case condition="Image found">The process begins.</case>
                <case condition="Image not found">An error is shown (see Error Handling).</case>
            </logic>
        </inputSource>
        <processing>
            <api>OpenRouter</api>
            <details>The extension sends the clipboard image and a prompt to the API using the user's configured credentials.</details>
        </processing>
        <successfulOutput>
            <action>Triggers the browser's native "Save As" dialog.</action>
            <defaultFilename>data.csv</defaultFilename>
        </successfulOutput>
    </coreFunctionality>

    <userInterface>
        <state type="in-progress">
            <indicator>A loading spinner that visually mimics Chrome's native download icon.</indicator>
            <description>Displayed in the extension popup while waiting for the API response.</description>
        </state>
        <state type="notification">
            <dismissal>User clicks an "X" icon in the corner of the message box.</dismissal>
            <description>Applies to all error and informational messages.</description>
        </state>
    </userInterface>

    <settingsPage>
        <description>A standard options page for user configuration.</description>
        <fields>
            <field name="OpenRouter API Key">
                <type>Text Input</type>
                <required>true</required>
                <purpose>To securely store the user's API key.</purpose>
            </field>
            <field name="Model String">
                <type>Text Input</type>
                <required>false</required>
                <purpose>Allows the user to specify which OpenRouter model to use.</purpose>
                <defaultValue>google/gemini-2.5-pro</defaultValue>
            </field>
            <field name="Custom Prompt">
                <type>Text Area</type>
                <required>false</required>
                <purpose>Allows the user to define the prompt sent to the model. Used for all image types (tables, charts, etc.).</purpose>
                <defaultValue>turn the image into a csv</defaultValue>
            </field>
        </fields>
    </settingsPage>

    <errorHandling>
        <case condition="No Image in Clipboard">
            <message>Copy an image of the data first</message>
        </case>
        <case condition="Missing API Key">
            <message>API key not found, please add it in settings</message>
            <action>Provide a button that directs the user to the settings page.</action>
        </case>
        <case condition="API Call Failure">
            <trigger>API responds with a non-200 status code (e.g., 401, 500).</trigger>
            <message>Display the specific error returned by the API.</message>
        </case>
        <case condition="Conversion Failure">
            <trigger>API responds with a 200 status code, but the model could not extract data.</trigger>
            <message>Could not convert this image to a csv</message>
        </case>
    </errorHandling>

</specification>

From here I took that spec into an empty directory with VS Code and Amp, and they took it from there. I did end up modifying it so you could run it in the background, but otherwise the initial spec was bang on. You can take a look at the repo here: CSV Everything.

Interactions with AmpCode

The interactions with Amp once I had the spec were fairly trivial. Here are my iterative prompts with Amp, after the initial spec. My specific prompts are always in quotes and my commentary is unquoted, as such you will see some typos in my prompt.

“Next we need to build and test. how do i package it so i can load it in my chrome for testing?” (Once I had this answer, I immediately began testing this locally, and all the questions below all follow that line of thought)
I noticed it was building an icon in png, so I interrupted and said “Lets use SVG just so we can test”
“change the icon to be the text “CSV””
“in my testing, the response from gemini comes in markdown ““`csv <text> “` the markdown formatting shouldn’t be passed to the csv file that is created, so please strip that away. Also, the icon isn’t loading. I think we do need to render the png.”
I noticed it was using the system python to generate an icon so I stopped it and said “use uv instead”
“lets change the Icon to be bold, black text on a transparent background.”
“Ok, so when I click off the extension or change tabs, the api call is interuptted. Is there a way to make it run in the background once the conversion is started?”
“when its running, is there any way in indicate the extension icon is doing something to the user? like a little blue dot or something”
“I tried to use Gemini Flash, and it failed. Is that because 2.5 pro is a reasoning model and Flash is not?”
“Hmm, I’m not getting good enough error messages. Pro works, but flash doesn’t. I can see the API calls making their way to open router, but the response isn’t coming showing up. We should log the entire response from openrouter when we are in “debug mode”, which is a flag in the settings (enable debug mode : true/false), even if it is invalid.”
“seeing this error: Background conversion error: TypeError: URL.createObjectURL is not a function” (note: I also included a screenshot of the error)
“ok the indicator for when its running is way too big. It should a small blue dot in the top right of the icon area. 1/4 the size of current indicator.”
“change the text to a “down arrow””
“Ok so i am working on publishing to Chrome, and it is asking me why i need the “Host Permission” – can we rework this to work without that permission? If not, why?”

And here is a couple of demos!

Conclusion

As you can see you, it is fairly simple to get started with these tools but there is a lot of depth in How you use them. I hope this is helpful glance into the current state of how I am using them and that you find this type of journaling useful.