Agentic workflows for non-coding tasks

Writing code with AI is definitely cool and fun, but it doesn’t really impress me. After all, the fact that code was not written automatically before was a tooling gap, not a core constraint; there is nothing sacred in coding.

What fascinates me are other “human processes”: ones that require a lot of effort and brain power, but are often less visible than coding or meetings.

I like to apply the power of AI where the process itself is non-deterministic (unlike coding), and to leverage the nature of LLMs to make the process better. Here I’m going to share how I do it.

Understanding the flow

To automate something, we need to first understand what it takes to accomplish it without automation. If you know how, step by step, you will execute the task until its completion, you have a solid base for the automation.

But if you don’t understand what to do, if you don’t understand how to do it, and you don’t see the goal of the process clearly — AI is no help to you; there is no miracle.

It will, of course, execute the task with full engagement, but you will soon find out that the result is below your expectations.

My recent case — as a consultant, I often help clients make data-driven decisions. In particular, selecting the best one of multiple options. It can be tool or service selection, build vs. buy, self-hosted vs. managed, or more often — a combination of all. Usually, such a multidimensional comparison requires a lot of very boring, yet energy-consuming work: researching all available sources, normalizing the data, formalizing requirements, doing multiple rounds of evaluation, etc. It, of course, involves follow-ups with the client to clarify details and refine requirements.

Sounds easy, but it’s not: the process is very open-ended. And you often feel unsure about the final result: “What if I missed important information? What if I forgot to validate another view? What if the breadth of research was not enough?” On top of that, it’s just very time-consuming.

So let’s automate!

Start from the process — not skills

There is a huge temptation to just ask AI to build everything, and to be honest, it is getting more and more capable of doing a decent job on the first shot. But for complex routines it is not there yet.

So we need to start from the process itself — and try to build a deterministic process map, simply depicting all the steps and logical branches. What you get as an outcome is basically a flowchart with a rich context.

Pen and paper work best here, but nothing stops you from doing the same exercise with your favorite coding agent, especially where the process is still blurry for you: the agent will help you navigate the uncertainties. Just don’t forget to validate everything it says; the machine knows how to sound convincing.

Design First

As with agentic engineering, we don’t jump directly into implementation — we start with architecture and specifications. Well, like the real engineers.

This case is no different — we first want to design the steps of the process, draw the boundaries, complete transitions, and only then proceed with implementation.

In this case, I design the process as consisting of 4 main phases:

  1. framing
  2. solutioning
  3. evaluation
  4. synthesis

Each phase consists of many smaller steps: some are sequential, some can run in parallel.
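
A process map like this can also be written down as plain data, which makes the sequential and parallel parts explicit. The sketch below is only an illustration: the four phase names come from the list above, while the step names, the `Step` class, and `runnable_steps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    phase: str
    depends_on: tuple = ()


# Hypothetical steps; only the phase names are taken from the article.
PROCESS_MAP = [
    Step("collect_requirements", "framing"),
    Step("list_options", "solutioning", ("collect_requirements",)),
    Step("research_pricing", "evaluation", ("list_options",)),
    Step("research_features", "evaluation", ("list_options",)),
    Step("score_options", "evaluation", ("research_pricing", "research_features")),
    Step("write_recommendation", "synthesis", ("score_options",)),
]


def runnable_steps(done: set) -> list:
    """Return the steps that have not run yet and whose dependencies are all done."""
    return [
        step.name
        for step in PROCESS_MAP
        if step.name not in done and all(dep in done for dep in step.depends_on)
    ]


# Both research steps depend only on list_options, so they can run in parallel.
print(runnable_steps({"collect_requirements", "list_options"}))
```

Steps returned together have no dependency on each other, so they are candidates for parallel execution; a single returned step marks a sequential part of the flow.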

Next, I want to ensure the flow stays predictable, not a hallucination.

Harness the Djinn

While building a multi-step agentic workflow, the biggest danger is compounding error. LLM output is naturally only partially correct, and we cannot expect it to be 100% true. But if every step carries some error rate, that error compounds with each new step taken.

For example, in a 4-phase process where each phase consists of 5 steps, a 5% error rate per step (assuming the errors are independent) gives

$$ \text{Correctness} = (1 - 0.05)^{4 \times 5} \approx 0.3585 $$

Thus, the chance that the whole chain is correct is only about 35.85%, far below expectations.
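
The arithmetic above is easy to replay for other numbers. This is a minimal sketch under the same simplifying assumptions: every step fails independently with the same rate, and the final result is correct only if every step is. The function name `chain_correctness` is made up for illustration.

```python
def chain_correctness(step_error_rate: float, steps: int) -> float:
    """Probability that every step in a chain is correct, assuming independent errors."""
    return (1.0 - step_error_rate) ** steps


# 4 phases x 5 steps at a 5% error rate per step
print(round(chain_correctness(0.05, 4 * 5), 4))  # 0.3585

# The same chain at a 1% error rate per step
print(round(chain_correctness(0.01, 4 * 5), 4))  # 0.8179
```

Lowering the per-step error rate pays off disproportionately, which is the motivation for making each step as deterministic and verifiable as possible.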

To counter it, we want our process to be as deterministic as possible, and therefore verifiable once the step is done.

To achieve this, I try to write every step as a step-by-step description with very predictable options. Imagine a call center: the support staff usually works from scripts, straightforward predefined lists of questions and reactions, so even a person with no experience whatsoever can follow them without mistakes.

This is exactly what we are trying to achieve — so even without a comprehensive context it is clear for the agent what to do and when.

To do that, I first check whether existing frameworks or methods already cover my task. If so, there is a high chance the model already knows them and will follow their general guidance. If no such method is available, I develop and document one myself.

And here we are already starting the implementation part.

Wrap it into skills

The skills are the building blocks of a modern agentic pipeline. Each skill is loaded into the agent context, while context is the most scarce resource of the AI age — it must stay clean and focused on the task. This is why we can’t simply put everything in one huge skill — it will lead to context rot and much lower result quality.

Instead, I usually create a bunch of skills: one per activity or step, plus one orchestrator skill on top. The orchestrator is not required; I just use it as a single entry point to simplify my work.

The orchestrator skill owns the whole pipeline logic — what comes first, what follows. It evaluates the current state of the process and selects the step to trigger next.
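
The selection logic itself can be very small. The following sketch assumes that each step leaves one Markdown artifact in a state directory and that a missing artifact means the step has not run yet. The file names and the `next_step` helper are hypothetical and not part of any particular agent framework.

```python
from pathlib import Path
from typing import Optional

# Hypothetical pipeline order and artifact names, one artifact per phase.
PIPELINE = [
    ("framing", "01_framing.md"),
    ("solutioning", "02_solutioning.md"),
    ("evaluation", "03_evaluation.md"),
    ("synthesis", "04_synthesis.md"),
]


def next_step(state_dir: Path) -> Optional[str]:
    """Return the first step whose artifact is missing, or None when all are done."""
    for step, artifact in PIPELINE:
        if not (state_dir / artifact).exists():
            return step
    return None
```

Because the decision depends only on what is on disk, the orchestrator can be restarted in a fresh context at any point and still pick up where the previous run stopped.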

State management

As you may guess, such a workflow normally takes some time and resources to execute, and you cannot physically squeeze it into a single context. That’s why you need to manage the state externally — it is usually done by having everything important saved as files to the filesystem at the end of each step.

The data model is a part of the overall workflow architecture. For me, the combination of Markdown for step results and JSONL for raw data works best.

One or more Markdown documents generated during a step feed the context of the next step. The raw data is used both for executing the step and for the validation that follows (if required).

To work with the data, I prefer to write scripts, executed by the agent, so that all data-related work is 100% deterministic. And it also saves tokens, because the scripts simply run on your machine (or cloud). Each script is owned by a certain skill and lives within this skill directory.
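
As an illustration of such a script, the sketch below reads raw records from a JSONL file and renders them as a Markdown table that could feed the next step. The record fields and both function names are assumptions made for the example, not the actual scripts behind this workflow.

```python
import json
from pathlib import Path


def load_jsonl(path: Path) -> list:
    """Read one JSON object per line, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def to_markdown_table(rows: list, columns: list) -> str:
    """Render the selected columns as a Markdown table; missing values stay empty."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(str(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])
```

The agent only decides when to run the script and which columns to request; the parsing and formatting produce the same output for the same input every time.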

Give it a personality

Once we are done with skills and scripts, the next step is the workflow orchestration. We can do it manually, of course, but if we have an orchestrator skill, it can check the stored process state and trigger the next step, and so on until we are done.

But we can go further. We don’t need to use the same model for all of the steps. Some steps are mechanical, and a cheap model is enough, while others are analytical and require thinking models. Synthesis is the most demanding: to interpret all the data correctly and make sane assumptions, you need the best model available.

To make it easy, I wrap similar activities into agents: one for research, one for option comparison, one for scoring, one for synthesis, and so on. Each agent has a built-in personality (what it does and how, which skills and tools it can use) and a model to power its work. If you are not sure which model suits which job, ask your coding agent to help.
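
One way to keep these agent definitions explicit is a small registry. Everything in the sketch below is a placeholder: the tier labels, the skill names, and the assignment of tiers to agents are assumptions for illustration, and real agent definitions depend on the tooling you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    role: str        # what the agent does and how
    model_tier: str  # placeholder label, not a real model name
    skills: tuple    # hypothetical skill names the agent may use


AGENTS = {
    "research": AgentSpec("Collect and normalize source data", "cheap",
                          ("web-research", "normalize-data")),
    "comparison": AgentSpec("Compare options against requirements", "reasoning",
                            ("compare-options",)),
    "scoring": AgentSpec("Score each option per criterion", "reasoning",
                         ("score-options",)),
    "synthesis": AgentSpec("Interpret results and write the recommendation", "best",
                           ("write-recommendation",)),
}


def describe(name: str) -> str:
    """One-line summary of an agent, handy for reviewing the setup."""
    spec = AGENTS[name]
    skills = ", ".join(spec.skills)
    return f"{name}: {spec.role} [model tier: {spec.model_tier}; skills: {skills}]"


print(describe("synthesis"))
```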

Verification and troubleshooting

It’s not enough just to run all the steps and hope the result will be top-notch. This is not how LLMs work. Instead, we need to build quality gates.

The simplest way is to have a human in the loop, i.e. to validate the results by yourself once the step is done. However, it is time-consuming and not always needed. So you can try to delegate this to the machine as well — partially, at least.

Develop another step (a skill!) for verification, and make it as deterministic as possible: you already have the raw data stored, so it can double-check the results against it. It’s better to use a separate context for verification, so that no context poisoning happens. Wrap it in another agent: subagents always run in an isolated session with a clean context.

Add a confidence score to the verifier: if it is confident enough that the result is correct, the workflow may proceed to the next step in auto mode. Otherwise, human review is required.
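
The gate on top of that score can be a plain deterministic function. In this sketch the `Verdict` structure, the `gate` function, and the 0.8 threshold are all assumptions; the right threshold depends on how costly a wrong auto-approval is for your workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    step: str
    confidence: float   # 0.0 to 1.0, as reported by the verification agent
    issues: tuple = ()  # concrete problems found against the raw data


def gate(verdict: Verdict, threshold: float = 0.8) -> str:
    """Decide whether the workflow may continue without a human."""
    # Any concrete issue forces a review, no matter how confident the verifier is.
    if verdict.issues or verdict.confidence < threshold:
        return "human_review"
    return "auto_proceed"


print(gate(Verdict("evaluation", 0.93)))  # auto_proceed
print(gate(Verdict("evaluation", 0.55)))  # human_review
```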

The same works for troubleshooting — if you observe your results to be incorrect, you have a time machine — a trace of data from all the steps, so it wouldn’t be difficult to locate and fix the error, rerunning the flow from there.
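
Rerunning from a given step can be as simple as setting aside the artifacts of that step and everything after it. The sketch below assumes one Markdown artifact per step in a state directory; the file names, the `stale` folder, and `rewind_to` are hypothetical. Moving files instead of deleting them keeps the old trace available for comparison.

```python
from pathlib import Path

# Hypothetical artifact names, in pipeline order.
STEP_ARTIFACTS = [
    ("framing", "01_framing.md"),
    ("solutioning", "02_solutioning.md"),
    ("evaluation", "03_evaluation.md"),
    ("synthesis", "04_synthesis.md"),
]


def rewind_to(state_dir: Path, faulty_step: str) -> list:
    """Move the faulty step's artifact and all later ones aside; return steps to rerun."""
    names = [name for name, _ in STEP_ARTIFACTS]
    start = names.index(faulty_step)  # raises ValueError for an unknown step
    stale_dir = state_dir / "stale"
    stale_dir.mkdir(exist_ok=True)
    rerun = []
    for name, artifact in STEP_ARTIFACTS[start:]:
        source = state_dir / artifact
        if source.exists():
            source.replace(stale_dir / artifact)
        rerun.append(name)
    return rerun
```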

Making the most of it

This was the first step. As with software, skill development happens iteratively.

Test your workflow, find weak points, improve, iterate.

Refine the skills and keep them lean. Use skill-creator to follow best practices, and read the guide to improve the skill design.

Workflows are the backbone of company process automation. They are not a replacement for a human workforce, but a powerful enabler and multiplier. Use them as an addition to your own expertise, to free your capacity from routine work and focus your attention on what matters most for your goals.
