You can and should use the capabilities of the thunk application model to craft a high-reliability AI automation. This does not have to happen all at once: typically, you start with a simple thunk definition and gradually refine it in a test-and-iterate cycle.
The guidelines described below are also used by the intelligent capabilities of the Thunk.AI design environment to make suggestions for improvements that help reliability.
What's in your control as a thunk designer
You can achieve higher reliability by using the application model to follow three key principles: minimize granularity (break logic and state into the smallest meaningful units), maximize constraints (on state and schema), and minimize capability (give each step only the tools and output state properties it needs).
The primary control is the AI Instructions for each step of the workflow. These include the following (a configuration sketch follows this list):
Directions / prompts:
Less ambiguity and more detail help
Bulleted lists of instructions encourage sequential instruction following
Input properties:
Focuses the AI on specific input data. These get prominence in the AI agent’s “attention”.
The names and descriptions of these fields (in the state schema) play an important role in describing the meaning of the data to the AI
Output properties:
Focuses the AI on writing out specific results. Recorded properties can become input properties for subsequent steps
Names, descriptions, types, and type constraints are very important in steering the AI
The “Extra Properties” escape hatch can be used for ad-hoc outputs, but should not replace expected properties.
Tools:
Focuses the AI on the set of actions available to it. The AI agent can ONLY make tool calls. Nothing else.
Disabling and constraining tools increases AI focus and improves reliability
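For example, a single step's AI Instructions can be pictured roughly as the sketch below. The field names and shape are illustrative assumptions, not the actual Thunk.AI configuration format; the point is that directions, typed input/output properties, and a minimal tool list all live together on the step.

```python
# Illustrative sketch only: field names are hypothetical, not the actual
# Thunk.AI configuration format.
step_instructions = {
    "directions": [
        "Read the customer email in 'email_body'.",
        "Classify the request into one of the allowed categories.",
        "Summarize the request in one sentence.",
    ],
    # Input properties: focus the AI's attention on specific state fields.
    "input_properties": {
        "email_body": {"type": "string", "description": "Raw text of the customer email"},
    },
    # Output properties: strongly typed results the AI must record.
    "output_properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "shipping", "other"],
            "description": "Request category chosen from the allowed values",
        },
        "summary": {"type": "string", "description": "One-sentence summary of the request"},
    },
    # Tools: keep only what the step actually needs.
    "tools": ["search_documents"],
}
```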
Behavior is also influenced via thunk- and step-level settings (a settings sketch follows this list):
Content folders (which generate a search_documents tool) and Tool Libraries
AI model to choose – e.g., gpt-4.1-mini (faster, cheaper) vs. gpt-4.1 (higher quality, but slower and more expensive)
Human-in-the-loop (HITL) requirements
Various levels of checking (security, compliance, etc.) – use these only for refinement after initial development and testing
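As a rough illustration, these settings can be thought of as a small per-step configuration. The keys below are hypothetical, not actual Thunk.AI setting names.

```python
# Hypothetical settings sketch; option names are illustrative only.
step_settings = {
    "content_folders": ["support-policies"],   # generates a search_documents tool
    "tool_libraries": ["crm-tools"],
    "model": "gpt-4.1-mini",                   # faster/cheaper; switch to "gpt-4.1" for harder steps
    "human_in_the_loop": "approve_before_send",
    "checks": [],                              # add security/compliance checks after initial testing
}
```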
Finally, iterative testing in the development environment helps guide and improve the use of these options (a test-harness sketch follows the list below).
Define a test set with expected results. Use it to repeatedly test the reliability of your thunk
Provide good/bad feedback on individual runs. This information guides the suggested improvements that the design environment provides.
Once the reliability is adequate on the test set, lock down the tools to remove unnecessary ones that can randomize the AI agent's decisions.
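A minimal sketch of such a test loop, assuming a hypothetical run_thunk() helper that stands in for however you execute the thunk on one input:

```python
# Illustrative test-harness sketch; run_thunk() is a hypothetical helper.
test_cases = [
    {"input": {"email_body": "My invoice is wrong"},  "expected": {"category": "billing"}},
    {"input": {"email_body": "Where is my package?"}, "expected": {"category": "shipping"}},
]

def evaluate(run_thunk):
    passed = 0
    for case in test_cases:
        result = run_thunk(case["input"])
        # A case passes when every expected output property matches.
        if all(result.get(k) == v for k, v in case["expected"].items()):
            passed += 1
    return passed / len(test_cases)   # reliability score to track across iterations
```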
Runtime mechanisms that improve AI reliability
There are many mechanisms, small and large, implemented in the Thunk.AI execution runtime. Collectively, these mechanisms constitute the AI Guardian control environment.
Many of these mechanisms function automatically without requiring explicit control from the thunk designer, while others are enabled or disabled via configuration options. This section describes the most important and impactful mechanisms of the AI Guardian:
Strictly constrained LLM responses
When the AI agent invokes the LLM, it only allows the LLM to respond by invoking one of a small number of strictly structured tools. This "strict tool calling" approach ensures that the LLM does not provide unexpected responses. Further, the tool descriptions and constraints steer the LLM towards a meaningful choice of the right tool. One auto-created tool is used to update the output state properties. Because these properties are also strongly typed, this provides a very strong constraint that steers the LLM towards recording specific results for each AI instruction step.
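The idea can be sketched as follows: the auto-created tool that records output state properties is described by a strict schema, and any LLM response that does not match it is rejected. This is an illustration of the concept using the jsonschema library, not Thunk.AI's internal implementation.

```python
# Sketch of the "strict tool calling" idea, not Thunk.AI's internal code.
from jsonschema import validate, ValidationError

update_output_tool = {
    "name": "update_output_properties",
    "description": "Record the step's results in the output state properties.",
    "parameters": {
        "type": "object",
        "properties": {
            "category": {"type": "string", "enum": ["billing", "shipping", "other"]},
            "summary": {"type": "string"},
        },
        "required": ["category", "summary"],
        "additionalProperties": False,   # no unexpected fields allowed
    },
}

def accept_tool_call(arguments: dict) -> bool:
    """Reject any LLM response that does not satisfy the tool's schema."""
    try:
        validate(instance=arguments, schema=update_output_tool["parameters"])
        return True
    except ValidationError:
        return False
```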
Comprehensive validation
All inputs and outputs are potential sources of error. Inputs to the workflow may not be valid. Inputs provided by the LLM to invoke a tool may not be valid. The results of a tool may not be valid. In each case, there are different kinds of validation (a validation sketch follows this list):
Validation against strongly-typed schema
Validation against natural language descriptions (descriptions of state properties or descriptions of tool constraints created by the thunk designer)
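A simplified sketch of this layered validation, where check_against_description() is a hypothetical stand-in for an LLM-based check of a natural language constraint:

```python
# Layered-validation sketch; not Thunk.AI's actual validation code.
from jsonschema import validate, ValidationError

def validate_value(value, schema, description, check_against_description):
    # 1. Validation against the strongly typed schema.
    try:
        validate(instance=value, schema=schema)
    except ValidationError as err:
        return False, f"schema violation: {err.message}"
    # 2. Validation against the natural language description/constraint.
    if not check_against_description(value, description):
        return False, "value does not satisfy the described constraint"
    return True, "ok"
```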
Dynamic reasoning mechanisms when flexible decision making is needed
In situations where there is a wide range of possible workflow inputs and a complex standard operating procedure (SOP) to handle the different use cases, it may not be possible to pre-define a specific sequence of instructions that covers every situation. Instead (a sketch follows this list):
Dynamic retrieval of instructions allows an SOP document to act as the reference for AI instructions
Dynamic planning allows for a specific execution plan to be developed for one particular step execution
Reflection allows for the agent to note its progress and stay on track to achieve the desired outcome
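These three mechanisms can be pictured as a loop like the sketch below; the helper functions (retrieve_sop, make_plan, execute, reflect) are hypothetical stand-ins, not Thunk.AI APIs.

```python
# Illustrative agent-loop sketch; helper functions are hypothetical stand-ins.
def run_flexible_step(workflow_input, retrieve_sop, make_plan, execute, reflect):
    sop_excerpt = retrieve_sop(workflow_input)       # dynamic retrieval of instructions from the SOP
    plan = make_plan(workflow_input, sop_excerpt)    # dynamic planning for this particular step run
    progress = []
    while plan:
        action = plan.pop(0)
        result = execute(action)
        progress.append((action, result))
        plan = reflect(plan, progress, sop_excerpt)  # reflection: note progress, revise remaining plan
    return progress
```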
Consistent responses when information is missing or inconsistent
LLMs tend to respond to a request in an effort to meet expectations, even when they lack clear information to do so. This manifests as hallucinations or low-confidence conclusions. To address this (illustrated in the sketch after this list):
Explanations including the source of information are required on every tool call
Defaults both for output state and for tools steer towards semantically meaningful values when information is not known, instead of a low-confidence hallucinated value
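For illustration, a tool contract embodying both ideas might look like the sketch below; the schema is hypothetical, not the exact contract Thunk.AI generates.

```python
# Hypothetical tool contract sketch: required explanation/source plus a
# semantically meaningful default instead of a guessed value.
record_customer_tier_tool = {
    "name": "record_customer_tier",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_tier": {
                "type": "string",
                "enum": ["free", "pro", "enterprise", "unknown"],
                "default": "unknown",   # meaningful default when information is missing
            },
            # Every tool call must explain itself and cite where the value came from.
            "explanation": {"type": "string", "description": "Why this value was chosen"},
            "source": {"type": "string", "description": "Where the information was found"},
        },
        "required": ["customer_tier", "explanation", "source"],
    },
}
```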
Specialization for different high-quality "frontier" LLM models
There are many high-quality LLM models from providers like OpenAI and Google. They vary in behavior across several dimensions – cost, latency, and quality. For specific problems, one model may be a better fit than another. The Thunk.AI platform supports many of these models and adapts to the idiosyncrasies and unique advantages of each model (a model-selection sketch follows this list).
Each AI instruction in a thunk can optionally choose a different AI model without any change to the rest of the logic
The platform uses the built-in capabilities of each model – for example, Google's Gemini model can use built-in web search tools, OpenAI's GPT models can handle multimedia document attachments, and OpenAI defines a "model spec" for its GPT models that ensures close alignment with its tuning principles (thereby improving reliability and security).
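Conceptually, per-step model choice amounts to a mapping like the hypothetical sketch below; the step names and the mapping are illustrative, not part of any Thunk.AI API.

```python
# Hypothetical per-step model selection sketch.
model_by_step = {
    "classify_request": "gpt-4.1-mini",    # cheap and fast; simple classification
    "draft_response":   "gpt-4.1",         # higher quality for customer-facing text
    "research_topic":   "gemini-2.5-pro",  # can use built-in web search
}

def model_for(step_name: str) -> str:
    # Fall back to the cheaper model when a step has no explicit override.
    return model_by_step.get(step_name, "gpt-4.1-mini")
```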
