Shipping Enterprise Thunks: Testing and Quality Principles

01 — Treat every thunk as a software project

Every thunk has its own logic, expected behavior, and quality requirements. Although it is expressed in natural language and requires no traditional coding, it is still a software application, just one built on a different platform. Build and maintain it as a disciplined engineering project.

What to do: Define requirements before building. Track changes and bug fixes in an issue system. Treat upgrades and regressions the same way you would in any software project.

02 — Build automated tests from day one

In an automated test, the thunk is run against a test input and its output is automatically compared to a predefined expected result. No human reviews every test manually; results are reviewed in aggregate and mismatches are investigated. Automated testing gives you a repeatable measure of quality and catches regressions before users do. Two laws always hold: low test coverage leads to a low-quality thunk, and low-quality test data leads to a low-quality thunk.

What to do: Create input/output test pairs before you ship. Add a new test every time you encounter a new use case or a reported issue.
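The run-and-compare loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual test runner: `ThunkCase`, `run_suite`, and `fake_thunk` are hypothetical names, and the exact-match comparison after trimming whitespace is an assumption that a real suite may need to relax.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ThunkCase:
    """One input/output test pair (hypothetical structure)."""
    name: str
    input_text: str
    expected_output: str


def run_suite(run_thunk: Callable[[str], str], cases: List[ThunkCase]) -> dict:
    """Run every case and return aggregate counts plus the names of mismatches."""
    failures = []
    for case in cases:
        actual = run_thunk(case.input_text)
        # Assumption: exact match after trimming surrounding whitespace.
        if actual.strip() != case.expected_output.strip():
            failures.append(case.name)
    return {
        "total": len(cases),
        "passed": len(cases) - len(failures),
        "failures": failures,
    }


def fake_thunk(text: str) -> str:
    """Stand-in for a real thunk invocation (hypothetical)."""
    return text.upper()


cases = [
    ThunkCase("greeting", "hello", "HELLO"),
    ThunkCase("from-bug-report", "invoice 12", "INVOICE 12"),
    ThunkCase("known-mismatch", "refund", "Refund"),
]
report = run_suite(fake_thunk, cases)
print(report)
```

Only the aggregate report and the list of mismatched case names need human attention, which is what keeps the review effort constant as the number of tests grows.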

03 — Use real, representative test data

Test inputs must reflect actual production use cases. Expected outputs must exactly match what the thunk is configured to produce in format, structure, and content. If the thunk uses constrained value sets, the expected outputs must use those exact values. Poor test data produces poor thunks. User feedback is not a test, but it should always be converted into one.

What to do: Before adding an expected output to the test folder, ask: does this exactly reflect what we have told the thunk to produce? If not, fix it first.
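Part of that check can be automated. The sketch below assumes, purely for illustration, that expected outputs are stored as JSON objects and that the constrained value set for a field is known in advance; the `priority` field, its allowed values, and the function name are all hypothetical.

```python
import json

# Hypothetical constrained value set for one output field.
ALLOWED_VALUES = {"priority": {"low", "medium", "high"}}


def find_invalid_values(expected_output_json: str, allowed: dict) -> list:
    """Return (field, value) pairs in an expected output that fall outside the allowed sets."""
    record = json.loads(expected_output_json)
    problems = []
    for field, allowed_values in allowed.items():
        if field in record and record[field] not in allowed_values:
            problems.append((field, record[field]))
    return problems


good = '{"priority": "high", "summary": "Printer offline"}'
bad = '{"priority": "High", "summary": "Printer offline"}'

good_problems = find_invalid_values(good, ALLOWED_VALUES)
bad_problems = find_invalid_values(bad, ALLOWED_VALUES)
print(good_problems)
print(bad_problems)
```

The second record differs from the first only in capitalization, which is exactly the kind of mismatch that makes an otherwise correct thunk fail its test.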

04 — Assign a clear owner for each testing role

Three roles must all be active; a gap in any one causes quality to slip. Business owners collect and provide tests that cover the expected use cases. Thunk builders ensure the thunk passes those tests and fix failures. The platform team runs tests on a regular automated schedule and shares results with everyone.

What to do: At the start of every thunk project, explicitly name who fills each of the three roles. Do not leave any unassigned.

05 — Store tests in a structured folder

Ad hoc test collection (files in email, screenshots, inputs without expected outputs) does not produce automatable tests. Each thunk should have a dedicated test folder. Within it, each subfolder holds one test: one input file and one expected output file. Business owners are responsible for ensuring those expected outputs meet the quality bar. The more high-quality tests you add, the more confidence you have in shipping.

What to do: Create the test folder before you write the first version of your thunk. Turn every user-reported issue into a test case.
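A folder laid out this way can be read mechanically. The following sketch assumes each subfolder contains files named `input.txt` and `expected.txt`; those file names, the `load_cases` helper, and the sample case folders are illustrative assumptions, not a layout the platform prescribes.

```python
from pathlib import Path
import tempfile


def load_cases(root: Path) -> tuple:
    """Return (complete, incomplete).

    complete: list of (case name, input text, expected text) tuples.
    incomplete: names of subfolders missing one of the two files.
    """
    complete, incomplete = [], []
    for case_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        input_file = case_dir / "input.txt"        # assumed file name
        expected_file = case_dir / "expected.txt"  # assumed file name
        if input_file.is_file() and expected_file.is_file():
            complete.append(
                (case_dir.name, input_file.read_text(), expected_file.read_text())
            )
        else:
            incomplete.append(case_dir.name)
    return complete, incomplete


# Demonstration on a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "case-001").mkdir()
    (root / "case-001" / "input.txt").write_text("hello")
    (root / "case-001" / "expected.txt").write_text("HELLO")
    (root / "case-002").mkdir()
    (root / "case-002" / "input.txt").write_text("input without an expected output")
    complete, incomplete = load_cases(root)

print(complete)
print(incomplete)
```

Reporting incomplete subfolders separately makes it easy to spot inputs that were collected without an expected output and so cannot yet be run as tests.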

06 — Use test data to decide when to ship

Before going to production, assess three things: test coverage (what fraction of expected use cases are tested), automated pass rate (how often the thunk produces the right output), and risk/reward (whether the value of shipping now is greater than the cost of known gaps). Low coverage combined with poor test data makes this decision impossible to get right.

What to do: Before any production deployment, document your coverage, pass rate, and known gaps. Share it with stakeholders and make the go/no-go call together — informed by data, not gut feel.
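The first two measures are simple ratios, and the known gaps fall out of the same data. The sketch below shows one way to compute them; the function name, the use-case labels, and the sample numbers are made up for illustration. The risk/reward judgment itself stays with the people making the call.

```python
def readiness_summary(expected_use_cases: set, tested_use_cases: set,
                      passed: int, total: int) -> dict:
    """Summarize coverage, pass rate, and untested use cases for a go/no-go review."""
    covered = expected_use_cases & tested_use_cases
    # Coverage: fraction of expected use cases that have at least one test.
    coverage = len(covered) / len(expected_use_cases) if expected_use_cases else 0.0
    # Pass rate: fraction of automated tests whose output matched.
    pass_rate = passed / total if total else 0.0
    return {
        "coverage": round(coverage, 2),
        "pass_rate": round(pass_rate, 2),
        "known_gaps": sorted(expected_use_cases - tested_use_cases),
    }


# Illustrative numbers only.
summary = readiness_summary(
    expected_use_cases={"new invoice", "refund", "duplicate invoice", "foreign currency"},
    tested_use_cases={"new invoice", "refund", "duplicate invoice"},
    passed=18,
    total=20,
)
print(summary)
```

In this made-up example, three of four expected use cases are tested and 18 of 20 tests pass, leaving one named gap for stakeholders to weigh.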

Following these principles will make your teams faster, produce higher-quality results, and give everyone more confidence in what is being shipped. The Thunk.AI platform is built to support you every step of this journey.
