Skip to main content

Secure AI Agents

Understand how Thunk.AI address the unique security issues related to AI

Praveen Seshadri avatar
Written by Praveen Seshadri
Updated yesterday

The use of AI in an enterprise workflow introduces unique security challenges beyond those of traditional software systems. The Thunk.AI platform has taken extra security measures designed to address these AI-specific concerns, ensuring the safety and integrity of your data while harnessing the full potential of AI automation.

Categories of AI-specific security concerns

There are three broad classes of AI-specific security concerns:

  1. Can the AI models learn from our data and usage?

  2. Can end-users (typically, parties who are external to the organization that deploys the thunk workflow) or external sources of data manipulate the AI agents into taking inappropriate actions?

  3. Can participants of a thunk (typically, employees of the same organization that deploys the thunk workflow) manipulate the AI agents into taking inappropriate actions?

The third class of concern is of lower priority than the first two, and because we address it explicitly in the article about authentication, we will not repeat that information here.

AI Agents vs AI Control Sandbox

An earlier article described how AI agents in Thunk.AI run within a control sandbox environment. This is very important to understand when considering user identity. To briefly summarize, every AI agent execution in Thunk.AI is implemented with two components: (a) the AI agent that utilizes a large language model (LLM), and (b) a sandboxed execution environment that controls and manages the AI agent.

The important takeaways are:

  • The AI agent never accesses data directly or takes actions directly. Any information needed by the AI agent is provided by the control sandbox, either as part of setting up the initial environment or by responding to tool call requests.

  • The AI agent never updates data directly. Any such changes are made only by the control sandbox.

Can AI Models learn from the workflow data and usage?

Thunk.AI does not build or operate its own AI models (LLMs). Rather, it invokes appropriate configured LLMs via an API service. The default LLM used in the public version of Thunk.AI is the GPT 4.1 model from OpenAI. The Thunk.AI platform has access to this model via an enterprise agreement with OpenAI, wherein OpenAI contractually commits not to learn from the data or usage.

All the same, it is a real concern among enterprise customers that an LLM provider (like OpenAI) might possibly learn from the data and usage of AI agents that run on the Thunk.AI platform.

To address this, the Thunk.AI platform supports the Bring-Your-Own-LLM (BYOL) option. If you have pre-existing direct agreements with any of the major LLM providers, you can extend the same contractual safeguards and compliance measures to the data processed through the Thunk.AI platform. To do so, register your own LLM API key as part of your Thunk.AI account.

Enterprise customers operating within a private instance can even choose to utilize entirely locally-hosted LLM models, so that there is an additional network security guarantee that ensures that no data or usage information can physically be exfiltrated.

Can AI Agents be maliciously manipulated?

A key concern with AI agents is the potential for them to be misled or manipulated into performing unintended actions, similar to how a human might be manipulated. The AI agents in a workflow application are supposed to follow the instructions of the thunk owner/admins. In most cases, the workflow represents some internal process of an organization. It is not supposed to receive or follow instructions from the data inputs of individual workflow items, as these inputs arrive from untrusted external parties.

The most common form of manipulation is via "prompt injection". This is essentially a term used when an unauthorized party can provide instructions that the AI agent mistakenly follows. Once the AI agent can be manipulated into following such malicious instructions, the effect might be the exfiltration of data and worse.

There are two sources of untrusted inputs to the AI agent:

  • the input data for the workflow, and

  • any data collected from external applications, files, web pages, etc, that are made accessible to the AI agent.

Both of these are potential attack vectors. The Thunk.AI platform has four mechanisms to detect and mitigate malicious manipulation:

  1. Input and output sanitization

  2. Tightly-constrained and isolated execution environment

  3. Adherence to the security mechanisms of the underlying LLMs

  4. Traceability and auditability mechanisms

Input and Output Verification: All inputs to the AI agents undergo checking and sanitization for conformance to expectations and anomaly detection. Likewise, the platform employs a secondary AI model as a “checker” to evaluate the sanity and appropriateness of outputs generated by the primary AI agent. This additional layer of scrutiny is essential in catching subtle anomalies or potentially harmful content that traditional rule-based systems might miss.

Constrained and Isolated Execution Environment: AI agents run within a controlled "sandbox" environment. They cannot directly take any actions or change anything within the Thunk system or affect the external world. All they can do is respond to instruction by requesting the sandbox environment to execute one of a predefined small set of tools.

The tool calls in the sandbox run with the exact same credentials as the user who is configured to use them. This means each agent can only invoke the constrained set of tools that that its human counterpart is authorized to do in that specific situation. Even if an AI agent is somehow manipulated into attempting an unauthorized action, it will fail due to lack of necessary permissions. There is no way for an AI agent to enter a more privileged mode or escalate its permissions beyond those of the invoking user.

Each valid tool call request interaction also undergoes thorough validation to ensure that it aligns with expected behaviors and authorized actions. This validation applies to all operations, regardless of whether they affect Thunk’s internal state or reach out to external systems. The sandbox environment executes the tool call, checks the result to again determine if there are any anomalies or malicious intent, and only then returns the results to the AI agent and its LLM.

Each AI agent workflow task runs within its sandbox environment in an isolated “job” in the Thunk.AI platform’s Agent service. Because workflows can be long-running, state has to be persisted and maintained across steps of a workflow. The context for this job includes the user identity and access to the workflow's State Store (via GraphQL) based on the user's identity. Because this access is associated with the user's identity, the access control mechanisms enforced by the GraphQL service act as the fundamental isolation mechanism to silo user sessions and mitigate any attempts to maliciously access another user's data.

Complete Action Traceability: Every tool call and every interaction between the AI agent and the sandbox environment is meticulously logged, creating a comprehensive auditable trail of all AI agent activities. This ensures that every action taken by an AI agent is traceable and accountable.

Alignment with LLM security mechanisms:

Most mainstream AI models like OpenAI's GPT-4.1 have invested significant effort in ensuring that security vulnerabilities like prompt injection can be mitigated directly within the model. In order to do so, they publish a "model spec" that publicly describes the way their model is constrained to behave acceptably despite potentially malicious inputs. That is only possible if the code invoking the LLM strictly follows a number of design principles. For example, any untrusted content must be appropriately identified, its content escaped to avoid obvious injection attack vectors, and the system instructions provided must avoid unintentionally opening up loopholes. Thunk.AI does not expect thunk designers and admins to be experts in these arcane details, but the platform takes care of mapping the thunk definitions into the appropriate representations needed to align with the LLM's model spec. Thereby, there is "defense in depth" and the application can benefit from the LLM's inherent mechanisms to avoid malicious injection attacks.

Did this answer your question?