What can an AI agent do?

Understand the basic capabilities of a Thunk.AI agent

Written by Praveen Seshadri
Updated over 6 months ago

There are four questions rolled into this one topic.

  • What core capabilities does the AI agent have?

  • What can you teach it to do or steer it to do better?

  • What can the AI agent not do?

  • What can it be trusted to do?

Let's answer each of these questions.

What core capabilities does the AI agent have?

The underlying AI model (GPT-4o from OpenAI) is a multi-modal foundation model and many of its capabilities are exposed through Thunk.AI. These include:

  • The ability to have contextual conversational interaction.

  • "Knowledge" of a broad swath of information on the web (though usually a few months stale).

  • The ability to generate images and extract information from images.

  • The ability to generate documents and extract information from documents.

To effectively channel these "raw" capabilities, Thunk.AI constructs the right prompts and examples to achieve much more specialized capabilities for your AI agents in a thunk:

  • The ability to do project planning -- these plans have to conform to a stylized form that the Thunk.AI automation engine can reason about and automate.

  • The ability to record semi-structured information and then utilize and update it -- unlike a simple conversational chatbot like ChatGPT, every thunk keeps a record of semi-structured information reflecting the current state of the project.

  • The ability to validate data to ensure that it conforms to desired policies and constraints.

  • The ability to compute new data values based on natural language formulas.

  • The ability to route messages to data entries by matching the message content to the data content.
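To make the last capability concrete, here is a minimal sketch of routing a message to the data entry it most likely concerns. It is an illustration only: Thunk.AI relies on the AI model to do the matching, whereas this sketch substitutes a crude word-overlap score. The function names and the sample entries are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word set used for a crude content comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def route_message(message, entries):
    """Return the key of the entry sharing the most words with the message.

    Uses Jaccard overlap as a stand-in for the AI model's judgment.
    Returns None when no entry shares any word with the message.
    """
    msg = tokens(message)
    best_key, best_score = None, 0.0
    for key, content in entries.items():
        ent = tokens(content)
        union = msg | ent
        score = len(msg & ent) / len(union) if union else 0.0
        if score > best_score:
            best_key, best_score = key, score
    return best_key

# Hypothetical project data: one text summary per data entry.
entries = {
    "invoice-1042": "Invoice 1042 for Acme Corp, payment due in March",
    "ticket-77": "Support ticket about login failure on the mobile app",
}

target = route_message("Acme asked about the payment for invoice 1042", entries)
```

In this sketch an incoming message about the Acme invoice lands on the invoice entry, and a message with no overlapping words is left unrouted.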

How can you extend what the AI agent can do?

The AI agent enhances the capabilities of the GPT AI model by giving it "tools" -- capabilities to connect with external systems. The Thunk.AI platform provides built-in tools to read and write files in Google Drive, to send and receive email messages, to search the web, and to read any web page. This set of tools is extensible: every user can easily add extra tools to connect with other systems, whether as inputs to the project data, as outputs for project steps, or as information sources for the AI agent to use.
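The idea of an extensible tool set can be sketched as a small registry that maps tool names to functions the agent may call. This is not the Thunk.AI API; the class, the tool names, and the stand-in functions below are all hypothetical, chosen only to show how a built-in tool and a user-added tool can sit side by side.

```python
class ToolRegistry:
    """Hypothetical registry mapping tool names to callables."""

    def __init__(self):
        self._tools = {}

    def register(self, name, func):
        """Add a tool under the given name."""
        self._tools[name] = func

    def names(self):
        """List the available tool names in sorted order."""
        return sorted(self._tools)

    def call(self, name, **kwargs):
        """Invoke a tool by name with keyword arguments."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)

def read_web_page(url):
    # Stand-in for a built-in tool; a real one would fetch the page.
    return f"<contents of {url}>"

def lookup_customer(customer_id):
    # Stand-in for a user-added tool connecting to another system.
    return f"customer record {customer_id}"

registry = ToolRegistry()
registry.register("read_web_page", read_web_page)
registry.register("lookup_customer", lookup_customer)

record = registry.call("lookup_customer", customer_id="c-17")
```

The agent only needs the tool's name and arguments, so adding a new connection does not change how existing tools are used.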

Remember that these are all just primitive capabilities of the agent. The AI agent can perform each of these individual actions, but its real power arises from its ability to stitch together a sequence of actions to achieve a desired work outcome.

The primary way to steer and control the AI agent is, of course, via the inputs you give it: the goal and plan instructions you provide, policies, the actual content of the data, and conversational inputs.

What can the AI agent not do?

We live in a remarkable era of AI advances. Any definitive answer to this question may no longer be valid a year from now. All the same, it is useful to keep in mind that Thunk.AI is a platform for digital work, and therefore actions that are purely in the physical world are not possible (e.g., you cannot have it pressure-wash your deck). Even in the digital world, there are a large number of capabilities that the current agent doesn't support.

  • For example, it cannot sing a song (why? because we have not given it the ability to use the speakers on your computer or tested the voice generation capabilities of the AI model -- it didn't seem important for our core initial customer cases).

  • For example, it cannot access your WhatsApp messages (why? because WhatsApp doesn't provide an API interface that allows external software providers to read messages).

  • For example, it cannot interact with an arbitrary application on your PC (why? because there are security constraints that limit the interaction between applications and we have not yet built "native" versions of Thunk.AI that can override these security constraints).

While there are these limitations today, please let us know about those limitations that seem crucial in blocking your use cases, and we'll do what we can to prioritize them.

What can it be trusted to do?

If you are going to use AI for work, you want to have some sense of its reliability. There are three dimensions to reliability:

  1. Does it do the work it was asked to do? The more detailed your instructions are in a thunk, the more likely it is to satisfy this requirement. The Thunk.AI platform does a lot to ensure reliability behind the scenes. For example, it checks and validates every single response from the AI model, sometimes sending it back for rework, without the user having to know or intervene.

  2. Does it do the work repeatably? This is important for recurring processes. We need a combination of consistent process steps with the flexibility to adapt to the particular work item at hand. In every thunk, there is a structured plan that drives the automated work and that the AI agent respects. This plan ensures that processes stay broadly consistent, while the execution of each step can exhibit flexibility.

  3. Does it make up stuff? Language models famously "hallucinate". Put differently, what is viewed as creativity in one context can become hallucination when that creativity is applied to facts. In work settings, however, most relevant facts are derived from the work environment rather than from the training set of the AI model. So in practice, we tend to see sporadic hallucinations only in situations where the information provided to the AI agent is grossly inadequate for the work at hand.
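The check-and-rework behavior mentioned in the first point can be sketched as a loop that validates each model response and, when it falls short, sends it back with feedback. This is an assumption about the general pattern, not a description of Thunk.AI's internals: the JSON format, the required-field check, the retry limit, and the `fake_model` stand-in are all hypothetical.

```python
import json

def validate(response_text, required_fields):
    """Return a list of problems; an empty list means the response passes."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(data, dict):
        return ["response is not a JSON object"]
    return [f"missing field: {f}" for f in required_fields if f not in data]

def run_with_rework(ask_model, prompt, required_fields, max_attempts=3):
    """Ask the model, validate the answer, and request rework if needed.

    Returns the parsed response and the number of attempts it took.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        response = ask_model(prompt + feedback)
        problems = validate(response, required_fields)
        if not problems:
            return json.loads(response), attempt
        # Send the response back for rework with the problems spelled out.
        feedback = "\nFix these problems: " + "; ".join(problems)
    raise ValueError(f"no valid response after {max_attempts} attempts")

def fake_model(prompt):
    # Stand-in for the AI model: incomplete until it receives feedback.
    if "Fix these problems" in prompt:
        return '{"vendor": "Acme", "amount": 120}'
    return '{"vendor": "Acme"}'

result, attempts = run_with_rework(
    fake_model, "Extract vendor and amount.", ["vendor", "amount"]
)
```

Here the first response is missing a field, so the loop asks again with the problem stated, and the caller only ever sees the second, valid result.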
