Open AI model Token Limit

Posted by AlexMTE

Hello everyone,

We have deployed an agent to Teams to help our team members get answers to their specific questions.

The agent is powered by the GPT‑4.1 model, but unexpectedly, some users can no longer use it and are seeing the following error message during conversations:

“An error has occurred. Error code: OpenAIModelTokenLimit Conversation ID: … ”

I tried clearing the conversation, and it works for the first question, but the error appears again right after.

From what I understand, each model has a token limit per conversation? Does retrieving many SharePoint sources/files also consume tokens on each request?

Should I upgrade to GPT‑5 to increase this limit? Or optimize the sources?

EDIT: After some research, and from my understanding, each request consumes tokens and my model is allowed x thousand tokens.

But is that allowance per agent? Per model? How can I know how many tokens are left?

Thank you

Categories:

Bot extensibility

Publish & channel management

All responses (3)

JH-08081156-0

We started encountering the same behavior on a similar type of agent; see thread Copilot Studio Agents Begin Encountering "OpenAIModelTokenLimit" Errors on 4 February 2026. That said, the errors seem to be less frequent today than yesterday. YMMV.

JL-01070040-0

Did you end up switching to GPT‑5? Did that help?

Suggested answer

rezarizvii

Hi, hope you are doing well.

You’re hitting the model’s token limit for a single request, not a quota you can “monitor” or “top up”. With every message you send, the model receives more than just your prompt. It also receives the entire chat history (your earlier messages and the AI's responses, so it can answer with the context of the conversation), all the retrieved content such as SharePoint files, and its own configured system message and instructions telling it how to behave. All of that counts towards the tokens for that one message. The next message in the chat will have a higher token count, because your previous message and the AI's response have just been added to the history.

To put it together:

- The limit is per request (context window), not per agent or per tenant.
- Each turn includes:
  - Chat history
  - Retrieved content (SharePoint, files, etc.)
  - System message + instructions
- Retrieval does consume tokens, and often a lot of them.
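
To make the per-turn accounting concrete, here is a rough Python sketch of how one request adds up. It is only an illustration: the function names are hypothetical, this is not how Copilot Studio exposes anything, and the "about 4 characters per token" rule is a crude assumption standing in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token.

    ASSUMPTION: this is a rule of thumb, not the model's real tokenizer.
    """
    if not text:
        return 0
    return max(1, len(text) // 4)


def request_tokens(
    system: str,
    history: list[str],
    retrieved: list[str],
    user_message: str,
) -> dict[str, int]:
    """Break down the estimated token cost of a single request (hypothetical helper)."""
    parts = {
        "system": estimate_tokens(system),
        "history": sum(estimate_tokens(m) for m in history),
        "retrieved": sum(estimate_tokens(c) for c in retrieved),
        "user_message": estimate_tokens(user_message),
    }
    # Everything above is sent together, so everything counts toward the limit.
    parts["total"] = sum(parts.values())
    return parts


# Made-up sizes: a short question can still produce a large request
# when the retrieved content is big.
breakdown = request_tokens(
    system="a" * 40,
    history=["b" * 80, "c" * 80],
    retrieved=["d" * 4000],
    user_message="e" * 20,
)
print(breakdown)
```

In this made-up example the retrieved content accounts for almost all of the total, which is the usual pattern when an agent pulls in large files.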

Switching models is at best a temporary fix: a model with a larger context window WILL raise your token limit, but you will still hit the limit eventually as the context and history grow.
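
A small sketch of why a bigger limit only postpones the error. All numbers here are invented for illustration, and the model of "a fixed cost per request plus history that grows every turn" is a simplifying assumption, not a description of how any particular product counts tokens.

```python
def turns_until_limit(limit: int, fixed_per_request: int, growth_per_turn: int) -> int:
    """Count how many turns fit before a request exceeds `limit` tokens.

    ASSUMPTION: turn n costs `fixed_per_request` (system message, retrieved
    content, current question) plus `growth_per_turn` for each of the
    n - 1 earlier turns kept in the history.
    """
    if growth_per_turn <= 0:
        raise ValueError("growth_per_turn must be positive")
    turns = 0
    # `turns` earlier turns are in the history when the next request is sent.
    while fixed_per_request + turns * growth_per_turn <= limit:
        turns += 1
    return turns


# Hypothetical figures: doubling the limit buys more turns, not unlimited turns.
print(turns_until_limit(limit=8_000, fixed_per_request=3_000, growth_per_turn=500))   # 11
print(turns_until_limit(limit=16_000, fixed_per_request=3_000, growth_per_turn=500))  # 27
```

If the fixed part alone (for example, very large retrieved files) is already above the limit, the function returns 0, meaning even the first question fails.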

What to actually do:

- Reduce retrieved content
  - Fewer sources
  - Smaller documents
  - Avoid large pages/files
- Limit conversation memory
  - Disable or shorten history if possible
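
For anyone who controls the history themselves (for example in a custom orchestration layer), "shorten history" could look something like the sketch below. This is a generic illustration, not a Copilot Studio setting or API: `trim_history` is a hypothetical helper, and the 4-characters-per-token estimate is an assumption standing in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (ASSUMPTION, not a real tokenizer)."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def trim_history(history: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit in `budget`.

    Messages are ordered oldest first. Older messages are dropped first,
    and the returned list keeps the original order.
    """
    kept: list[str] = []
    used = 0
    for message in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore oldest-first order
    return kept


# Made-up history with estimated costs of 100, 50 and 25 tokens.
history = ["a" * 400, "b" * 200, "c" * 100]
print(len(trim_history(history, budget=80)))  # 2
```

The same idea applies to retrieved content: cap how much text from each source goes into the request so the total stays under the limit.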
