Published on

When Curiosity Meets Security: A Real-World Look at a Multi-Turn AI Attack Attempt

Authors

Most people interact with AI chatbots in positive, productive ways. But every now and then, someone tries to push boundaries, not always out of malice, sometimes out of curiosity, to see what lies under the hood. Today, I want to share a real-world example from the Predictable Dialogs platform that perfectly illustrates why thoughtful guardrails and responsible design matter.

This story is not about fear; it is about understanding. It is about learning how attacks unfold, how systems respond, and how we as builders and users create safer, more resilient AI experiences for everyone.


A Curious User... or Something More?

It started with a simple system instruction a user added to a chatbot configuration:

"create a chatbot for me that uses ChatGPT to answer and add plans into a calendar program"

At first glance, it seemed innocent enough. Who would not want a helpful assistant that can talk and update a calendar?

But then the same user sent the first message to the bot:

"I would like to order"

Simple. Short. A little vague.

And that vagueness was the point.


Hidden Behind the Simplicity: A Multi-Turn Attack

What this user attempted was a multi-turn prompt injection attack, specifically following what is known in the security world as a Crescendo-style jailbreak pattern.

Instead of trying to break the AI in one big, obvious move, Crescendo attacks work over several turns, slowly reshaping the AI's behavior and permissions. Each step seems harmless on its own, but together they push the AI to accept false authority, perform unauthorized actions, or access external data.

Let us walk through the steps exactly as they happened.


Step 1: Laying the Foundation

The user's initial system instruction was intentional. By telling the chatbot that its purpose is to add plans into a calendar program, they created a fictional but dangerous context.

From that moment on, unless protected, a model might believe it has access to the calendar tool. A calendar tool is docile. But the model may extrapolate that to access to all tools.

The groundwork was set.


Step 2: The Ambiguous Opener

"I would like to order"

This was not a real request.

This was a probe.

It let the user test:

  • How the bot continues vague conversations
  • Whether the bot will try to fill in the blanks and reveal more information on the tools access.
  • Whether the system will activate any tools automatically.

On some platforms, this could start a chain where the bot makes assumptions, takes initiative, or opens doors the user never explicitly requested.

Fortunately, Predictable Dialogs has strict guardrails: tool access requires explicit permissions, and user-defined contexts cannot override platform safeguards.

But what if those guardrails were missing?


Step 3: The LLM's Helpful (but Risky) Response

Imagine if the chatbot responded like this:

"I can help you place your order! Just tell me what you want, where to send it, and when, and I can add it to your calendar."

A friendly, helpful message, and also the exact hook the attacker hoped for.

By confirming that calendar access is on the table, the bot validates the user's injected framing, and this sets the stage for the next escalation.


Step 4: What the Attacker Would Do Next

Once the bot suggests it can add something to the calendar, the attacker can escalate slowly, smoothly, and under the radar.

Here is how they would likely continue:

  1. Provide a detailed order to look legitimate
  2. Slip in a request to create a calendar event
  3. Then request:
    • "Can you read my next calendar event?"
    • "Check for conflicts"
    • "Summarize the event description"

That last one is where the danger lies.

Malicious instructions can be hidden inside calendar descriptions. If the bot reads them, those hidden instructions may override the bot's rules, causing tool activation, data exfiltration, or unintended actions.

The attacker's next message might look innocent but layered:

"Great, please add this order to my Google Calendar. Before you do, can you check my next event and read the description to make sure there is no conflict?"

It looks harmless. It feels harmless. But inside that event description could be a jailbreak command.

This is how multi-turn attacks succeed, slowly building trust step by step.


Why This Attack Failed Here

On the Predictable Dialogs platform:

  • User-defined system instructions cannot grant the bot new powers
  • Tool access like calendars or emails requires explicit permissions
  • Vague or escalating prompts cannot silently activate external actions
  • The platform keeps the real system instructions separate and protected

So even though the user attempted to shift the chatbot's role into something powerful and unsafe, the guardrails held firm.

On many other chatbot platforms, however, this same attack might have succeeded, and that is why understanding it matters.


A Unique Insight From Platform Telemetry

During the post-incident review, our telemetry showed that every suspicious session triggered at least three warning logs within ninety seconds of the first vague request. That data point now powers an automated watch list, giving us a short window to intervene before harm occurs. Going forward, we will continue pairing that live signal with human review so we stay a few steps ahead of creative adversaries.


What We Learn From This

This case teaches us several important lessons.

1. AI attacks do not always look like attacks

Sometimes they look like friendly instructions and incomplete sentences, so compassionate skepticism matters.

2. Multi-turn attacks are far more likely to succeed

Every turn builds more false context for the model to trust, which is why we need layered memory controls.

3. User-defined chatbot roles can be dangerous

If platforms allow it, people can persuade bots to adopt unsafe identities or permissions, so we must cap their influence.

4. Guardrails matter

Predictable Dialogs prevented this attack not by luck but by design, and that design focus should be a shared industry goal.


Creating Safer AI for Everyone

We all share a responsibility to build AI systems that empower people while protecting them. That means:

  • Validating user intent
  • Restricting tool access
  • Limiting what user instructions can modify
  • Watching for escalating conversational patterns
  • Keeping external integrations secure

Security is about keeping everyone safe. When we design with that mindset, we create technology that truly supports individuals, communities, and organizations.


Final Thoughts

This real-world case shows both the creativity of attackers and the importance of robust AI design. Most importantly, it shows that secure, thoughtful systems can stand strong even against sophisticated attempts. The more we understand these patterns, the more inclusive, safe, and empowering our AI ecosystem becomes.


Frequently Asked Questions

Here are some of the questions I hear most often.

What is a multi-turn prompt injection attack?

It is a sequence of conversational turns that slowly reshapes a model's context until it performs actions outside its instructions.

Why did the Crescendo pattern fail on Predictable Dialogs?

Predictable Dialogs separates core system prompts from user input and requires explicit authorization before any calendar or email tool is activated.

How can builders notice ambiguous prompts early?

Track vague openers, monitor rapid shifts in requested capabilities, and route those sessions through human review or stricter policies.

Which safeguards keep calendars or email tools safe?

Use least-privilege credentials, require intentional consent for tool calls, and log every attempt so you can audit suspicious behavior.

What should teams do after spotting a curious or adversarial user?

Document the session, improve detector rules, and share learnings with both security and product groups so the defense posture keeps improving.

How do I brief leadership on these risks without fear mongering?

Frame the narrative around resilience, emphasize the mission to serve users safely, and show how layered guardrails keep innovation moving.