---
Title: "Powerless by design"
Url: "https://devrev.ai/blog/powerless-by-design"
Published: "2026-06-10"
Last Updated: "2026-06-10"
Author: "Shlomi Vaknin"
Category: "Blog"
Excerpt: "Pro-Iranian hackers hijacked Instagram accounts by social-engineering Meta’s AI support chatbot – the fix isn’t a smarter bot, it’s an architecture where the AI never had the keys in the first place."
Reading Time: 10
---

# Powerless by design

On June 1, 2026, [Krebs on Security reported](https://krebsonsecurity.com/2026/06/hackers-used-metas-ai-support-bot-to-seize-instagram-accounts/) that pro-Iranian hackers hijacked Instagram accounts, including the Obama White House page, by social-engineering Meta’s AI support chatbot. They didn’t break into any servers or steal anyone’s credentials. They opened a chat, asked for a password reset, and talked the bot into swapping the recovery email to one they controlled. The reset code went to them. That was the whole attack.

> [!INFO]
> The kill chain was five steps:
> 
> 1. Attacker opens support chat (IP address masked via VPN)
> 2. Attacker requests password reset
> 3. Bot adds attacker’s email as recovery address (no verification gate)
> 4. Bot sends OTP to attacker’s email
> 5. Attacker completes reset, owns account

The attackers themselves noted that the technique failed against accounts with MFA enabled, but that failure was an accident, not a control. The bot had no enforcement mechanism to require MFA before modifying account state. It just happened that MFA accounts had a different flow the bot couldn’t complete.

When people hear a story like this, the first reaction is usually that the bot was too gullible. It believed whatever it was told, so the fix must be to make it harder to fool.

Add more checks, train it to recognize manipulation, write better instructions telling it not to fall for this sort of thing.

I think that reaction misses the point entirely.

The problem wasn’t that the bot was gullible. Being gullible is fine in something that has no power. The problem was that the bot had direct write access to identity provider state (email addresses, recovery options) with no intermediate verification layer. 

That’s what **“unmediated agent authority”** means in security terms, and it’s the actual vulnerability class here.

Imagine you ran a call center and gave every new hire direct write access to the password database. 

Not a tool that files a request for someone else to review, but the database itself. Anyone who could talk their way past a tired employee would own every account in the company. 

You would never do this; it’s obviously insane. But it’s basically what Meta did, except the employee was a language model, and a language model will believe almost anything if you phrase it with enough confidence.

There’s a reason AI agents are more dangerous here than a web form. When you fill out a form to reset your password, the form has no judgment. 

It collects what you type and hands it to a system that decides what happens next. You can’t persuade a form, because there’s nothing there to persuade. 

An AI agent is different: it decides. And the moment you let a thing decide, the conversation itself becomes the authorization. Whatever you can convince it of, it does.

So the fix isn’t a smarter bot, it’s a split. You build the system as two parts that don’t trust each other. One part talks to the user and works out what they want, and the other part actually does things. The part that talks has no power, and the part that has power doesn’t talk.

## The agent files a ticket. That’s all.

We built SDA (Service Desk Automation) to handle things like password resets, MFA resets, and access provisioning through conversational AI. The same use cases Meta was trying to cover. The architectural difference is that the agent has zero write authority over identity systems.

The AI layer can do five things: figure out what you want, gather context, consult the policies that apply to you so it can explain what’s about to happen (“this one needs manager approval, that one doesn’t”), create a work item, and tell you the status.

That’s the entire list. Because the agent can read the policies, you know what you’re getting into before you commit, and you can decide accordingly. But it has no credentials for Okta or Azure AD.

It has no API that reaches the identity provider. When you ask it to reset your password, the only thing it does is create an issue, same as if you’d filled out a form. It sits above a trust boundary it has no way to cross.
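
As a rough illustration of that boundary, here is a minimal Python sketch of what an agent’s tool surface could look like. Every name in it (`AgentTools`, `WorkItem`, `explain_policy`, the status string) is hypothetical and not taken from SDA; the only point is that the object handed to the conversational layer holds a queue and some read-only policy text, and no identity-provider client at all.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class WorkItem:
    item_id: int
    claimed_user: str   # what the chat user asserted; treated as untrusted
    request_type: str   # e.g. "password_reset"
    status: str = "pending_verification"


class AgentTools:
    """Hypothetical tool surface for the conversational layer.

    It holds a work-item queue and read-only policy text.
    It holds no identity-provider client and no credentials.
    """

    def __init__(self, queue: list, policies: dict):
        self._queue = queue
        self._policies = policies
        self._ids = count(1)

    def explain_policy(self, request_type: str) -> str:
        # Read-only: lets the agent tell the user what will happen next.
        return self._policies.get(request_type, "No policy found; a human will review.")

    def create_work_item(self, claimed_user: str, request_type: str) -> WorkItem:
        # The only write the agent can perform: append a request to the queue.
        item = WorkItem(next(self._ids), claimed_user, request_type)
        self._queue.append(item)
        return item

    def get_status(self, item_id: int) -> str:
        for item in self._queue:
            if item.item_id == item_id:
                return item.status
        return "unknown"


queue = []
tools = AgentTools(queue, {"password_reset": "Requires a push challenge to your registered device."})
item = tools.create_work_item("alice@example.com", "password_reset")
```

Nothing the user says can make `tools` grow a `reset_password` method. The most a persuasive attacker gets out of this layer is another pending work item.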

![image](https://cdn.sanity.io/images/umrbtih2/production/68e8f4a0a3728d7cec3423c7a479ea84b11e57bf-1536x1024.png)

Everything that matters happens below that line, in a workflow engine that is deliberately boring and deterministic. It runs in two phases, and the second one only fires if the first one passes.
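
The gating between the two phases can be sketched in a few lines of Python. This is an assumption about shape, not the actual workflow engine: `run_sensitive_flow`, `verify`, and `fulfill` are illustrative names, and the choice to treat an exception or a non-boolean result as a failed verification is mine.

```python
from typing import Callable


def run_sensitive_flow(
    work_item: dict,
    verify: Callable[[dict], bool],
    fulfill: Callable[[dict], None],
) -> str:
    """Run phase one, then phase two only if phase one returned exactly True."""
    try:
        verified = verify(work_item)
    except Exception:
        verified = False  # an error during verification counts as a failure

    if verified is not True:
        return "rejected"  # phase two never runs

    fulfill(work_item)
    return "fulfilled"
```

There is no argument that carries chat content into the gate, so there is nothing for a conversation to influence.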

## Phase one: prove who you are, by possession not by knowledge

Meta’s bot accepted conversational proof. “I’m the owner, here’s my birthday.” Our verification phase ignores everything you told the bot. 

It starts by resolving you from the directory itself, pulling your real attributes and group membership from the HRIS, not from anything you claimed in the chat.

Then it verifies you with something you have, not something you know. It pushes a challenge to the device you registered ahead of time. The agent never sees the response, because the response doesn’t travel through the agent. 

It goes straight to the identity provider’s verification endpoint. Someone in a chat window can lie about their birthday all day long. They cannot make a notification appear on someone else’s phone.
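
To make the channel separation concrete, here is a small in-memory simulation in Python. `FakeIdentityProvider` and its methods are invented for illustration and do not correspond to any real IdP API. A real deployment would rely on a device-bound key signing the challenge; the sketch stands in for that with a simple device ID comparison.

```python
import secrets


class FakeIdentityProvider:
    """In-memory stand-in for an IdP push-verification endpoint (hypothetical)."""

    def __init__(self, registered_devices: dict):
        self._devices = registered_devices  # user_id -> device_id, enrolled ahead of time
        self._pending = {}                  # challenge_id -> expected device_id
        self._approved = set()

    def push_challenge(self, user_id: str) -> str:
        """Called by the workflow engine. The target device comes from enrollment data."""
        challenge_id = secrets.token_hex(8)
        self._pending[challenge_id] = self._devices[user_id]
        return challenge_id

    def device_response(self, challenge_id: str, device_id: str, approve: bool) -> None:
        """Called by the device itself. The chat agent is not on this path."""
        expected_device = self._pending.get(challenge_id)
        if expected_device is not None and expected_device == device_id and approve:
            self._approved.add(challenge_id)

    def is_verified(self, challenge_id: str) -> bool:
        """Polled by the workflow engine to decide whether phase one passed."""
        return challenge_id in self._approved


idp = FakeIdentityProvider({"alice": "alice-phone"})
challenge = idp.push_challenge("alice")

idp.device_response(challenge, "attacker-laptop", approve=True)
print(idp.is_verified(challenge))  # False: wrong device

idp.device_response(challenge, "alice-phone", approve=True)
print(idp.is_verified(challenge))  # True: the enrolled device approved
```

Note that nothing typed into the chat appears as an input to any of these calls.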

**There are two important branches off this.**

If the request is an MFA reset, the device is by definition the thing that’s compromised or lost, so a push won’t work. 

In that case the system falls back to manager attestation. It sends an approval card to your manager via a separate authenticated channel. The requester cannot approve their own request. The approval flow explicitly excludes the requester from the approver list, whether that flow runs through DevRev’s native approvals, Okta, or whatever system the customer prefers. You can’t be your own attester.
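
The self-approval exclusion is simple enough to show directly. The following Python sketch uses hypothetical function names and assumes the candidate approver list has already been resolved from the directory.

```python
def eligible_approvers(requester_id: str, candidate_approvers: list[str]) -> list[str]:
    """Drop the requester from the approver list, whatever the directory says."""
    return [a for a in candidate_approvers if a != requester_id]


def accept_approval(requester_id: str, approver_id: str, candidate_approvers: list[str]) -> bool:
    """An approval only counts if it comes from someone other than the requester."""
    return approver_id in eligible_approvers(requester_id, candidate_approvers)


# A user listed as their own manager ends up with no eligible approver.
print(eligible_approvers("ceo", ["ceo"]))           # []
print(accept_approval("alice", "alice", ["alice", "bob"]))  # False
print(accept_approval("alice", "bob", ["alice", "bob"]))    # True
```

An empty approver list is a case the workflow has to handle explicitly, most likely by escalating to a human rather than proceeding.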

And some users always get the strictest path regardless of what the offering-level policy says. 

Users in designated groups (executives, for instance) are forced into the highest verification tier. That decision is made server-side from HRIS group membership, never from anything the requester asserts. A VIP can’t be talked down to an easier verification, and an attacker can’t claim VIP status to change the rules in their favor.
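
A minimal Python sketch of that tier resolution follows. The tier names, the `STRICT_GROUPS` set, and `resolve_tier` are all assumptions made for illustration. What matters is the function signature: it accepts directory data and the offering’s configured tier, and has no parameter through which a chat claim could arrive.

```python
from enum import IntEnum


class Tier(IntEnum):
    STANDARD = 1
    ELEVATED = 2
    HIGHEST = 3


# Hypothetical configuration: groups that always get the strictest path.
STRICT_GROUPS = frozenset({"executives"})


def resolve_tier(hris_groups: frozenset, offering_tier: Tier) -> Tier:
    """Pick the verification tier from HRIS group membership only."""
    if STRICT_GROUPS & set(hris_groups):
        return max(Tier)  # forced to the highest tier regardless of offering policy
    return offering_tier


print(resolve_tier(frozenset({"executives", "staff"}), Tier.STANDARD).name)  # HIGHEST
print(resolve_tier(frozenset({"staff"}), Tier.STANDARD).name)                # STANDARD
```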

## Authorization is a separate question from authentication

Knowing who you are is not the same as knowing you’re allowed. Even after verification passes, the system checks whether you’re entitled to what you’re requesting. The logic is plain set math against static configuration:

> [!INFO]
> **Evaluated server-side in workflow trigger**
> 
> scope_match = user.groups ∩ entitlement_scope.scope_groups
> 
> role_match = user.roles ∩ entitlement_scope.scope_roles
> 
> if not (scope_match or role_match): → escalate (no entitlement, human review required)

The AI agent cannot override this. The scope objects are static configuration, not LLM-accessible state. There is no prompt injection that grants an entitlement, because the thing that grants entitlements isn’t listening to prompts.
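
The set math above translates almost directly into Python. The field names `groups`, `roles`, `scope_groups`, and `scope_roles` follow the pseudocode; the dataclasses and the `authorize` function are a sketch under the assumption that both objects are loaded from directory data and static configuration, never from the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    groups: frozenset
    roles: frozenset


@dataclass(frozen=True)
class EntitlementScope:
    scope_groups: frozenset
    scope_roles: frozenset


def authorize(user: User, scope: EntitlementScope) -> str:
    """Return "proceed" if any group or role overlaps the scope, else "escalate"."""
    scope_match = user.groups & scope.scope_groups
    role_match = user.roles & scope.scope_roles
    if not (scope_match or role_match):
        return "escalate"  # no entitlement, human review required
    return "proceed"


scope = EntitlementScope(frozenset({"engineering"}), frozenset({"admin"}))
print(authorize(User(frozenset({"engineering"}), frozenset()), scope))  # proceed
print(authorize(User(frozenset({"sales"}), frozenset({"viewer"})), scope))  # escalate
```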

And if simple group-and-role matching isn’t enough, the system supports custom predicates: arbitrary conditions evaluated server-side that can factor in time of day, location, risk score, or whatever else the policy requires.

The point is the same: the decision logic lives in configuration the agent can’t see or influence.
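
One way to picture custom predicates is as a list of small functions run over a server-side context. The sketch below is hypothetical throughout: the predicate names, the context keys (`local_time`, `risk_score`), and the thresholds are made up, and treating any evaluation error as a failed check is my assumption.

```python
from datetime import time
from typing import Callable, Mapping

Predicate = Callable[[Mapping], bool]


def within_business_hours(ctx: Mapping) -> bool:
    return time(8, 0) <= ctx["local_time"] <= time(18, 0)


def low_risk(ctx: Mapping) -> bool:
    return ctx["risk_score"] < 0.7


def predicates_pass(predicates: list, ctx: Mapping) -> bool:
    """All predicates must return True; a missing field or error counts as failure."""
    try:
        return all(p(ctx) is True for p in predicates)
    except Exception:
        return False


policy = [within_business_hours, low_risk]
print(predicates_pass(policy, {"local_time": time(10, 30), "risk_score": 0.2}))  # True
print(predicates_pass(policy, {"local_time": time(10, 30), "risk_score": 0.9}))  # False
print(predicates_pass(policy, {"local_time": time(10, 30)}))                     # False
```

The context dictionary here would be assembled by the workflow from trusted sources, so the agent never supplies or sees it.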

## Phase two: fulfillment

Only now, with identity verified and entitlement confirmed, does anything get written. The workflow engine resolves the backing system behind the account (Okta, Azure AD, whatever it happens to be) by traversing the links in the knowledge graph of DevRev’s Computer Memory, then calls that system’s API to perform the reset.

It authenticates with a keyring-managed service account, not with the user’s token and not with anything the agent could ever hold. The blast radius is scoped by the entitlement policy, not by how convincing the conversation was.

## Fail closed: lock first, investigate second

The single most important difference from Meta’s bot is what happens under uncertainty. Meta’s bot, when it wasn’t sure, kept going. In this system, any anomaly during a sensitive flow triggers an immediate, ordered response, for example:

> [!INFO]
> **Anomaly detected →**
> 
> 1. POST /api/v1/users/{id}/lifecycle/suspend (immediate)
> 
> 2. DELETE /api/v1/users/{id}/sessions (terminate all)
> 
> 3. Escalation issue created for security team
> 
> 4. Notification via dedicated escalation channel

The account is locked before a human looks at it, not after. Uncertainty isn’t a reason to proceed carefully, it’s a reason to stop.
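
The ordering matters more than the individual calls, so here is a Python sketch that keeps it explicit. The two paths are the ones listed above; everything else is assumed for illustration, including the function name `lock_first` and the injected `send`, `open_escalation`, and `notify` callables, which stand in for an authenticated HTTP client, the ticketing system, and the escalation channel. The sketch makes no network calls and does not handle retries.

```python
from typing import Callable


def lock_first(
    user_id: str,
    send: Callable[[str, str], None],
    open_escalation: Callable[[str], str],
    notify: Callable[[str], None],
) -> str:
    """Suspend and kill sessions before any human is involved, then escalate."""
    # 1. Suspend the account immediately.
    send("POST", f"/api/v1/users/{user_id}/lifecycle/suspend")
    # 2. Terminate all existing sessions.
    send("DELETE", f"/api/v1/users/{user_id}/sessions")
    # 3. Only now create the escalation issue for the security team.
    issue_id = open_escalation(user_id)
    # 4. Notify via the dedicated escalation channel.
    notify(issue_id)
    return issue_id


log = []
issue = lock_first(
    "00u123",
    send=lambda method, path: log.append((method, path)),
    open_escalation=lambda uid: f"ESC-{uid}",
    notify=lambda iid: log.append(("NOTIFY", iid)),
)
```

If the suspend call raises, the function stops there and the exception propagates, which keeps the failure visible instead of silently skipping the lock.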

## Everything leaves a record that can’t be deleted

Every completed action writes an immutable EntitlementGrant record, a custom object in DevRev Computer Memory that’s queryable, linkable, and non-deletable.

![image](https://cdn.sanity.io/images/umrbtih2/production/ea80c46513f4daf650945dbf19e99bd01bf885f6-1000x480.png)

Compliance can ask “show me every password reset in the last 90 days, who approved each one, and what verification method was used” and get an answer from a single query. Meta’s bot left nothing comparable.
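
As a sketch of what such a record and query might look like, the Python below models a grant as a frozen dataclass in an append-only log. The field names and the `GrantLog` class are assumptions; the real EntitlementGrant schema is not described here, and in-process immutability is only a stand-in for storage-level guarantees.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EntitlementGrant:
    user_id: str
    action: str               # e.g. "password_reset"
    approved_by: str
    verification_method: str  # e.g. "device_push", "manager_attestation"
    granted_at: datetime


class GrantLog:
    """Append-only store: it exposes append and query, and no delete or update."""

    def __init__(self):
        self._records = []

    def append(self, grant: EntitlementGrant) -> None:
        self._records.append(grant)

    def query(self, action: str, since: datetime) -> list:
        return [g for g in self._records if g.action == action and g.granted_at >= since]


now = datetime(2026, 6, 10, tzinfo=timezone.utc)
log = GrantLog()
log.append(EntitlementGrant("alice", "password_reset", "bob", "device_push", now - timedelta(days=5)))
log.append(EntitlementGrant("carol", "password_reset", "dave", "manager_attestation", now - timedelta(days=120)))

# "Every password reset in the last 90 days, who approved it, and how it was verified."
for g in log.query("password_reset", since=now - timedelta(days=90)):
    print(g.user_id, g.approved_by, g.verification_method)
```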

## Replaying the attack

Run the Meta attack against this architecture and watch where it dies. The attacker opens a chat and asks for a reset: the agent creates an issue and nothing in the identity system changes. 

They claim to be the owner: the system ignores the claim and sends an MFA push to the registered device. 

They don’t have the device: the push times out and the flow moves to manager attestation. 

They can’t reach the manager’s approval channel: the approval never arrives and the request expires when its TTL runs out.

They try to talk the bot into skipping verification: the bot has no API to skip anything, because verification is a workflow gate, not a decision the bot is able to make. 

They trip the anomaly detector: the account is suspended at the IdP, all sessions terminated, security team alerted.

The attack surface Meta exposed, conversational manipulation of an agent with direct IdP access, simply does not exist in this architecture. The agent is a fancy form submission interface. The security lives in the workflow layer it cannot reach.

## The general rule

> [!INFO]
> There are plenty of tasks where you want the AI to explore and make judgment calls. Research, summarization, triage, prototyping.
> 
> But password resets aren’t one of them. Any task that’s regulated, auditable, or must follow a deterministic flow needs security that lives in structure, not in the model’s behavior. That’s the design principle I take from all this.

This is uncomfortable, because behavior is exactly the thing we’ve been getting good at. We can make models more careful, more aligned, harder to trick. And we should. But every one of those improvements is probabilistic. 

It makes the bad outcome less likely, not impossible. A model that refuses manipulation 999 times out of 1,000 is a model that gets exploited on the thousandth try, and attackers are perfectly happy to try a thousand times.
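
The arithmetic behind that is worth seeing. Assuming each attempt is independent and succeeds with the same probability (a simplification, since real attackers adapt between tries), the chance of at least one success in n attempts is 1 − (1 − p)^n. A short Python sketch:

```python
def p_at_least_one_success(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts


# A model that resists manipulation 999 times out of 1,000 has p_single = 0.001.
for n in (1, 100, 1000, 5000):
    print(n, round(p_at_least_one_success(0.001, n), 3))
```

At 1,000 attempts the attacker’s odds are already about 63%, and at 5,000 they are above 99%. With a success probability of exactly zero, the result stays at zero no matter how many attempts are made.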

Structure is a different kind of guarantee. If the AI has literally no way to change a recovery email, it doesn’t matter how cleverly someone asks. 

There’s no sentence that grants a permission the system was never given. You aren’t trusting the model to make a good decision. You’ve arranged things so the decision was never the model’s to make.

And the important thing is you don’t lose anything by doing this. The model is still there, still conversational, still helpful. It explains what’s going to happen, walks you through your options, tells you who needs to approve what. 

The user experience is better than a static form. You just aren’t letting the helpful thing also be the powerful thing.

### Five principles fall out of this:

1. **Least privilege for AI.** The agent can create work items, nothing else. No IdP credentials, no direct API access to identity systems.
2. **Verification through possession, not knowledge.** Device-bound challenges, not security questions. Knowledge can be social-engineered; device possession cannot be faked remotely.
3. **Separation of channels.** The request channel (chat) ≠ the verification channel (push notification) ≠ the approval channel (manager’s authenticated session). Compromising one doesn’t give access to the others.
4. **Fail closed, lock first.** Any uncertainty results in account suspension and human escalation, not continued processing.
5. **Policy as data, not code.** Entitlement scopes, verification requirements, and approval phases are declarative configuration objects, not conditions embedded in LLM prompts that could be manipulated.

We’re going to relearn this the hard way over the next few years, one incident at a time. The temptation will always be to give the agent a little more reach, because it’s so capable and it makes everything so much smoother. 

Every single time it will feel reasonable, and every so often someone will talk one of these agents into doing something it should never have been able to do in the first place.

The agents that survive contact with real attackers won’t necessarily be the smartest ones. They’ll be the ones that were never handed the keys.