Research·Skill security

How to audit a Claude skill before you run it.

A skill is not a plugin you install and forget. It is instructions your agent will follow. Auditing one means reading it the way an attacker would, before it ever enters your environment.

The short version

  • A skill is executable influence: SKILL.md instructions, bundled scripts and referenced files that your agent will read and act on.
  • The ClawHavoc campaign put 335 malicious skills on a public registry, and most published skills carry no verifiable provenance at all.
  • Auditing by hand means reading SKILL.md as instructions, checking every script and URL, matching permissions to the job, and pinning the content by hash.
  • Skills are text, so injection patterns, exfiltration endpoints and permission over-reach can be checked deterministically in seconds, before first use.

A Claude skill is a set of instructions an AI agent will follow. That single fact changes how you should treat it. You are not reviewing a library your code decides when to call. You are reviewing a document your agent reads into its own context and then acts on, often without asking you again. Auditing a skill means reading it the way an attacker would read it before shipping it: what can I make this agent do, and who would notice.

Most people do not read them at all. A recurring sentiment on Reddit puts it plainly: nobody checks what is inside Claude Code skills. They get copied from a registry, dropped into a project, and trusted because they work.

Why do Claude skills need a security review?

A skill is three things bundled together, and all three are inputs your agent trusts. There is the SKILL.md file with its instructions and description. There are the bundled scripts a skill can carry and run. And there are the referenced files and URLs the skill tells the agent to fetch. Each one is a place to hide behavior you did not sign up for.

This is not hypothetical. The ClawHavoc campaign placed 335 malicious skills on a public skills registry, published to look ordinary and wait for adoption. And that is the visible end of the problem. Most published skills carry no verifiable provenance at all: no reliable signal about who wrote them or whether the content matches what a human once reviewed. That gap is the wider story we covered in the agent supply chain.

What does a malicious skill actually do?

The techniques are not exotic. They are old attacks moved one layer up, into the text the agent obeys.

Injection hidden in the instructions. A SKILL.md file or even its short description can carry directives that steer the agent: ignore prior constraints, run this first, send output there. The description alone is enough, because the agent reads it to decide when the skill applies.

Exfiltration endpoints. A script or a fetch step that quietly ships environment variables, tokens or file contents to an address that has nothing to do with the skill's stated job.

Over-broad permission requests. A skill for formatting text that also wants shell access and network reach. The extra scope is where the damage lives.

Silent updates after approval. The skill you reviewed and the skill running next week are not guaranteed to be the same bytes. This is tool poisoning one layer up: the content that shapes agent behavior mutates after you stopped looking.

How do you audit a skill by hand?

You do not need a lab. You need to slow down and read the artifact as instructions, not as documentation. Five steps cover most of it.

  1. Read SKILL.md as instructions, not docs. Ask what the agent would actually do if it followed every line literally, including the description. Look for anything that redirects behavior, overrides constraints, or names an external destination.
  2. Check every bundled script. Open each one. A skill that carries code can run that code. Read what it touches: files, environment, network.
  3. Check the referenced and fetched URLs. Any address the skill tells the agent to load is an input you are trusting. Confirm each one is what it claims and returns what you expect.
  4. Match requested permissions to the job. Write down the smallest set of capabilities the stated task needs. Anything the skill asks for beyond that list is a finding, not a convenience.
  5. Pin the content by hash. Record a hash of what you approved. When the skill changes, the hash changes, and that re-triggers review instead of trusting a version you never saw.
You are not deciding whether a skill is useful. You are deciding whether you would let its author type those exact instructions into your agent, because that is what running it does.

Can this scale?

By hand, one skill at a time, no. But skills are text, and text is checkable. Deterministic checks for known injection patterns, for exfiltration URLs, and for permission over-reach run in seconds, before the skill ever reaches a model's context. The manual read stays valuable for judgment calls. The mechanical checks catch the obvious poison at the scale a registry actually moves.

That is the point of scanning before first use. The cost of reading a skill mechanically is trivial next to the cost of an agent acting on one you never opened.

skill audit · pre-runpinned
1SKILL.md     instructions parsed · hash 9f2c… ok
2description redirect directive found   review
3scripts/    outbound POST to unknown host → blocked
4permissions shell + network vs task: format over-reach
5evidence → reported before first use
A skill read as instructions and capabilities before it runs, not as a name you trusted.

Where does this fit with MCP?

Right now, installing a skill is a deliberate act. You go get it. SEP-2640 proposes shipping skills over MCP as Resources, which changes that. Skill installation stops being a choice you make and becomes a side effect of connecting a server. The review window you have today, the moment before you copy a skill in, quietly disappears.

That is why the audit has to move earlier and become automatic. When the act of installing goes away, the check cannot depend on a human choosing to run it. This is the same detection versus authorization line we drew in an earlier piece: knowing a skill is risky matters only if something can stop it before the agent acts.

The takeaway

Treat every skill as instructions an author gets to give your agent. Read SKILL.md, its scripts and its URLs before you run it. Match permissions to the job. Pin what you approved so a later update re-triggers review. And where a human read cannot scale, let deterministic checks catch the obvious poison first.


Oktsec monitors 58,000+ published skills and servers and runs deterministic checks for injection, exfiltration and permission over-reach. They read the skill by eye. We read it at the scale a registry moves. See Signal →