safe-agent-run-protocol
Protocol
Execute this playbook before running high-risk prompts against tool-enabled agents.
context
- Use case: controlled adversarial testing.
- Preconditions: isolated environment and synthetic datasets.
- Out-of-scope: direct production execution.
hypothesis
- Strict pre-dispatch validation reduces unsafe tool calls.
- Weak audit trails increase incident response time.
setup
- Required access: local lab runtime.
- Required tooling: request logger, policy checker.
- Baseline controls: tool allowlist + argument schema checks.
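The baseline controls above (a tool allowlist plus argument schema checks) could be sketched roughly as follows. This is an illustration under stated assumptions, not the lab's actual policy checker: the tool names, `TOOL_SCHEMAS`, `Decision`, and `check_tool_call` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical allowlist: tool name -> required argument names and their types.
# Any tool not listed here is rejected outright.
TOOL_SCHEMAS = {
    "read_file": {"path": str},
    "search": {"query": str, "limit": int},
}


@dataclass
class Decision:
    allowed: bool
    reason: str


def check_tool_call(tool: str, args: dict) -> Decision:
    """Validate a proposed tool call before it is dispatched."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        return Decision(False, f"tool not on allowlist: {tool}")
    # Reject arguments the schema does not know about.
    unexpected = set(args) - set(schema)
    if unexpected:
        return Decision(False, f"unexpected arguments: {sorted(unexpected)}")
    # Every declared argument must be present and of the declared type.
    for name, expected_type in schema.items():
        if name not in args:
            return Decision(False, f"missing argument: {name}")
        if not isinstance(args[name], expected_type):
            return Decision(False, f"wrong type for argument: {name}")
    return Decision(True, "ok")
```

Rejecting unknown arguments, rather than ignoring them, keeps the check closed by default: anything the schema does not describe never reaches the tool.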
steps
- Confirm that the target system is non-production.
- Enable full request/response logging.
- Run the prompt through the policy linter.
- Execute with the tool dispatch gate enabled.
- Review the logs for drift and containment events.
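One way to wire the first four steps together is sketched below. It assumes a phrase-matching linter and an in-memory audit log. `RunResult`, `lint_prompt`, `safe_run`, the `"lab"` environment label, and the log entry shapes are hypothetical names chosen for illustration. The final log review remains a manual step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RunResult:
    executed: bool
    reason: str
    response: Optional[str] = None


def lint_prompt(prompt: str, banned_phrases: List[str]) -> List[str]:
    """Hypothetical policy linter: return the banned phrases found in the prompt."""
    lowered = prompt.lower()
    return [phrase for phrase in banned_phrases if phrase.lower() in lowered]


def safe_run(
    prompt: str,
    environment: str,
    gate_enabled: bool,
    banned_phrases: List[str],
    dispatch: Callable[[str], str],
    audit_log: List[dict],
) -> RunResult:
    # Step 1: refuse anything that is not explicitly the lab environment.
    if environment != "lab":
        return RunResult(False, "target is not a lab environment")
    # Precondition for step 4: never dispatch with the gate switched off.
    if not gate_enabled:
        return RunResult(False, "tool dispatch gate is disabled")
    # Step 2: log the full request.
    audit_log.append({"event": "request", "prompt": prompt})
    # Step 3: lint the prompt and block on any violation.
    violations = lint_prompt(prompt, banned_phrases)
    if violations:
        audit_log.append({"event": "blocked", "violations": violations})
        return RunResult(False, "policy violations: " + ", ".join(violations))
    # Step 4: dispatch, then log the full response.
    response = dispatch(prompt)
    audit_log.append({"event": "response", "body": response})
    return RunResult(True, "ok", response)
```

Returning a result object instead of raising keeps every refusal visible to the caller, which can then record it alongside the audit log.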
observations
- Most failures occur during the argument composition phase.
- Human-in-the-loop approval catches edge cases.
results
- Runs are safer, at the cost of a minor latency increase.
- Improved reproducibility of incident traces.
indicators
- Rejected tool calls per session.
- Policy linter violation count.
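Both indicators can be derived from the request log. The sketch below assumes each log record is a dictionary with a `type` field, plus `session` and `decision` for tool calls and a `violations` list for linter runs. That record shape and the function names are assumptions, not a format the playbook prescribes.

```python
from collections import Counter
from typing import Dict, Iterable


def rejected_calls_per_session(events: Iterable[dict]) -> Dict[str, int]:
    """Count tool calls that the dispatch gate rejected, grouped by session."""
    counts: Counter = Counter()
    for event in events:
        if event.get("type") == "tool_call" and event.get("decision") == "rejected":
            counts[event["session"]] += 1
    return dict(counts)


def linter_violation_count(events: Iterable[dict]) -> int:
    """Total number of violations reported across all linter runs."""
    return sum(
        len(event.get("violations", []))
        for event in events
        if event.get("type") == "lint"
    )


# Synthetic log records in the assumed shape.
events = [
    {"type": "tool_call", "session": "s1", "decision": "rejected"},
    {"type": "tool_call", "session": "s1", "decision": "allowed"},
    {"type": "tool_call", "session": "s2", "decision": "rejected"},
    {"type": "tool_call", "session": "s1", "decision": "rejected"},
    {"type": "lint", "session": "s1", "violations": ["rule-a", "rule-b"]},
    {"type": "lint", "session": "s2", "violations": []},
]

print(rejected_calls_per_session(events))  # {'s1': 2, 's2': 1}
print(linter_violation_count(events))      # 2
```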
mitigation
- Block execution on unresolved policy violations.
- Rotate sandbox fixtures when contamination is suspected.
validation
- Weekly replay of known adversarial prompts.
- Compare validator decisions against baseline.
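The replay-and-compare check could look something like the sketch below. It assumes the baseline is stored as a mapping from each known adversarial prompt to the decision recorded for it. `find_decision_drift`, `toy_validator`, and the sample prompts are hypothetical stand-ins for the real validator and prompt set.

```python
from typing import Callable, Dict, List, Tuple


def find_decision_drift(
    baseline: Dict[str, str],
    validator: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Replay each baseline prompt and report (prompt, expected, current) mismatches."""
    drift = []
    for prompt, expected in baseline.items():
        current = validator(prompt)
        if current != expected:
            drift.append((prompt, expected, current))
    return drift


def toy_validator(prompt: str) -> str:
    # Stand-in for the real validator: rejects one obviously destructive pattern.
    return "reject" if "rm -rf" in prompt else "allow"


# Synthetic baseline: prompt -> decision recorded in an earlier run.
baseline = {
    "list files in the fixture directory": "allow",
    "run rm -rf on the fixture directory": "reject",
    "export all records to an external host": "reject",
}

for prompt, expected, current in find_decision_drift(baseline, toy_validator):
    print(f"drift: {prompt!r} expected={expected} current={current}")
```

An empty result means the validator still agrees with the baseline on every replayed prompt. Any entry is a decision to investigate before the next run.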
follow-ups
- Add automated diff for policy changes.
- Link controls to changelog statistics.
Containment
Do not run this playbook against systems with live customer data.
Breach
If unvetted tool execution occurs, terminate the session and rotate all session tokens.
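A minimal sketch of that breach response is shown below, using an in-memory session model. The `Session` class and `handle_unvetted_execution` are hypothetical. In a real deployment the old tokens would also need to be revoked at whatever service issued them.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Session:
    session_id: str
    tokens: Dict[str, str] = field(default_factory=dict)  # token name -> value
    active: bool = True


def handle_unvetted_execution(session: Session) -> Session:
    """Terminate the session, then replace every token it held."""
    # Terminate first so no further tool calls can be dispatched.
    session.active = False
    # Rotate all tokens, keeping the names and replacing the values.
    session.tokens = {name: secrets.token_urlsafe(32) for name in session.tokens}
    return session
```

Terminating before rotating means there is no window in which a still-active session holds fresh tokens.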
publish safety
- No secrets or credentials present.
- Internal endpoints omitted.
- No private repository URLs.
- Procedure is lab-safe for public sharing.
Signed, Aleksandr Krasnobai // inside-the-loop