How I Back Up an OpenClaw Assistant Without Leaking Its Private State

A practical OpenClaw backup pattern that separates encrypted disaster recovery from curated Git history.

Persistent AI assistants accumulate more state than people expect.

At first, it feels like you are just chatting with a model. Then you add tools. Then workspaces. Then memory files. Then browser state, cron jobs, helper scripts, channel configuration, project notes, agent-specific instructions, and a pile of small decisions that only exist because the assistant was there when the work happened.

Eventually the assistant is not just a chat interface anymore. It is an operating environment.

That means it needs backups.

But the obvious answer, “just put the workspace in Git,” is wrong. Or at least incomplete.

An OpenClaw assistant has two very different backup needs:

  1. A complete disaster recovery backup that can restore the environment after something breaks.
  2. A curated Git backup that preserves important text history without leaking private state.

Those sound similar until you try to design the actual system. Then they become almost opposites.

The disaster recovery backup should be complete. The Git backup should be selective.

The DR backup should include private operational state, but only after encryption. The Git backup should be readable and reviewable, but intentionally incomplete.

The biggest lesson from setting this up was simple:

Do not confuse restorable with reviewable.

The problem with persistent agents

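An OpenClaw environment can accumulate a surprising amount of state: workspaces, memory files, browser state, cron jobs, helper scripts, channel configuration, project notes, and agent-specific instructions.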

Some of that state is safe and useful to keep in Git. A lot of it is not.

A Git repository is great for durable text artifacts: instructions, templates, notes, scripts, and sanitized memory. It is terrible for raw private documents, OAuth files, browser state, screenshots, PDFs, receipts, generated caches, and anything that might contain secrets.

An encrypted disaster recovery backup has the opposite shape. It should include enough to restore the system after a host or container failure. That means it may need to contain private state. But it should be encrypted before leaving the machine.

So I ended up with two backup layers.

Layer 1: encrypted disaster recovery backup

The first layer is for full restore.

The goal is boring and operational: if the host dies, the container gets corrupted, or I accidentally damage the workspace, I want a path back.

The generic flow looks like this:

OpenClaw state/workspaces
  -> OpenClaw native backup archive
  -> local verified tarball
  -> Restic encrypted backup
  -> rclone remote storage
  -> retention/prune
  -> periodic integrity check

The important detail is that the OpenClaw backup archive is created first, then that archive is encrypted and backed up with Restic.

The base command is OpenClaw’s native backup command:

openclaw backup create --verify --output <archive-dir>

Before trusting it, I used the dry run output to see what would actually be included:

openclaw backup create --dry-run --json

That matters because agent workspaces may live outside the main workspace tree. In my setup, specialized agents have top-level workspaces rather than being nested under the main assistant workspace.

The generic shape is:

/data/.openclaw
/data/workspace
/data/workspace-<agent-a>
/data/workspace-<agent-b>

If DR does not include those top-level agent workspaces, then the backup is not really complete.
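
One way to sanity-check this is to list the workspace roots that actually exist on disk and compare them by eye with what the dry run reports. The following is a minimal sketch, assuming the layout shown above (`workspace` plus `workspace-<agent>` under one data directory). The function name `list_workspace_roots` is hypothetical and not part of OpenClaw.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every workspace root under a data directory.
# Assumes the layout shown above: <data>/workspace and <data>/workspace-<agent>.
list_workspace_roots() {
  local data_dir="${1:-/data}"
  local dir
  for dir in "$data_dir"/workspace "$data_dir"/workspace-*; do
    # An unmatched glob stays literal, so only print real directories.
    [ -d "$dir" ] && printf '%s\n' "$dir"
  done
  return 0
}
```

Running `list_workspace_roots /data` next to `openclaw backup create --dry-run --json` makes a missing agent workspace easy to spot before the first real backup.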

After the OpenClaw archive is created and verified, Restic handles encryption, remote storage, retention, and integrity checks.

A simplified version of the key commands looks like this:

openclaw backup create --verify --output "$ARCHIVE_DIR"
# $archive_path is the archive file that the previous command wrote into $ARCHIVE_DIR
restic backup "$archive_path" --tag openclaw-state --tag openclaw-backup-archive
restic forget --tag openclaw-state \
  --keep-daily "$KEEP_DAILY" \
  --keep-weekly "$KEEP_WEEKLY" \
  --keep-monthly "$KEEP_MONTHLY" \
  --prune
restic snapshots --tag openclaw-state --latest 3

The check job is intentionally separate:

restic snapshots --tag openclaw-state --latest 5
restic check

Backups that never get checked are mostly hope with timestamps.

Container-side vs host-side backup tools

There are two reasonable places to run the backup tooling.

One option is container-side: install or provide restic and rclone inside the OpenClaw environment and run the whole job from there.

That is simple because OpenClaw cron can schedule the script directly. Everything happens in one place.

The other option is host-side: let the host run restic and rclone, while the container only creates the OpenClaw backup archive.

The generic host-side flow looks like this:

host script
  -> docker exec <openclaw-container> openclaw backup create --verify
  -> host sees archive through bind-mounted /data
  -> host restic backs up archive to remote
  -> host restic applies retention/prune

The host-side version avoids modifying the OpenClaw container image just to add backup utilities. It also makes the backup toolchain less dependent on future container image changes.

The tradeoff is that the host script needs access to Docker and the persistent data mount.
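
The host-side flow above can be sketched as a single shell function. This is an illustration under stated assumptions, not the exact script: the container name and directory arguments are placeholders, the retention defaults (7/4/6) are invented for the example, and it assumes the newest file in the archive directory is the archive that was just created. Restic is expected to read its repository and password from the environment (`RESTIC_REPOSITORY`, `RESTIC_PASSWORD_FILE`).

```shell
#!/usr/bin/env bash
# Host-side DR sketch. Assumptions: container name, directory paths, retention
# defaults, and "newest file in the archive dir is the fresh archive".
host_side_backup() {
  local container="$1"       # OpenClaw container name
  local container_dir="$2"   # archive dir as seen inside the container
  local host_dir="$3"        # the same dir as seen on the host (bind mount)
  local archive_path=""
  local f

  # 1. The container only creates and verifies the archive.
  docker exec "$container" openclaw backup create --verify --output "$container_dir" || return 1

  # 2. The host picks the newest regular file in the bind-mounted directory.
  for f in "$host_dir"/*; do
    [ -f "$f" ] || continue
    if [ -z "$archive_path" ] || [ "$f" -nt "$archive_path" ]; then
      archive_path="$f"
    fi
  done
  [ -n "$archive_path" ] || { echo "no archive found in $host_dir" >&2; return 1; }

  # 3. The host encrypts and uploads the archive, then applies retention.
  restic backup "$archive_path" --tag openclaw-state --tag openclaw-backup-archive || return 1
  restic forget --tag openclaw-state \
    --keep-daily "${KEEP_DAILY:-7}" \
    --keep-weekly "${KEEP_WEEKLY:-4}" \
    --keep-monthly "${KEEP_MONTHLY:-6}" \
    --prune
}
```

Keeping the container's job down to one `docker exec` line is what makes the boundary explicit: everything after step 1 depends only on the host's own tools.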

Neither version is universally better. The key is to make the boundary explicit.

Layer 2: curated Git backup

The second layer is not for full restore.

It is for readable history.

The Git backup is useful because I want diffs for important text files. I want to recover a note, inspect how instructions changed, or review memory edits over time.

But I do not want Git to become a public or semi-public copy of the assistant’s whole life.

So the Git backup is allowlisted.

It includes things like:

AGENTS.md
TOOLS.md
HEARTBEAT.md
IDENTITY.md
SOUL.md
USER.md
MEMORY.md
BROWSER_AUTOMATION.md
scripts/
skills/
selected memory files
selected templates

It excludes things like:

.env files
OAuth tokens
browser state
raw private documents
resumes
PDFs
DOCX/XLSX/ZIP files
screenshots/images/receipts
generated caches
raw inbox imports
extracted files
working directories

The rule is: Git gets durable, sanitized text artifacts. It does not get raw private state.

Specialized agent workspaces make this more interesting. If agents live at top-level paths like:

/data/workspace-brand
/data/workspace-career

but the Git backup repo is rooted at:

/data/workspace

then adding the whole external workspace directly is the wrong move.

Instead, the backup script copies curated excerpts into a backup area:

/data/workspace/git-backup/<agentId>/

Only selected files are included:

AGENTS.md
SOUL.md
USER.md
MEMORY.md
HEARTBEAT.md
IDENTITY.md
TOOLS.md
memory/*.md

The messy stuff stays out:

working/
inbox/
extracted docs
resumes
PDFs
screenshots
reports
generated caches
private imports

That gives me useful history without pretending every workspace belongs in Git.
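
The copy step for one agent can be sketched with nothing but `find` and `cp`. This is an illustration of the allowlist idea rather than the original script: the function name and its arguments are hypothetical, and the file list mirrors the one above.

```shell
#!/usr/bin/env bash
# Sketch: copy an allowlisted subset of one agent workspace into the Git
# backup area. Function name and arguments are illustrative.
copy_curated_files() {
  local src="$1"    # e.g. /data/workspace-<agent>
  local dest="$2"   # e.g. /data/workspace/git-backup/<agentId>
  local name

  mkdir -p "$dest/memory" || return 1

  # Top-level allowlist: copy each file only if it exists.
  for name in AGENTS.md SOUL.md USER.md MEMORY.md HEARTBEAT.md IDENTITY.md TOOLS.md; do
    if [ -f "$src/$name" ]; then
      cp "$src/$name" "$dest/$name" || return 1
    fi
  done

  # memory/*.md only: no subdirectories, no other file types.
  if [ -d "$src/memory" ]; then
    find "$src/memory" -maxdepth 1 -type f -name '*.md' \
      -exec cp {} "$dest/memory/" \; || return 1
  fi
  return 0
}
```

Because nothing is copied unless it is named, a new directory such as `working/` or `inbox/` stays out by default. One limitation of this sketch: files deleted from the source are not removed from the destination.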

The bug that made the design better

The first version of the Git backup script used rsync.

Then it failed because rsync was not installed.

That was annoying, but it improved the design. I replaced it with plain find and cp, which made the script less dependent on extra packages.

Backup scripts should be boring. The fewer assumptions they make about the environment, the better.
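
A small preflight check is one way to make the remaining assumptions explicit. The sketch below uses only the shell's built-in `command -v`. The function name `require_commands` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fail early, with a clear message, when a required tool is missing.
require_commands() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A backup script could start with `require_commands git find cp || exit 1`, so a missing tool fails the job at the top instead of halfway through.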

The safety incident that mattered more

There was also a more important failure mode: an overly broad backup commit briefly included more than intended.

The fix was to reset back to the last known safe commit, restore only the needed working files, replace the commit with a sanitized version, and force-update the remote with --force-with-lease.
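
That sequence can be sketched as follows. This is an approximation, not a transcript of the commands that were run. It assumes `HEAD` is the overly broad commit, and it uses a mixed reset (the default for `git reset`) so that nothing in the working tree is deleted; that is a choice made for this sketch. The function name and commit message are placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the recovery sequence. Assumes HEAD is the overly broad commit.
# Arguments: the last known safe commit, then the paths that should be kept.
replace_broad_commit() {
  local safe_commit="$1"
  shift

  # 1. Move the branch back to the safe commit. A mixed reset leaves every
  #    file in the working tree untouched and only unstages the changes.
  git reset -q "$safe_commit" || return 1

  # 2. Stage only the paths that belong in the backup.
  git add -- "$@" || return 1

  # 3. Record the sanitized replacement commit.
  git commit -q -m "Sanitized backup commit" || return 1
}

# 4. Then update the remote, refusing to overwrite commits you have not seen:
#    git push --force-with-lease origin <branch>
```

The replaced commit can still be reachable through the local reflog or on the remote for a while, so anything sensitive it contained should be treated as exposed.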

The lesson was not “be more careful next time.” That is too vague to be useful.

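The lesson was that the backup script needs explicit forbidden-path checks, so an overly broad commit is caught by the script rather than by me.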

A few useful forbidden-path checks include patterns like:

git-backup/.*/working
git-backup/.*/inbox
*.pdf
*.docx
*.xlsx
*.zip
*.png
*.jpg

The exact patterns depend on the workspace. The principle does not.
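
A check like this can be sketched as a filter over a list of paths. The sketch below rewrites the patterns above as extended regular expressions so they can all go through one `grep -E` call. The case-insensitive match and the function name `find_forbidden_paths` are additions for the example.

```shell
#!/usr/bin/env bash
# Sketch: read candidate paths on stdin, print any that match a forbidden
# pattern, and return non-zero if at least one matched.
# One extended regular expression per line, modeled on the list above.
FORBIDDEN_PATTERNS='git-backup/.*/working
git-backup/.*/inbox
\.(pdf|docx|xlsx|zip|png|jpg)$'

find_forbidden_paths() {
  # grep accepts a newline-separated pattern list in a single -e argument.
  if grep -E -i -e "$FORBIDDEN_PATTERNS"; then
    return 1   # at least one forbidden path was printed
  fi
  return 0
}
```

In a backup script this could run right before the commit, for example `git diff --cached --name-only | find_forbidden_paths || exit 1`, so a forbidden path stops the job instead of reaching the remote.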

Scheduling and alerts

I schedule these as separate jobs because they have separate jobs to do.

The DR backup runs daily.

The Restic integrity check runs weekly.

The curated Git backup runs more frequently, because it is cheap and mostly text.

Each job runs in an isolated context and reports failures to an ops/failure channel.

That last part matters. Silent backup failure is one of the worst kinds of failure because everything looks fine until the day you need the backup.
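
Failure reporting can be sketched as a thin wrapper around each job. `notify_failure` is a hypothetical hook: here it only writes to stderr, and its body would be replaced with whatever posts to the ops/failure channel in a given setup.

```shell
#!/usr/bin/env bash
# Sketch: run a job and report only when it fails.
# notify_failure is a placeholder hook; swap in your own channel delivery.
notify_failure() {
  echo "BACKUP JOB FAILED: $*" >&2
}

run_and_report() {
  local job_name="$1"
  shift
  local status=0
  # Run the job and capture a non-zero exit status without aborting.
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    notify_failure "$job_name exited with status $status"
  fi
  return "$status"
}
```

Usage would look like `run_and_report restic-check restic check`. The wrapper passes the job's exit status through, so the scheduler still sees the failure too.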

What each layer protects against

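The encrypted DR backup protects against host failure, container corruption, and accidental damage to the workspace.

The curated Git backup protects against losing a note, losing track of how instructions changed, and memory edits that nobody can review later.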

Those are different jobs.

Trying to make Git do both is how private state leaks.

Trying to make encrypted backup do both is how useful history becomes opaque.

What this still does not solve

This setup is better, but it is not magic.

It does not replace restore drills.

It does not automatically scrub old Git history if a bad commit was pushed.

It does not remove the need to rotate leaked tokens if secrets ever hit chat, logs, or Git.

It does not decide what is private forever. The curation rules need to evolve as new projects and agents are added.

It also does not guarantee future container images include every dependency. That is one reason I like keeping the backup toolchain as simple and explicit as possible.

The takeaway

For persistent AI agents, backup design is part of the trust model.

Use encrypted disaster recovery backup for completeness.

Use curated Git backup for reviewability.

Do not Git-push your assistant’s whole life just because Git is convenient.

And do not assume an encrypted archive gives you the readable history that Git is good at.

Restorable and reviewable are both important. They just should not be the same backup.
