feat(mobilebackup2): Transactional backups#101

Draft
kjzl wants to merge 2 commits into jkcoxson:master from kjzl:feature/transactional-backup


@kjzl kjzl commented May 10, 2026

WIP: Keep a journal so the backup's on-disk state can be rolled back when the backup is aborted or interrupted before finishing.
This introduces breaking changes: BackupDelegate is replaced with a new API, BackupStore, which natively supports transactional-style operations. Existing code is migrated, but we might still want to keep unjournaled backups around?

@jkcoxson
Owner

Hey thanks for the draft, excited to see where this goes. Is it not possible to implement your journal system with the current trait system?

Do you mind dropping the commit for CarPlay?

@kjzl force-pushed the feature/transactional-backup branch from 8ef8808 to 762da2a on May 10, 2026 14:32
@kjzl
Author

kjzl commented May 10, 2026

There are a few options for the changes needed to support clean transactional backups; this is your call. The new trait is not radically different from the old one. It mostly changes the semantics around path handling and file replacement.

The current public trait is approximately:

pub trait BackupDelegate: Send + Sync {
    fn get_free_disk_space(&self, path: &Path) -> u64;

    fn open_file_read(&self, path: &Path) -> Future<Result<Box<dyn Read + Send>>>;
    fn create_file_write(&self, path: &Path) -> Future<Result<Box<dyn Write + Send>>>;

    fn create_dir_all(&self, path: &Path) -> Future<Result<()>>;
    fn remove(&self, path: &Path) -> Future<Result<()>>;
    fn rename(&self, from: &Path, to: &Path) -> Future<Result<()>>;
    fn copy(&self, src: &Path, dst: &Path) -> Future<Result<()>>;

    fn exists(&self, path: &Path) -> Future<bool>;
    fn is_dir(&self, path: &Path) -> Future<bool>;
    fn list_dir(&self, path: &Path) -> Future<Result<Vec<DirEntryInfo>>>;

    fn on_file_received(&self, path: &str, file_count: u32) {}
    fn on_progress(&self, bytes_done: u64, bytes_total: u64, overall_progress: f64) {}
}

My proposed internal/new trait is approximately:

#[async_trait]
pub trait BackupStore: Send {
    fn get_free_disk_space(&self) -> u64;

    async fn open_file_read(
        &self,
        path: &BackupPath,
    ) -> Result<Box<dyn Read + Send>, IdeviceError>;

    async fn begin_replace(
        &mut self,
        path: &BackupPath,
    ) -> Result<Box<dyn BackupFileReplacement + Send>, IdeviceError>;

    async fn create_dir_all(&mut self, path: &BackupPath) -> Result<(), IdeviceError>;
    async fn remove(&mut self, path: &BackupPath) -> Result<(), IdeviceError>;
    async fn rename(&mut self, from: &BackupPath, to: &BackupPath) -> Result<(), IdeviceError>;
    async fn copy(&mut self, src: &BackupPath, dst: &BackupPath) -> Result<(), IdeviceError>;

    async fn exists(&self, path: &BackupPath) -> bool;
    async fn is_dir(&self, path: &BackupPath) -> bool;
    async fn list_dir(&self, path: &BackupPath) -> Result<Vec<DirEntryInfo>, IdeviceError>;

    fn on_file_received(&self, path: &BackupPath, file_count: u32) {}
    fn on_progress(&self, bytes_done: u64, bytes_total: u64, overall_progress: f64) {}
}

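// Needs #[async_trait] as well: it is used as `Box<dyn BackupFileReplacement + Send>`,
// and a trait with plain `async fn` methods cannot be made into a trait object.
#[async_trait]
pub trait BackupFileReplacement: Write + Send {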
    async fn finish(self: Box<Self>) -> Result<(), IdeviceError>;
    async fn abort(self: Box<Self>) -> Result<(), IdeviceError>;
}

Most of the API is the same. The meaningful changes are:

  1. Path becomes BackupPath.

    The old trait receives concrete host paths. BackupPath is a relative path scoped to the backup directory.

      backup_root/
        000080XX-001C19XXXXXXX/
          Manifest.plist          <- BackupPath("Manifest.plist")
          Status.plist            <- BackupPath("Status.plist")
          ab/cdef...              <- BackupPath("ab/cdef...")
    

    MobileBackup2 path fields are protocol input. Even on an authenticated connection, we should validate them before joining them onto host filesystem paths, especially because backup operations can create, replace, move, copy, and delete files. BackupPath is validated on construction.

  2. create_file_write(path) becomes begin_replace(path).

    This is the main reason for the new trait.

    With the old delegate, create_file_write(path) creates or truncates the final visible file immediately. But MobileBackup2 sends file bytes first and only sends the success/error trailer afterward. So with the old API, the host may already have removed or truncated the old file before it knows whether the replacement succeeded.

    With the new API, the store can return a staged replacement writer. The protocol loop writes bytes into that replacement, then calls finish() only if the MobileBackup2 trailer indicates success, or abort() if the transfer fails.

    Transactional storage can write to a temp/journal path first, preserve old state, and install the replacement only after success.

  3. BackupFileReplacement adds finish() and abort().

    The old Box<dyn Write + Send> has no lifecycle beyond write/flush/drop. That is not enough to represent “install this file now” versus “discard this file.”

    Relying on Drop would be wrong because it cannot return async errors and cannot distinguish success, failure, cancellation, or panic. Relying on flush() would also be wrong because flushing can happen before the protocol success trailer. Protocol lifecycle becomes explicit.

  4. Mutating methods take &mut self.

    The old delegate mostly uses &self, which is convenient for callback-style implementations and FFI.

    A journaled store naturally has mutable state: transaction id, operation ids, journal writer, staged replacements, and recovery state. That can be done behind &self with mutexes/interior mutability, but &mut self better expresses what is happening.

    This is less convenient for some callback/FFI-style implementations, but transactional implementations can be simpler and more direct.

  5. async-trait makes the signatures cleaner.

    The old trait manually returns boxed futures for every async operation. The new trait uses async methods through async-trait, which is easier to read and implement.
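Point 1 above (a BackupPath that is validated on construction) could look roughly like the following. This is a minimal sketch, not the type from the PR: the `BackupPathError` enum and the `join_onto` helper are hypothetical names, the real code reports errors through `IdeviceError`, and the actual validation rules may be stricter.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum BackupPathError {
    Empty,
    NotRelative,
    ParentTraversal,
}

/// A relative path that cannot point outside the backup directory.
#[derive(Debug, Clone, PartialEq)]
pub struct BackupPath(PathBuf);

impl BackupPath {
    /// Validates protocol input once, at construction.
    pub fn new(raw: &str) -> Result<Self, BackupPathError> {
        let mut clean = PathBuf::new();
        for component in Path::new(raw).components() {
            match component {
                Component::Normal(part) => clean.push(part),
                Component::CurDir => {}
                // `..` could climb out of the backup directory.
                Component::ParentDir => return Err(BackupPathError::ParentTraversal),
                // Absolute paths and Windows prefixes would replace the root on join.
                Component::RootDir | Component::Prefix(_) => {
                    return Err(BackupPathError::NotRelative)
                }
            }
        }
        if clean.as_os_str().is_empty() {
            return Err(BackupPathError::Empty);
        }
        Ok(Self(clean))
    }

    /// Hypothetical helper: resolve against the host directory of one backup.
    pub fn join_onto(&self, backup_dir: &Path) -> PathBuf {
        backup_dir.join(&self.0)
    }
}
```

The point of the newtype is that every store method taking `&BackupPath` can join it onto the host directory without repeating these checks, whereas a plain `&Path` carries no such guarantee in its type.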

Public-facing API change options

  1. Fully replace BackupDelegate with BackupStore.

  2. Keep BackupDelegate public and make BackupStore the internal protocol-loop abstraction.

    Existing users keep compiling. The protocol code can still use the stronger abstraction internally. A compatibility adapter can map old delegates to the new internal store shape.

    Old delegates still only get old semantics.

  3. Keep both traits public.

    BackupDelegate remains the simple/compatibility API. BackupStore becomes the advanced API.
    The backup_from_path method, for example, would then internally wrap the BackupDelegate with a BackupStore adapter.

    This exposes two very similar traits, which may be confusing. The docs would need to be explicit about when to implement each one. Optionally we could also mark BackupDelegate as deprecated.

  4. Evolve BackupDelegate instead of adding a separate public trait.

    For example, we could add a default begin_replace() method to BackupDelegate, implemented in terms of create_file_write(). Existing implementors would keep working, while advanced implementors could override begin_replace() for staging.

    The trait gets larger, and the default implementation still has the old final-path semantics. Also, if we keep &self and raw Path, transactional implementations still need workarounds for mutable transaction state and path handling.
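Option 4 above can be sketched as follows. To stay self-contained this is a synchronous simplification: the real traits are async, and `FileReplacement` and `DirectWrite` are hypothetical names standing in for BackupFileReplacement and its compatibility wrapper.

```rust
use std::io::{self, Write};
use std::path::Path;

/// Lifecycle handle for one file upload (sync stand-in for the async trait).
pub trait FileReplacement: Write {
    fn finish(self: Box<Self>) -> io::Result<()>;
    fn abort(self: Box<Self>) -> io::Result<()>;
}

/// Wraps a plain writer. Bytes already went to the final path, so `abort`
/// cannot restore the previous contents.
struct DirectWrite(Box<dyn Write + Send>);

impl Write for DirectWrite {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.0.flush()
    }
}

impl FileReplacement for DirectWrite {
    fn finish(mut self: Box<Self>) -> io::Result<()> {
        self.0.flush()
    }
    fn abort(self: Box<Self>) -> io::Result<()> {
        Ok(()) // nothing staged, nothing to discard
    }
}

pub trait BackupDelegate {
    fn create_file_write(&self, path: &Path) -> io::Result<Box<dyn Write + Send>>;

    /// Default keeps the old final-path semantics; a store that can stage
    /// files overrides this method instead.
    fn begin_replace(&self, path: &Path) -> io::Result<Box<dyn FileReplacement + Send>> {
        Ok(Box::new(DirectWrite(self.create_file_write(path)?)))
    }
}
```

Existing implementors compile unchanged, but as noted above they silently keep the non-transactional behaviour: with the default method, `abort()` has nothing it can undo.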


The strongest technical reason for the new shape is file upload correctness: MobileBackup2 gives us the final success/error status after the file stream, while create_file_write() mutates the final path before that status is known. begin_replace() plus finish() / abort() maps to the protocol more accurately and gives the journal layer a clean place to stage, install, or discard replacements.
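The stage, install, or discard flow described above can be illustrated with a small synchronous sketch. It is not the PR's implementation: `StagedReplacement`, its `begin` constructor, and the `.partial` suffix are assumptions made for the example, and the real trait is async and returns `IdeviceError`.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes go to a sibling temp file; the final path is untouched until `finish`.
pub struct StagedReplacement {
    file: File,
    staged: PathBuf,
    target: PathBuf,
}

impl StagedReplacement {
    /// Hypothetical constructor; the `.partial` suffix is an arbitrary choice.
    pub fn begin(target: &Path) -> io::Result<Self> {
        let mut name = target
            .file_name()
            .ok_or(io::ErrorKind::InvalidInput)?
            .to_os_string();
        name.push(".partial");
        let staged = target.with_file_name(name);
        let file = File::create(&staged)?;
        Ok(Self { file, staged, target: target.to_path_buf() })
    }

    /// Call only after the success trailer: sync, close, then install by rename.
    pub fn finish(self) -> io::Result<()> {
        self.file.sync_all()?;
        drop(self.file);
        fs::rename(&self.staged, &self.target)
    }

    /// Call on an error trailer or a broken transfer: discard the staged bytes.
    pub fn abort(self) -> io::Result<()> {
        drop(self.file);
        fs::remove_file(&self.staged)
    }
}

impl Write for StagedReplacement {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.file.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.file.flush()
    }
}
```

This only covers per-file staging. Rolling back a whole interrupted backup would additionally require the journal to preserve the old file before the rename, which this sketch does not attempt.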

I think keeping both traits public and marking BackupDelegate as deprecated would be reasonable?

@jkcoxson
Owner

Ok, just wanted to check first, you are actually writing this code, correct? I understand that LLMs can be used as a tool to help and clean, but for a change this deep I'd prefer that a human is actually the one authoring and understanding this.

> #[async_trait]

No thanks, I'd prefer the traits explicitly written as they are now until async traits are stable in the language itself. Under the hood, we are basically doing the same thing anyways.
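For reference, the explicit form that stands in for an `async fn` is a method returning a pinned, boxed future, which is also roughly what the `async-trait` macro generates. A minimal sketch, where the `BoxFuture` alias, the `Store` trait, and `MemStore` are hypothetical names and not taken from the crate:

```rust
use std::collections::HashSet;
use std::future::Future;
use std::pin::Pin;

/// Hypothetical alias; the crate may spell this type out differently.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

pub trait Store: Send + Sync {
    /// Explicit equivalent of `async fn exists(&self, path: &str) -> bool`.
    fn exists<'a>(&'a self, path: &'a str) -> BoxFuture<'a, bool>;
}

/// Toy in-memory implementation, used only to exercise the signature.
pub struct MemStore {
    pub files: HashSet<String>,
}

impl Store for MemStore {
    fn exists<'a>(&'a self, path: &'a str) -> BoxFuture<'a, bool> {
        // The async block borrows `self` and `path` for `'a`, then is boxed.
        Box::pin(async move { self.files.contains(path) })
    }
}
```

Either spelling leaves the trait usable as `dyn Store`; the difference is only whether the boxing is written by hand or by the macro.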

> Path becomes BackupPath

I don't understand what the difference is. Path already isn't validated against a real file system, it's just a chain of strings.

> we should validate them before joining them onto host filesystem paths

In the current architecture, this can already be done, no?

> create_file_write(path) becomes begin_replace(path)

makes sense to me

> Fully replace BackupDelegate with BackupStore

Since backup support wasn't added that long ago, I'd say just replace it. As per the readme, we are explicitly not stable until 0.2.0, and the changes aren't that drastic for the end user. FFI shouldn't change much either.
