12

I have a question that's probably not asked too often, because it has little negative impact on the user. I can't find it on The Google anyway.

So this question is for the edification of the curious-minded.

In my production environment I often back up manually. I have noticed that files I simply duplicate don't take up as much space as they used to in older versions of MacOSX.

As an experiment (because I am a nerd) I attempted to fill my hard drive by copying large files. Duplicating files barely changed the "capacity" / "available" numbers at all.

The duplicates are there.

Can someone explain this aspect of the OSX filing system to me? It's absolutely amazing!

Details:

  • not using Cloud
  • MacBook Air ca. 2017
  • MacOSX 10.14.6
Parapluie
  • 393

1 Answer

22

Apple File System (APFS) and Clones

On an APFS volume, duplicating a file on the same volume creates a clone of that file. Clones take very little space. Significant additional space is required only when a clone is modified.

Clones

Clones allow the operating system to make efficient file copies on the same volume without occupying additional storage space. Changes to a cloned file are saved as delta extents, reducing storage space required for document revisions and copies.[9] There is, however, no interface to mark two copies of the same file as clones of the other, or for other types of data deduplication.

See also iMore's Apple File System (APFS): What you need to know.

To understand how APFS works, see the Apple File System Reference.
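
To make the space accounting described above concrete, here is a rough toy model in Python. It is only a sketch of the general copy-on-write idea (shared blocks with reference counts) and is not how APFS is actually implemented on disk. The names `ToyVolume`, `BLOCK_SIZE`, `clone`, `overwrite_block` and `used_bytes` are all made up for this illustration.

```python
"""Toy model of copy-on-write clones. Illustrative only, NOT the APFS format."""

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the numbers are easy to follow


class ToyVolume:
    def __init__(self):
        self.blocks = {}    # block id -> bytes stored in that block
        self.refcount = {}  # block id -> number of files referencing it
        self.files = {}     # file name -> ordered list of block ids
        self._next_id = 0

    def _store(self, data):
        """Allocate a new block holding `data` and return its id."""
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        return block_id

    def _release(self, block_id):
        """Drop one reference; free the block when nothing points at it."""
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:
            del self.blocks[block_id]
            del self.refcount[block_id]

    def write_file(self, name, data):
        """Create (or replace) a file, splitting its data into blocks."""
        if name in self.files:
            self.delete(name)
        self.files[name] = [
            self._store(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)
        ]

    def clone(self, src, dst):
        """Make `dst` share every block of `src`; no data is copied."""
        block_ids = list(self.files[src])
        for block_id in block_ids:
            self.refcount[block_id] += 1
        self.files[dst] = block_ids

    def overwrite_block(self, name, index, data):
        """Copy-on-write: the changed block gets new storage, the rest stays shared."""
        old_id = self.files[name][index]
        self.files[name][index] = self._store(data)
        self._release(old_id)

    def delete(self, name):
        """Remove a file; blocks still used by another file survive."""
        for block_id in self.files.pop(name):
            self._release(block_id)

    def read_file(self, name):
        return b"".join(self.blocks[b] for b in self.files[name])

    def used_bytes(self):
        return sum(len(data) for data in self.blocks.values())


if __name__ == "__main__":
    vol = ToyVolume()
    vol.write_file("original", b"AAAABBBBCCCC")
    print(vol.used_bytes())        # 12
    vol.clone("original", "copy")
    print(vol.used_bytes())        # 12: the clone costs no extra data blocks
    vol.overwrite_block("copy", 1, b"XXXX")
    print(vol.used_bytes())        # 16: only the modified block is new
    vol.delete("original")
    print(vol.read_file("copy"))   # b'AAAAXXXXCCCC'
    print(vol.used_bytes())        # 12
```

In this model, deleting "original" leaves "copy" fully readable, because the shared blocks are freed only when the last file referencing them goes away.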

Graham Miln
  • 43,776
  • 1
    Very nice! I suspected that some kind of system was in play like the one in the "Delta extents" link. Thank you, Graham, for providing this information on Clones. – Parapluie Mar 02 '20 at 19:17
  • 1
    I wonder how it works when you clone, then delete the original. Or, worse still, clone - then original gets corrupted/damaged and you try to copy your clone back. I can see non-technical users getting rather annoyed at this. – Aleks G Mar 03 '20 at 14:48
  • 2
    @AleksG This is exactly my concern with this hidden behavior. Duplicating files on the same file system is common for temporary backups. And how does one know the files are linked if you duplicate and rename? And how many deltas are needed before a whole new file is just made? – Logarr Mar 03 '20 at 15:43
  • With APFS, we need to change our behaviour and start copying to alternative volumes or physical drives. Copies on the same volume should not be considered safeguards against file corruption. – Graham Miln Mar 03 '20 at 16:30
  • 2
    @Logarr there is no friendly, Apple-provided interface to examine the underlying state of files on APFS. I would not expect any. The trend with Apple is towards more abstraction in the file system. Look at Apple's decision with how the Applications folder is handled in macOS 10.15. – Graham Miln Mar 03 '20 at 16:33
  • 4
    @AleksG "I wonder how it works when you clone, then delete the original." Nothing interesting happens. The old reference to the data is destroyed, the new reference remains, and the underlying data is undisturbed. – Alexander Mar 03 '20 at 17:24
  • 1
    @AleksG that's already a thing on Unix; it's called a link. If you make a file called "foo", that is a filename entry in a directory pointing to a sequence of blocks on the disk. If you run ln foo bar, now "bar" is a filename entry in a directory pointing to the same sequence of blocks. If an app opens "foo", that creates a third pointer, so if you delete both "foo" and "bar", the sequence of blocks persists until the app is done. Unix and macOS already do this. A classic example is an iPhoto to Photos migration: the new Photos directory is just plain links to files in the old iPhoto directory. – Harper - Reinstate Monica Mar 03 '20 at 18:30
  • 1
    @Logarr "And how does one know the files are linked if you duplicate and rename?" You shouldn't care about it; it's an implementation detail. Files should behave exactly the same except for taking less space. – Franklin Yu Mar 03 '20 at 18:42
  • @Harper-ReinstateMonica IMO a link is different. Its behavior is explicit and specified, unlike "clones", which are more like an optimization (think about gcc -O3) that shouldn't change the behavior. If you explicitly link two handles to one file (inode), then changing the file through any handle would update what the other handles see. This doesn't seem to be the case for clones, which work more like Copy-on-Write. – Franklin Yu Mar 03 '20 at 18:45
  • 2
    @Harper I'm familiar with links. I am also painfully aware of what happens when the link target gets corrupted. A typical macOS user may not realise that they are not creating a full backup copy when in fact they are creating a link. – Aleks G Mar 03 '20 at 18:49
  • @AleksG the Apple clone isn't a link, but I suspect it uses the link infrastructure. If A links to A' on creation, and clone B links to A' x diffs... If A is modified/corrupted, then A now links to A' x diffs. A' can't be changed because B needs it. So the question is whether A is corrupted or whether A' is corrupted, i.e. at what level of the filesystem the corruption occurs. – Harper - Reinstate Monica Mar 03 '20 at 18:57
  • 1
    To understand how APFS works, see the Apple File System Reference. – Graham Miln Mar 04 '20 at 05:06
  • @GrahamMiln can you clarify what this means? "no interface to mark two copies of the same file as clones of the other, or for other types of data deduplication." I'm not sure how that differs from the use you described prior. – thesowismine Jan 29 '22 at 23:35
  • @thesowismine the quote is from Wikipedia. Consider asking on the Wikipedia Talk page for clarification. – Graham Miln Jan 30 '22 at 09:19
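
The difference the comments draw between a hard link (ln foo bar) and an independent copy can be checked with a short Python sketch. Note the assumptions: the script does not create an APFS clone. It uses `shutil.copyfile`, an ordinary copy, as a stand-in for how a clone is expected to behave from the user's point of view (changes to one file do not show up in the other). The function name `compare_link_and_copy` and the file name "baz" are invented for this example; "foo" and "bar" follow the comment above.

```python
import os
import shutil
import tempfile


def read_text(path):
    with open(path) as f:
        return f.read()


def compare_link_and_copy():
    """Contrast a hard link with an independent copy inside a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        foo = os.path.join(tmp, "foo")  # the original file
        bar = os.path.join(tmp, "bar")  # hard link: second name for the same inode
        baz = os.path.join(tmp, "baz")  # ordinary copy: separate file (stand-in for a clone)

        with open(foo, "w") as f:
            f.write("version 1")

        os.link(foo, bar)          # equivalent of `ln foo bar`
        shutil.copyfile(foo, baz)  # plain copy, NOT an APFS clone

        same_inode = os.stat(foo).st_ino == os.stat(bar).st_ino

        # Rewrite the file in place through the name "foo".
        with open(foo, "w") as f:
            f.write("version 2")

        link_sees_change = read_text(bar) == "version 2"
        copy_sees_change = read_text(baz) == "version 2"

        # Deleting one name leaves the data reachable through the other.
        os.remove(foo)
        link_survives_delete = read_text(bar) == "version 2"

        return {
            "same_inode": same_inode,
            "link_sees_change": link_sees_change,
            "copy_sees_change": copy_sees_change,
            "link_survives_delete": link_survives_delete,
        }


if __name__ == "__main__":
    for key, value in compare_link_and_copy().items():
        print(f"{key}: {value}")
```

The hard link follows every in-place change to the original, while the copy keeps its old contents, which is the behaviour the comments attribute to clones as well.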