Friday, March 21, 2003

I suppose it's better than rounding down.


I have a question for you smart people.



I've written a clever little script that takes an ordered inclusion/exclusion list of paths for backup, figures out how many CDs it would take to back the whole thing up, partitions the data accordingly, and burns that many CDs. It puts an index file on every disk, so you can always tell which disk to go to for any given file.
Later I'll write a script that can read the index file and restore a subtree, prompting for each relevant disk as needed.
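
Roughly, the partitioning step looks like this. This is a minimal sketch in Python, not the actual script; the 650 MB capacity figure and the function names are placeholders, and the real script also has to budget for the index file it puts on each disk.

import os

CAPACITY = 650 * 1024 * 1024   # assumed usable bytes per CD

def partition(files, capacity=CAPACITY):
    """Walk the ordered file list, starting a new disc whenever
    the next file would overflow the current one."""
    discs, current, used = [], [], 0
    for path in files:
        need = os.path.getsize(path)   # raw byte size -- see below
        if current and used + need > capacity:
            discs.append(current)
            current, used = [], 0
        current.append(path)
        used += need
    if current:
        discs.append(current)
    return discs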


It's very straightforward, but there's one problem. I'm currently accumulating file sizes in bytes to figure out how to partition files onto CDs, but I know some sort of disk block rounding will occur (every file, no matter how small, will take 1k or 4k or something). Plus I'm sure there's some amount of ISO 9660 filesystem overhead I should account for too. If I don't account for these things, I'll end up trying to put more on each CD than will actually fit.



So my question is this: Is there any principled way for me to account for block-size rounding and filesystem overhead when I'm working out how many files I can cram onto a CD? Or should I give up and just leave a 10% buffer for "overhead"?
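
(For concreteness: ISO 9660 stores file data in 2048-byte logical sectors, and directories are stored in sectors too, with a record of roughly 33 bytes plus the name for each entry. A rough estimator along those lines might look like the sketch below; the per-entry figure and the flat overhead allowance are guesses, not spec-reading.)

import os

SECTOR = 2048                   # ISO 9660 logical sector size
FIXED_OVERHEAD = 512 * 1024     # guess: system area, volume descriptors, path tables

def sectors(nbytes):
    return (nbytes + SECTOR - 1) // SECTOR

def estimate_iso_size(root):
    """Very rough size, in bytes, of an ISO 9660 image of `root`."""
    total = sectors(FIXED_OVERHEAD)
    for dirpath, dirnames, filenames in os.walk(root):
        # The directory itself: ~33 bytes of fixed fields plus the (padded)
        # name per entry, plus the "." and ".." records, in whole sectors.
        dir_bytes = 68 + sum(34 + len(name) for name in dirnames + filenames)
        total += sectors(dir_bytes)
        # File data is allocated in whole sectors.
        for name in filenames:
            total += sectors(os.path.getsize(os.path.join(dirpath, name)))
    return total * SECTOR

Rock Ridge and Joliet add more per-entry overhead (Joliet keeps a whole second directory tree), so even with an estimate like this, a few megabytes of slack per disc is cheap insurance.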



5 comments:

  1. Incidentally, I already provide a user-tunable "blocksize" parameter, and I round up file sizes to that. And I take into account the size of the directories themselves (not just their contents), but that's their size on the host filesystem (ext3 in this case), not necessarily on ISO 9660.
    I should probably read the ISO 9660 spec, but I don't wanna.

  2. You could also do several test backups and measure the resulting disk usage. Or, perhaps you could create disk images instead of directly burning to CD.

  3. > Or, perhaps you could create disk images instead of directly burning to CD.
    I'm essentially doing that already, using mkisofs to make a disk image which I then burn with cdrecord. It doesn't help: I can't know how many files I can put in a disk image before the image gets too big.
    While I could create a tentative disk image every time I add a file, to see if it's too big yet, that seems really inefficient.
    Unless there's a way to build the image incrementally. I didn't think of that. Hmmm. (See the size-check sketch after the comments.)

  4. Is there a way to mount a disk image read/write? I've never tried that under Linux so I don't know.
    If so, though, you'd be in luck.
    Or if you're, like, really dedicated to this project you could set aside a partition of the right size on your hard drive, format it as that filesystem, write until you have to stop, then use dd to grab it for later writing to CD.
    That's not exactly convenient, though. Nor portable to other people's machines.

  5. Hi,
    This is exactly my question. For each format there must be a formula. There are several options, each of which may affect the size actually occupied on a CD: mode, format (enabling Joliet), volume label, number of files, names of files and directories, number of directories, and so on. I've written a Perl program that creates a .bat file to move my files and folders into new folders, preparing them to be burned onto CDs. Everything works fine except that I have the overhead problem. Let me know how I can calculate it. I don't want to lose even one byte of a CD! The format of an image file may help. Let me know how I can figure it out. Thanks.
    Please post your reply by email, because this is the first and probably the last time I'll check this page.
    Thanks.

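Following up on comment 3: mkisofs has a -print-size option that reports how big the image would be (in 2048-byte extents) without actually writing it, which makes the "tentative image" idea cheap enough to run per candidate set. Here's a sketch, assuming mkisofs (or genisoimage) is on the PATH and -r/-J are the options you'd burn with; the output format varies between versions (older ones print "Total extents scheduled to be written = N" to stderr, newer ones a bare number to stdout), so the parsing is deliberately loose.

import re
import subprocess

SECTOR = 2048

def iso_size_bytes(paths, extra_args=("-r", "-J")):
    """Ask mkisofs for the estimated image size, in bytes, without building it."""
    result = subprocess.run(
        ["mkisofs", "-print-size", "-quiet", *extra_args, *paths],
        capture_output=True, text=True, check=True)
    # Grab the last integer from either stream, whichever this version used.
    return int(re.findall(r"\d+", result.stdout + result.stderr)[-1]) * SECTOR

# e.g. keep adding files while iso_size_bytes(candidate) <= 650 * 1024 * 1024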