Page 1 of 1

Formulas to group files by size to fit CD burn

PostPosted: Sun Dec 10, 2017 12:27 am
by Ironmanxp
Hi everyone smarter than me! I often have several files of different sizes that I want to burn to CD for storage. The issue I always run into is the task of grouping files together so that they come out to just below the volume size of my burn target so I don't waste a lot of storage space. This sometimes include rather large files up to 2Gb. So sometimes one file can take up the space of dozens of others.

What I would like is a formula that would sort a field of files where a file # (1-~100, etc) is in column A, the file name is in column B and the file Size is in column C. Then return a grouping of which files can be best combined to fit my target size of 4.3Gb. For example in Columns E-Z (titled for disc numbers) it would return a number X (1-thru next) for files to fill Disc 1 and so on. So as example: in Column E there would be a 1 in the cell on the line of each file that should be grouped on the first Disc 1, say files 1, 5, 12, 40, 46, etc whose combined size would be less than 4.3 (or within a range of 4 to 4.3GB); followed by Column F where a 2 would appear in the cell on the line for all files to be grouped on Disc 2 and so on until all files are screened with perhaps the final Column ~J would return a 6 (or even a Zero) in the cell of the remaining files. At that point depending on how full the final disc would be, I could perhaps just wait until I have more fills to back up. I could then easily sort the fields by Disc # for the group of files to be included on each Disc.

I hope that makes sense. I seem to think that someone really familiar with OOCalc formulas would know how to do this with just a few formulas, but alas I am not yet one of those people. For anyone out there that would like to take this on, I would be greatly appreciative.

Thanks.

Re: Formulas to group files by size to fit CD burn

PostPosted: Sun Dec 10, 2017 12:46 am
by MrProgrammer
Ironmanxp wrote:Then return a grouping of which files can be best combined to fit my target size of 4.3Gb.
Calc is not designed to solve the difficult Subset sum problem. However, it might not be hard to create a spreadsheet which could select files consecutively until the sum exceeds a target and then moves to the next CD. You should attach a document demonstrating the situation (remove confidential information then use Post Reply, not Quick Reply, and don't attach a picture instead of the document itself) because it is less likely that someone will create sample data in addition to providing a solution.

Re: Formulas to group files by size to fit CD burn

PostPosted: Sun Dec 10, 2017 12:50 am
by Ironmanxp
Thanks. Hadn't thought of the subset sum problem. I need to rethink this a bit. I appreciate the feedback.

Re: Formulas to group files by size to fit CD burn

PostPosted: Sun Dec 10, 2017 2:06 am
by RusselB
Another option, is to make a progressive back-up using a program like Winzip or WinRar.
I don't know for sure about WinZip, but WinRar has this ability during the trial period (unless they've changed that) and as a registered product (which is what I use).
It's not exactly the same as what you're asking for, as the process I'm talking about will create several .r?? files, with each one being the the target size and all of them together being the equivalent of a single compressed file that contains the file(s) that you are wanting to put onto disc.

Re: Formulas to group files by size to fit CD burn

PostPosted: Mon Dec 11, 2017 1:35 am
by Lupp
This looked like a challenging question, and the hint to the subset sum problem could make me shudder. Thus I thought a bit about the actual problem, in specific the probably relevant parameters, and a pragmatic strategy for which to apply I also made a crude spreadsheet.

What's most relevant?
First of all I exclude files larger than one medium's capacity and also media of different size. If we can write backup files spanning media, the problem original vanishes (and others come in sight). If there are media of different sizes, the task is gravely under-determined so far.

If the files are (to the contrary of what the OQ told) of nearly same size things are simple:
Size > 1/2 CD ===>> As many CD needed as files are to backup.
Sizes just around 1/2 CD ===>> Try to find as many pairs as possible. See general strategy below.
Size ~~ 1/n CD ===>> NumberOfCDs ~~ NumberOfFiles / n.

If (as the OQ told) the sizes are very different, there can still be many of similar size > 1/2 CD. ===> Not much optimization possible. Strategy: Start assigning the big files one by one and fill up.

More general strategy:
If the large sizes are not too dominant in number and there are lots of files of a size allowing for many files per CD a simple strategy gives good results:
-1- Start with the largest file on the first CD.
-2- Add the largest one of the remaining files that still has room enogh on the same disk.
-3- Continue this way as long as possible.
-4- With the remaining files start like -1- with the next CD.

I did not try a proof. Nonetheless the strategy looks rather optimal. My experiments using the mentioned spreadsheet didn't tell me (clearly) otherwise.
I would be glad to get a refutation based on an example or clear evidence.

Re: Formulas to group files by size to fit CD burn

PostPosted: Mon Dec 11, 2017 1:58 am
by Ironmanxp
Thanks for the thought process. In a way that is what I've been doing manually. In that I take the most of the larger files first and then piece in the remaining space with smaller ones until I get reasonably close to the full volume of the disc which is always 4.37 Gb. I will give this more thought and perhaps see if I can just convert those activities into formulas that will do the "busy" work for me in a spreadsheet. Thanks again. It's always great to look at a problem through someone else's eyes.

Re: Formulas to group files by size to fit CD burn

PostPosted: Mon Dec 11, 2017 2:26 am
by Lupp
I don't backup on CDs, but am interested in the problem. You can get my sheet(s), of course. As already told they are very crude, however. At least a bit of polishing needed.
BTW: Are there more than 1000 files in one backup cycle?

Re: Formulas to group files by size to fit CD burn

PostPosted: Mon Dec 11, 2017 2:45 am
by Ironmanxp
No, usually less than 200 at a time.

Re: Formulas to group files by size to fit CD burn

PostPosted: Tue Dec 12, 2017 1:16 am
by Lupp
I made an attempt to get a version of the sheet I mentioned into a form hopefully understandable for someone else but the author.
It is prepared for a maximum of 300 items.
See attachment, and don't miss to read the remarks.

Re: Formulas to group files by size to fit CD burn

PostPosted: Tue Dec 12, 2017 1:20 am
by Ironmanxp
Hey thanks a lot for all the effort. I have the file, but I'm out most of the week. Perhaps I can look at it when I get back. Thanks again, Happy Holidays.