Very slow spreadsheet - Way to improve

Discuss the spreadsheet application
Post Reply
Stockman1
Posts: 33
Joined: Tue Jul 08, 2014 1:08 pm

Very slow spreadsheet - Way to improve

Post by Stockman1 »

I have a spreadsheet which pretty much puts my pc out of action whilst it is running. I have to go for a coffee whilst it does its thing.

Is there any way I can speed this up?

At the moment the ods file does the job perfectly but that is on 102,000 items.

We will eventually have 500,000+ items and I'm worried that the the vlookups in the spreadsheet will slow the process exponentially.

I enter data into A155:B102,376 manually from a csv provided by a third party.

Col F is the items I am specifically interested in at that time with Col G doing a lookup of Col F against Cols A and B.

Cols I and J are the output I need to put into another csv file which gets uploaded to a website.

A1:D154 are some lookups for where the entry in column F would not match anything in A155:A102376.

The problem is that I now have vlookup of vlookup of vlookup for some items.

I can see this growing out of control. What can be done please?

Having to dropbox this file as I could not cut it down to 128KB and still get all the machinations in.

https://www.dropbox.com/s/jncnz44798ndj ... 1.ods?dl=0

Thanks for any useful suggestions.
Open Office 4.1 / Windows Vista
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Very slow spreadsheet - Way to improve

Post by John_Ha »

Stockman1 wrote:I have a spreadsheet which pretty much puts my pc out of action whilst it is running. I have to go for a coffee whilst it does its thing.

Is there any way I can speed this up?

At the moment the ods file does the job perfectly but that is on 102,000 items.

We will eventually have 500,000+ items and I'm worried that the the vlookups in the spreadsheet will slow the process exponentially.
You cannot "tune" Calc to make it go faster.

You have an enormous spreadsheet. Calc is not really designed to handle such large spreadsheets so what you are seeing is to be expected. Calc is designed "to be easy to use" and not designed "to be efficient at calculating".

The only way you can improve things is if you can re-design the calculations so as to make them more efficient. Can you split the data and do each bit separately and then merge the results? Make sure you use the database functions wherever possible as they are designed to be efficient - see the Calc manual for more information.

See the thread OpenOffice Calc Crashes While Computing Large Amount of Data for more discussion. See Force Calc to Use 100% CPU.
Clipboard01.png
If this solves the problem, please view your first post in this thread and click the Edit button (top right in the post) and add [Solved] in front of the subject.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Stockman1
Posts: 33
Joined: Tue Jul 08, 2014 1:08 pm

Re: Very slow spreadsheet - Way to improve

Post by Stockman1 »

Thank John
Yes I 'd thought of splitting it up previously and I may have to do that. One way I thought of was to leave a macro running and 'drip-feeding' the data in and then drip-feeding the data out and just continuously update data that way; a bit at a time. A bit of a faff but I think there is an old pc here somewhere that I could leave just to do that job and nothing else. It would be nice to see a more elegant way if it exists though.
Open Office 4.1 / Windows Vista
User avatar
keme
Volunteer
Posts: 3699
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: Very slow spreadsheet - Way to improve

Post by keme »

If it is practical to sort the table in columns A-B and you don't need the VLOOKUP function to ensure that the lookup value matches exactly (i.e. you know that the lookup value is there, or you check it by other means), you can use the sorted lookup, which is many times faster.

I tried with your sample file. Loading took about 7 seconds. Then recalculation took about 70 seconds. Saved and repeated. Was just as slow the second time around.
Sorted the table as suggested. This took 45 seconds. Removed the "unsorted" specifier in the VLOOKUP() function calls. Saved with a new name.
Shut down Calc. Started again and reloaded the modified file. Again, load took about 7 seconds. Recalc was done in virtually no time (less than half a second).

I did not do any sanity/integrity check on the results, because I don't know the nature of your data so I don't know which values would be sane or not. That is your job, if this is at all a viable solution to the issue. Obviously there may be situations when sorting is not an option. In such cases, connecting the data source to a database server and indexing it may help you. This requires quite a bit of initial effort, though.

Background:
On unsorted tables the most efficient search is sequential, which averages n/2 lookups. With sorted tables you can use binary search, averaging log2(n) lookups. The advantage of sorting increases rapidly as table grows.
With 10 entries in a table, the speed ratio between the two searches is 3:5 in favor of binary search.
With 1000 entries in a table, the speed ratio between the two searches is 1:50 in favor of binary search.
With 100000 entries, binary search is about 3000 times faster.
Apache OO 4.1.12 and LibreOffice 7.5, mostly on Ms Windows 10
Stockman1
Posts: 33
Joined: Tue Jul 08, 2014 1:08 pm

Re: Very slow spreadsheet - Way to improve

Post by Stockman1 »

Thanks Keme. Nice suggestion. I hadn't though of it and yes, it would be easy to get the tables in order before putting them into the spreadsheet. I'm going to give it a try within the next 20 mins and will post an update.
Open Office 4.1 / Windows Vista
Stockman1
Posts: 33
Joined: Tue Jul 08, 2014 1:08 pm

Re: Very slow spreadsheet - Way to improve

Post by Stockman1 »

On trying the above for myself your advice makes sense Keme and that would be a way improving things.

Unfortunately the entries and numbers of entries in columns A, B and F and G change regularly so I can't do a straightforward F23=A99 and G23=B99 for example. I am stuck with vlookups (some nested) and I guess that is what is slowing things down greatly.

Even though the process is lengthy I am solaced when I consider how long it would take to try and do it by hand.

Fortunately I happen to like coffee too.
Open Office 4.1 / Windows Vista
Post Reply