[Solved] How to reduce size (large amount similar formula)

Discuss the spreadsheet application

[Solved] How to reduce size (large amount similar formula)

Postby ToresVN » Fri Jan 08, 2010 8:41 am

I have a table with 300(col)*3000(row) ~ 900,000 cells. Most of these cells are formulas. How to reduce the size as formula are similar in rows?

I've tried using matrix but in vain due to the limitation in data range. For example, in cell D1:D3000 entering a formula (using matrix) like {=A1:A3000+1} works as entering A1+1 in D1 and copy D1 to D2, D3, ... and C3000. However, this strategy of reducing the file size fails in several situations, ex: formula =AND(A1:C1) in D1 is not good for matrix as =AND(A1:A3000:C1:C3000) is not valid.

A concrete example is uploaded. Please refer to the sheet named 'Result' and look at the formula in row 2. These formulas are not simple (single structure) but complicated (multiple structure). Furthermore, row 2 to row 50 (even more) are similar by columns. If one copy row 2 to row 3, row 4, ... etc., OO.o calc in face create a new formula according to the source reference (row 2).

The formula is very long compared to raw data. And this enlarge the size of file as more row entered. Is there any way to enhance the efficiency of functionality/filesize?

PS:
Create a new calc file, and compare the difference of using matrix (select range->enter formula->ctrl+shift+enter) and copy & paste, you'll find using matrix is much more efficient than C&P since the formula is defined only one time in former method.
Attachments
HowToReduce.ods
example
(23.91 KiB) Downloaded 43 times
Last edited by ToresVN on Tue Aug 24, 2010 10:59 pm, edited 1 time in total.
OOo 3.2.1 on Ms Windows XP
ToresVN
 
Posts: 41
Joined: Sun Nov 30, 2008 3:28 pm

Re: How to reduce size (large amount similar formula)

Postby Villeroy » Fri Jan 08, 2010 9:19 am

Obviously, the data come from some kind of database. You could save all the time consuming lookups importing the joined record sets directly from the database. The boolean evaluation (nested IF formulas) should perform very fast as you may see after replacing the lookups with constant data (copy&paste-special number+text without formulas).
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 12.04, Apache OpenOffice 4.x
User avatar
Villeroy
Volunteer
 
Posts: 17320
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: How to reduce size (large amount similar formula)

Postby ToresVN » Fri Jan 08, 2010 11:03 am

Villeroy wrote:Obviously, the data come from some kind of database. You could save all the time consuming lookups importing the joined record sets directly from the database. The boolean evaluation (nested IF formulas) should perform very fast as you may see after replacing the lookups with constant data (copy&paste-special number+text without formulas).


The data is of course from the database. Yet the database is not reachable anyway. The data in ods is imported from csv file.

The major problem is not calcuating time but file size. It is so large due to similar formula! And similar formula should be saved/defined only once as a templete but not many times as similar strings.

As mentioned in my first post, two methods are compared:

column A are value-type data,
column B is going to double column A

Generally, one may write =A1*2 into B1, =A2*2 into B2, ... etc. This is Copy & Paste method

Analysis in content.xml extracted from ods

Copy & Paste generates
<table:table-cell table:formula="of:=[.A2]*2" office:value-type="float" office:value="2">
<text:p>2</text:p>
</table:table-cell>
<table:table-cell table:formula="of:=[.A3]*2" office:value-type="float" office:value="2">
<text:p>2</text:p>
</table:table-cell>
...

Another approach is possible
Matrix method
<table:table-cell table:number-matrix-columns-spanned="1" table:number-matrix-rows-spanned="30" table:formula="of:=[.A2:.A31]*2" office:value-type="float" office:value="2">
<text:p>2</text:p>
</table:table-cell>

and the contents in the rest cells are identical (indicates "there exists a cell")
<table:table-cell office:value-type="float" office:value="2">
<text:p>2</text:p>
</table:table-cell>


Comparing these two methods, the latter one, using matrix, is obviously good for zip compression (ods is an intrinsic zip). Another test approved it: after replacing C&P formula to matrix, 10 MB -> 2 MB; this depends on the quantity of similar rows.

Matrix method is good for compression but not good in all case. Not all formula can be translated into matrix formula.


Speak in C,
if the problem is to print
*
**
***
****
*****

the c&p method is
printf("*\n");
printf("**\n");
printf("***\n");
printf("****\n");
printf("*****\n");

lines are similar but different

the matrix method is
for (int i=1;i++<5;putchar('\n'))
for (int j=0;j<i;j++)
putchar('*');

much more efficient in file size consideration!

How to do like the matrix and get rid of its limitation?
OOo 3.2.1 on Ms Windows XP
ToresVN
 
Posts: 41
Joined: Sun Nov 30, 2008 3:28 pm

Re: How to reduce size (large amount similar formula)

Postby Villeroy » Fri Jan 08, 2010 11:30 am

As far as I understand the matter, this is a draw back of the ODF format which is rather simple to implement compared to Microsoft's OOXML which requires a close-to-Excel-clone in order to process the data as intended (that's what I read elsewhere).
In other words: I have not the faintest idea how to overcome this problem other than installing Gnumeric, which performs very well with xls and its own XML spreadsheet format. http://gnumeric.org
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 12.04, Apache OpenOffice 4.x
User avatar
Villeroy
Volunteer
 
Posts: 17320
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: How to reduce size (large amount similar formula)

Postby ToresVN » Fri Jan 08, 2010 1:00 pm

Thanks for your solution for the alternative. I think I prefer to intrinsic OO.o solution! So is it oaky to request a feature for such function? First select the data range, and then assign a parameter, the reference of formula. The effect is exacly like copy & paste and save the file size and loading time!


** Below is not well correspond to this topic
* I'v not collect enough evidence to prove them yet...

Villeroy wrote:The boolean evaluation (nested IF formulas) should perform very fast...


By the way, I've encounterd strange problem while using heavy-nested IF(). How heavy is it? For exmaple, IF(IF(IF())), is possible to cause such problem. Assume 3 or more layer IF statement, the 3rd layer IF always return true statement. Yet I can't tell exactly how to reproduce it. I 'feel' this comes more easily as using style() with large table (~1,000,000 cell) filled with long formulas.

Conditional Formatting is not preferred! This is another story...

Like MS Excel 2003 (and below), no more than three conditions are accepted; yet MS Excel 2007 (and above) accpet more than three. Besides, OO.o calc does not correct the reference in conditional formatting after inserting/deleting columns or rows in sheets, but correct the reference in formula! Is it a bug?
OOo 3.2.1 on Ms Windows XP
ToresVN
 
Posts: 41
Joined: Sun Nov 30, 2008 3:28 pm

Re: How to reduce size (large amount similar formula)

Postby Villeroy » Fri Jan 08, 2010 1:14 pm

One of the most important differences between Excel and Calc:
Excel:
=TRUE()=1 =>FALSE
=FALSE()=0 =>FALSE
[however Excel does evaluate true as 1 and false as zero in arithmetic context: =3+TRUE => 4]

In Calc TRUE is just a formatted one and FALSE is a formatted zero.
=TRUE()=1 =>TRUE
=FALSE()=0 =>TRUE
Any numeric value can be formatted as boolean. All numbers except zero are displayed as TRUE then.

In both applications no formatting attribute (border, color, font, orientation and number formats neither) will ever change one of your values. All values and resutls calculated from values are 100% independent from formatting attributes.
However, number formatting may have some impact on the way how newly entered/imported values evaluate to numbers, but it does not influence any existing data.

ToresVN wrote:... the 3rd layer IF always return true statement. Yet I can't tell exactly how to reproduce it.

Please, find out quickly! Such a heavy bug would be a show stopper for the next release due this month, I think.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 12.04, Apache OpenOffice 4.x
User avatar
Villeroy
Volunteer
 
Posts: 17320
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: How to reduce size (large amount similar formula)

Postby acknak » Fri Jan 08, 2010 4:20 pm

the matrix method is
for (int i=1;i++<5;putchar('\n'))
for (int j=0;j<i;j++)
putchar('*');

Why use a spreadsheet at all? For simple calculations over a large data set, wouldn't a program be faster, more efficient and more dependable?
AOO 4 • Linux • Fedora 17
User avatar
acknak
Moderator
 
Posts: 17424
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: How to reduce size (large amount similar formula)

Postby ToresVN » Mon Mar 08, 2010 2:30 pm

Why use a spreadsheet at all? For simple calculations over a large data set, wouldn't a program be faster, more efficient and more dependable?

You're right but most users are not able to implement any idea into a program. Both OO.o calc and MS Excel do good in presenting data. If one choose to write a program to solve this, he'll firstly encounter UI problem. To display edit data is easy, but to edit through UI is difficult. Doing such a stuff likes building a whole new "calc".

As for UI problem, HTML + Javascript is a good candidate. However, file accessing is not possible without external support like google gear.
OOo 3.2.1 on Ms Windows XP
ToresVN
 
Posts: 41
Joined: Sun Nov 30, 2008 3:28 pm


Return to Calc

Who is online

Users browsing this forum: No registered users and 25 guests