Multiple regression using LINEST.


Multiple regression using LINEST.

Postby jrkrideau » Tue Aug 21, 2018 3:55 pm

I was looking over some old (pre-2007) test programs written in R and noticed one commenting on problems doing a multiple regression in Excel. My comments suggested that the same problem may have been in Calc.

I never use Calc for stats and I am really not interested in spending an hour or two to see if the problem still exists in Calc. Is there anyone here who can do a multiple regression in Calc in a couple of minutes and check whether the problem still exists? The problem was that X3 = X1 + X2 in the attached file, and back in 2007 Excel was using X3 anyway.

Thanks.


In R, I get:
Call:
lm(formula = y ~ x1 + x2 + x3, data = xx)

Coefficients:
(Intercept) x1 x2 x3
16.333 4.133 1.333 NA
Attachments
linestmr.ods
(8.93 KiB) Downloaded 337 times
OpenOffice 4.1.6; Ubuntu 18.04
jrkrideau
Volunteer
 
Posts: 3690
Joined: Sun Dec 30, 2007 10:00 pm
Location: Kingston Ontario Canada

Re: Multiple regression using LINEST.

Postby FJCC » Tue Aug 21, 2018 4:36 pm

Yes, Calc gives a crazy answer fitting y to x1, x2 and x3. Calc matches the coefficients for x1 and x2 that R gives if I exclude x3 from the Calc fit. My fit, however, does not match your result.
I get
intercept = 10, x1 = 9.2, x2 = -5.0, x3 = NA
from the lm() function in R.
Windows 10 and Linux Mint, since 2017
If your question is answered, please go to your first post, select the Edit button, and add [Solved] to the beginning of the title.
FJCC
Moderator
 
Posts: 7260
Joined: Sat Nov 08, 2008 8:08 pm
Location: Colorado, USA

Re: Multiple regression using LINEST.

Postby jrkrideau » Tue Aug 21, 2018 5:39 pm

Thanks for the test, though I hate to see that Calc is still seriously messing up.

I must say I find your R results, on the face of it, more believable than mine. I thought the intercept in my code looked suspiciously big, but since it was 11-year-old code and I was more interested in the lm() issue I did not investigate it. Are we reading in the data in slightly different formats? Or did I mess up when moving the data into Calc, so that we have slightly different data sets?

Your results do confirm my belief that using a spreadsheet for anything more complicated than a shopping list is dangerous.

Here is my R data in dput() format and the code:

xx <- structure(list(x1 = c(1L, 3L, 5L, 7L, 9L), x2 = c(2L, 2L, 4L,
4L, 6L), x3 = c(3L, 5L, 9L, 11L, 15L), y = c(21L, 33L, 45L, 49L,
61L)), class = "data.frame", row.names = c(NA, -5L))

lm(y ~ x1 + x2 + x3, data = xx)

Re: Multiple regression using LINEST.

Postby FJCC » Tue Aug 21, 2018 6:34 pm

In the spreadsheet you posted the first y value is 2, not 21. Mystery solved!

Re: Multiple regression using LINEST.

Postby jrkrideau » Tue Aug 21, 2018 7:35 pm

Aha, thanks.

Re: Multiple regression using LINEST.

Postby MrProgrammer » Tue Aug 21, 2018 8:24 pm

jrkrideau wrote:I never use Calc for stats and I am really not interested in spending an hour or two to see if the problem still exists in Calc. Is there anyone here who can do a multiple regression in Calc in a couple of minutes and check to see if the problem still exists?
As has been noted in other posts, Calc was never designed as a tool for statistical analysis. It can do calculations, but they are based on the assumption that the creator of the spreadsheet knows what they are doing, which is often not the case. R is designed for statistical analysis and assumes more responsibility for providing meaningful answers even in exceptional situations. Thus it is much safer to use for that type of work. R can detect that the X-variables are not independent; Calc relies on the user to ensure this is true.

With the first Y value being 21, not 2, asking for =LINEST(y-data;x-data;1;1) I get
-924129013103307  924129013103309   924129013103311   15.8
1728965634269560 1728965634269560 1728965634269560 117376436.486455
0.9861365972 3.6873110427 #N/A #N/A
23.7107876142 1 #N/A #N/A
967.1342934824 13.5962627253 #N/A #N/A

The first indication that something is wrong is the set of huge slope values on the first line. However, another clue is the unexpected value of 1 for degrees of freedom (green, 4th row, 2nd column). I would expect 3 when using three independent variables. It would be interesting to know what R says for the degrees of freedom. I would guess that R will say 2, because X3=X1+X2 and thus X3 is not an independent variable. I did not see any reported problems in Bugzilla with the calculation of degrees of freedom in LINEST, but perhaps no one has noticed this.

I believe this situation is similar to that described in [SOLVED] LINEST Yielding #VALUE! Error with intercept where the solution was to omit a variable from the analysis.

jrkrideau wrote:Coefficients:
(Intercept) x1 x2 x3
16.333 4.133 1.333 NA
Omitting X3 from the analysis, =LINEST(y-data;x-data;1;1) gives
1.3333333333    4.1333333333   16.3333333333
2.6666666667 1.4110673659 3.9553058598
0.9818594104 2.9211869734 #N/A
54.125 2 #N/A
923.7333333333 17.0666666667 #N/A
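For a cross-check outside any spreadsheet, here is a short NumPy sketch (an illustration only, not Calc's actual algorithm) showing both effects on the data from this topic: with X3 included the design matrix is rank-deficient, and with X3 omitted ordinary least squares reproduces the coefficients above.

```python
import numpy as np

x1 = np.array([1, 3, 5, 7, 9], dtype=float)
x2 = np.array([2, 2, 4, 4, 6], dtype=float)
x3 = x1 + x2                       # exactly dependent on x1 and x2
y = np.array([21, 33, 45, 49, 61], dtype=float)

# Full design matrix (intercept, x1, x2, x3) has rank 3, not 4, so the
# normal equations X'X b = X'y have no unique solution.
X_full = np.column_stack([np.ones(5), x1, x2, x3])
print(np.linalg.matrix_rank(X_full))   # 3

# Dropping x3 gives a well-posed least-squares problem.
X = np.column_stack([np.ones(5), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 4))   # intercept 16.3333, x1 4.1333, x2 1.3333
```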
Mr. Programmer
AOO 4.1.5 Build 9789 on MacOS 10.11.6.   The locale for any menus or Calc formulas in my posts is English (USA).
MrProgrammer
Moderator
 
Posts: 3812
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Multiple regression using LINEST.

Postby FJCC » Tue Aug 21, 2018 9:15 pm

MrProgrammer wrote:It would be interesting to know what R says for the degrees of freedom.

R does report 2 degrees of freedom.

Re: Multiple regression using LINEST.

Postby jrkrideau » Tue Aug 21, 2018 11:07 pm

MrProgrammer wrote:As has been noted in other posts, Calc was never designed as a tool for statistical analysis. It can do calculations, but they are based on the assumption that the creator of the spreadsheet knows what they are doing, often not the case.


I'd say the assumption is wrong most of the time. http://dx.doi.org/10.1186/1471-2105-5-80. Not to mention that spreadsheets are very error-prone, especially if the error gives you the result you want. http://www.bloomberg.com/news/articles/ ... ed-history.

I suspect the majority of spreadsheet users are either self-taught or have had a 3-day Intro to Excel course and that's it. Most users will not have the mathematical or statistical sophistication to notice the degrees-of-freedom problem or to test for multicollinearity. Also, most of us tend to trust that the software works, at least reasonably well, unless we find something really strange like a negative R-squared. You might find this interesting: http://www.mediafire.com/file/vjn0renhj ... e.pdf/file.

MrProgrammer wrote:I would guess that R will say 2, because X3=X1+X2 and thus X3 is not an independent variable.


Exactly
ls$df.residual
[1] 2

MrProgrammer wrote:I did not see any reported problems in Bugzilla with the calculation of degrees of freedom in LINEST, but perhaps no one has noticed this.


I never thought of reporting it as a bug. IIRC, I noticed this in a pre-2007 Excel post and just assumed it was a known problem. However, most people (Nobel prize-winning economists excepted) who are doing serious stats probably would not use a spreadsheet for real work, so it may well not have been noticed, or else they made the same assumption I did.

I suppose I'll have to learn how to file a bug report.

Re: Multiple regression using LINEST.

Postby jrkrideau » Wed Aug 22, 2018 5:35 pm

@ Mr Programmer
I am drafting a bug report for, IIRC, the first time. Would I be able to get you to review it before I submit?

Re: Multiple regression using LINEST.

Postby MrProgrammer » Wed Aug 22, 2018 8:37 pm

jrkrideau wrote:Would I be able to get you to review it before I submit?
Sure. What issue is being reported?
• That Calc misses detecting the X-variables are not independent and gives fallacious results?
• That LINEST's calculation for Degrees of Freedom is bogus?
• Something else?
We will want to attach a test case for them with known correct results, say as calculated by R. It won't surprise me if the issue is closed with "We are just doing it the same way Excel does, for compatibility, even though it's wrong."

Re: Multiple regression using LINEST.

Postby Lupp » Wed Aug 22, 2018 9:34 pm

I stepped in and tested the example given by jrkrideau (the 2 in A2 replaced with 21, as already suggested) under AOO V4.1.5 and under LibO V6.1.0.1RC.

LibO did a better job recognising the linear dependence of the x-vectors, but failed to draw the correct conclusion concerning the results returned by LINEST(). See the attached example for details. If at all possible, open it with both a recent AOO and a recent LibO.

The mess is well prepared by the specification of LINEST in 'OpenFormula'.
(I would prefer independent checks by you as mathematicians over a comparison with results gotten from 'R'.)

Editing: I would like to add a few remarks.
We now and then have to tell less experienced users that results calculated numerically using a representation of numbers with limited precision (as IEEE 754 'Double' is) cannot reliably be compared for equality. Spreadsheet software uses tricks I don't know exactly to be able to tell us a=b where in fact a denormalised(?) difference would remain. Nonetheless we sometimes get NOT(a=b) where stringent evidence tells us a=b. Validity of the equation a*d=c*b is, however, the basic test for linear dependence of (a, b) and (c, d). If we cannot exactly determine dependence in 2D by machine arithmetic, we surely also cannot determine the dependence of three or more vectors in multidimensional spaces. To calculate the rank of a matrix, or to select a maximal independent subset from a set of vectors (a basis of the linear hull), are absolutely nontrivial tasks which can never be solved reliably by calculations in dyadic numbers with a mantissa of given length. Whatever software may claim, solving tasks of this kind exactly has to rely on algorithms NOT resorting to numbers in a standardised representation of the mentioned kind, but working with rational and (depending on the circumstances) algebraic numbers, without any need of rounding.
You may allow me to guess that the undecidability of dimension is even a deep-lying problem concerning the application of mathematics to physics.
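The failure of the exact a*d=c*b dependence test can be demonstrated in a few lines of Python (plain Python floats are IEEE 754 'Double'; this is only an illustration of the point, not spreadsheet code):

```python
# Two mathematically identical 2-D vectors; one coordinate is computed,
# the other typed in directly.
u = (0.1 + 0.2, 1.0)    # 0.1 + 0.2 evaluates to 0.30000000000000004
v = (0.3, 1.0)

# Exact cross-product test for linear dependence of (a, b) and (c, d):
# a*d - b*c should be 0 for dependent (here: equal) vectors.
det = u[0] * v[1] - u[1] * v[0]
print(det == 0)    # False: rounded arithmetic hides the dependence
```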
Attachments
MultiLinest3BugLibO6101RC.ods
(14.83 KiB) Downloaded 321 times
On Windows 10: LibreOffice 6.2 and older versions, PortableOpenOffice 4.1.5 and older, StarOffice 5.2
---
Lupp from München
Lupp
Volunteer
 
Posts: 2528
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Multiple regression using LINEST.

Postby jrkrideau » Fri Aug 24, 2018 11:06 pm

@ Mr.Programmer
Sure. What issue is being reported?

• That Calc misses detecting that the X-variables are not independent and gives fallacious results?
• That LINEST's calculation for Degrees of Freedom is bogus?


My intention was to report the first issue, under the unwarranted assumption that the bogus Degrees of Freedom result was simply an outcome of the non-independence issue in the simple example I created. I went back, re-read your comment, and realized we have a very serious issue here as well. Currently I am not sure of the approach, but I believe we have to consider two separate bug reports: one for the collinearity issue and one for the Degrees of Freedom issue.

First Step for Example on Degrees of Freedom Problem.

I put together a simple example without the non-independence problem (degrees.of.freedom.problem.ods). Unless I am doing something wrong, I would have expected df = 3 in the attached example. There is no correlation above 0.558 among the variables. So the DF problem is independent of the collinearity issue.

As an aside, I ran the same regression in Gnumeric and it appears to give the expected (R) results; at least the df's are correct and it dropped x3 from the analysis. The numbers looked about right (i.e. matching R's results), though I did not check each number carefully.

Problem

Lupp's example (MultiLinest3BugLibO6101RC.ods) does seem to be giving us the correct dfs. So I now have an anomalous result.

My LINEST call used 1, 1 in the logical arguments to indicate TRUE. Lupp's used TRUE(), TRUE(), so I tried that in my example and still have the df problem. Any suggestions? Lupp, would you have a look at <degrees.of.freedom.problem.ods> and see if I am misunderstanding what you are doing in comparison with what I am doing? Remember, I seldom use a spreadsheet and never for statistics, so I am blundering around here.

Regression Example

It strikes me that Lupp's examples are in many ways a much better illustration of the regression issue than my bare-bones example, particularly where he points out that LibO V6.1.0.1RC can recognize that the x-vectors are linearly dependent, thus helping isolate the issue. Perhaps he might let me steal some of it.

In any case, the example itself is self-explanatory if the reader realizes the implications of non-independence. If they do not, we need some really good explanations rather than just a working example.

MrProgrammer wrote:We will want to attach a test case for them with known correct results, say as calculated by R. It won't surprise me if the issue is closed with "We are just doing it the same way Excel does, for compatibility, even though it's wrong."


As Lupp also put it, see below.

@ Lupp

I would prefer independent checks by you as mathematicians over a comparison with results gottem from 'R'.

Flee! There is an attack team from R-Core on the way from Wien as I type.

It would be very helpful to have some informed comment from some mathematicians/statisticians. Even having someone second your point re the non-invertible matrix, alone, would add to the argument.

That is a beautiful example of the collinearity problem. Assuming I am reading your comments correctly, if the matrix is not invertible, where the blazes is Calc getting the results? A Ouija board?

IMO this is probably the kind of nonsense introduced by: "We are just doing it the same way Excel does, for compatibility, even though it's wrong." But this comes not only from the implementation but already from the specification mimicking conscientiousness.


Yes. Interestingly enough, as I mentioned above, Gnumeric seems to perform better in this case. It was rumoured during Gnumeric's development that there was some serious consulting with statisticians.

@ Anyone

I agree with Lupp: we have a slavish imitation of Excel, right or wrong, all too often. It would be nice to see that they have cleaned up their act. Knowing MS…. Does anyone have easy access to Excel? I would like to see if the multicollinearity problem still exists there and if the Degrees of Freedom problem does as well. I can do it, but it means a couple of hours of slogging around that I'd prefer to avoid. Thanks.
Attachments
degrees.of.freedom.problem.ods
(11.22 KiB) Downloaded 325 times

Re: Multiple regression using LINEST.

Postby Lupp » Sat Aug 25, 2018 3:23 pm

jrkrideau wrote:Lupp's example (MultiLinest3BugLibO6101RC.ods) does seem to be giving us the correct dfs. So I now have an anomalous result.
My LINEST code used 1, 1 in the logic to indicate TRUE. Lupp's code used TRUE(), TRUE() ...
I only used Logical because the OpenFormula specification declares the Const and Stats parameters Logical with default TRUE in the first case and FALSE in the second. Even concerning spreadsheet software we may trust that automatic conversion will be applied if so specified.
jrkrideau wrote:It strikes me that Lupp's examples are a much better illustration of the regression issue in many ways than my bare-bones example, particularly where he points out that...
LibO V 6.1.0.1RC can well recognize that the x-vectors are linearly dependent.
...thus helping isolate the issue. Perhaps he might let me steal some of it.
I'm afraid there might be a misunderstanding concerning the ability of LibO 6 to recognise linear dependence. It depends heavily on the specific examples, and I did not try to get evidence concerning the comparison with AOO, not even a merely statistical-experimental one. Anyway, nobody can steal anything from my posts, because everything is CopyLeft 0.
jrkrideau wrote:@ Lupp{ who stated:}
I would prefer independent checks by you as mathematicians over a comparison with results gottem from 'R'.
Flee! There is an attack team from R-Core on the way from Wien as I type.
Well, make sure not to flee to Bavaria. Seehofer and Söder would get you captured and extradite you to the Taliban.
jrkrideau wrote:It would be very helpful to have some informed comment from some mathematicians/statisticians. Even having someone second your point re the non-invertible matrix, alone, would add to the argument.
Did you read my statement about comparison decisions based on IEEE 754 'Double' arithmetic? In what way do I appear to be not informed enough insofar? Take any software for formal mathematics/statistics that is also capable of switching to approximate calculations, give it a question of the kind under discussion, and it will return unreliable results (or, if clever, no result at all in doubtful cases). With approximation switched off there will be cases where a comparison cannot be finished, due to the finiteness of time and RAM or due to lacking abilities in algebra. 'R' (which I never used) surely also does algebra instead of approximative calculations when making decisions of the mentioned kind. Concerning the "matrix criterion" you will find an explanation in my next example.
jrkrideau wrote:That is a beautiful example of the collinearity problem. Assuming I am reading your comments correctly, if the matrix is not invertible where the blazes is Calc getting the results? A Ouija board?
I'm not quite sure which example you are talking about, don't know anything about Ouija, and can only guess the meaning of "where the blazes". However, my answer is: it will also try to guess, may do it in a slightly different way compared to AOO or to ExcelWhichOne, and will often be wrong. Most errors will be a wrong "independent".
IMO this is probably the kind of nonsense introduced by:"We are just doing it the same way Excel does, for compatibility, even though it's wrong."But this not only by the implementation but already by the specification mimicking conscientiousness.
As a consequence of my already mentioned statements about approximating calculations, this may not be of much importance in the given case. Excel cannot do it reliably, and neither can Calc.
jrkrideau wrote:Yes. Interestingly enough, as I mentioned above Gnumeric seems to perform better in this case. It was rumoured that during Gnumeric's development that there was some serious consulting with statisticians.
Serious consulting is always a good idea. If it was useful, its result was that Gnumeric would need a complete module for algebraic manipulations.
As already mentioned, a new example doc is attached. It also contains a few statements about my personal background and a DISCLAIMER.
The examples were made with LibO 6.1.0.1RC.

(Hope the fixed example is not too bad now.)
Attachments
aoo94782_LinestSupposedBug_2.ods
(21.18 KiB) Downloaded 314 times
Last edited by Lupp on Sat Aug 25, 2018 4:25 pm, edited 1 time in total.

Re: Multiple regression using LINEST.

Postby jrkrideau » Sat Aug 25, 2018 3:58 pm

Lupp wrote:Did you read my statement about comparing-decisions based on IEEE 754 'Double' arithmetic?
I did, but my assumption was that if Gnumeric and R appear to handle the problem then it is not an IEEE 754 'Double' arithmetic problem per se. I am aware of the problem but not of its extent. Totally outside my field.

Lupp wrote:Take any software for formal mathematics/statistics also capable of switching to approximate calculations for a question of the kind under discussion ...
You have lost me on this switch.

Lupp wrote:I'm not quite sure of what example you are talking
Just the discussion of the non-invertible matrix. Made things very clear.

Lupp wrote:In what way do I appear to be not enough informed insofar?
What? I just meant that this is likely to be a bug the developers don't want to know about, so any corroboration may be helpful even if it is redundant.

 Edit: Ouija board
a board printed with letters, numbers, and other signs, to which a planchette or movable indicator points, supposedly in answer to questions from people at a seance. 

Re: Multiple regression using LINEST.

Postby MrProgrammer » Mon Aug 27, 2018 8:09 am

jrkrideau wrote:Currently I am not sure of the approach but I believe we have to consider two separate bug reports. One for the collinearity issue and one for the Degrees of Freedom issue.
Thinking more about the situation, my advice is to report one issue: the failure to detect the non-independence of the X variables. Your original post in the topic demonstrates the issue, but we should have the first Y value be 21, not 2, to match the calculation of the coefficients by R and/or Gnumeric.

As long as floating point limitations aren't severe, it is not hard to tell that the X-variables aren't independent. All Calc needs to do is put the matrix of variables in column echelon form. The column of zeros on the right indicates the trouble. This method should be less sensitive to floating point limitations than calculating the inverse of Xᵀ⋅X. In the second step, we've subtracted twice the first column from the second column. In the third step, we've subtracted thrice the first column from the third column. In the last step we've subtracted the third column from the fourth column.
(Screenshot: column echelon form)
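The echelon test amounts to checking the rank of the matrix of X-variables; here is a NumPy sketch (an illustration, not the proposed Calc code) on the data from this topic:

```python
import numpy as np

x1 = np.array([1, 3, 5, 7, 9], dtype=float)
x2 = np.array([2, 2, 4, 4, 6], dtype=float)
x3 = x1 + x2
M = np.column_stack([x1, x2, x3])

# Rank 2 < 3 columns: the X-variables are not independent, so per the
# ODF standard LINEST should drop one or report an error.
print(np.linalg.matrix_rank(M))   # 2
```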

The straightforward algorithm to calculate the slope and intercept coefficients is only valid if the X-variables are independent. A fix for the issue might come in two forms, corresponding to the options in the ODF standard: If Xᵀ⋅X is not invertible, then the columns of X are linearly dependent. In this case an evaluator shall return an error or select any maximal linearly independent subset of these columns …. From a computational perspective it may be better to test the row echelon form instead of (Xᵀ⋅X)⁻¹. The ODF standard does not appear to permit Calc to fail to detect non-independence, whether or not the AOO documentation specifies that this is required of LINEST.

Given that the straightforward algorithm is invalid if the X-variables aren't independent, I'm inclined not to report an issue with degrees of freedom until the collinearity issue is resolved. A fix for that will likely also correct any degrees of freedom issues. If the X-variables are really independent, I believe LINEST's computation for degrees of freedom is correct and will match the value determined by R.

MrProgrammer wrote:I would expect 3 [instead of 1] when using three independent variables.
Oops. That's incorrect. If we make the assumption that the variables are independent, there is one degree of freedom, as reported by LINEST. 5 (observations) - 3 (independent variables) - 1 (intercept) is 1. If we were to detect that one of the variables is not independent we have two degrees of freedom. 5 (observations) - 2 (independent variables) - 1 (intercept) is 2.
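The arithmetic can be pinned down in a tiny helper (hypothetical, just to make the two cases explicit):

```python
def residual_df(n_obs, n_x_vars):
    # Residual degrees of freedom for a fit with an intercept:
    # observations minus X-variables minus one for the intercept.
    return n_obs - n_x_vars - 1

print(residual_df(5, 3))   # 1: LINEST's value if all three X's are kept
print(residual_df(5, 2))   # 2: R's value after dropping the dependent X3
```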

jrkrideau wrote:I would have expected df = 3 in the attached example. There is no correlation above 0.558 among the variables.
(Screenshot: independent variables example)
6 (observations) - 3 (independent variables) - 1 (intercept) is 2, as calculated by LINEST.

jrkrideau wrote:if the matrix [from MultiLinest3BugLibO6101RC.ods] is not invertible where the blazes is Calc getting the results? A Ouija board?
Translations for non-native speakers of English:
"where the blazes" → I'm mystified
"Ouija board" → Device used to obtain answers from the spirits at a seance. The term is of uncertain origin: possibly from French oui ‘yes’ + German ja ‘yes.’
For this situation, the actual determinant of the matrix is zero and it is not invertible. However, due to limitations of floating point arithmetic, the calculated determinant is nonzero, though very small, and the matrix seems to have an inverse. Calc proceeds under the (mistaken) assumption that the X variables are independent.
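This effect is easy to reproduce; the singular matrix below is an arbitrary integer example (not the one from the attachment):

```python
import numpy as np

# Rows are linearly dependent (row2 = 2*row1 - row0), so the true
# determinant is exactly 0 and the matrix has no inverse.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

d = np.linalg.det(A)    # typically a tiny nonzero value such as 6.7e-16
print(d)
print(np.linalg.matrix_rank(A))   # 2: the rank test is not fooled here
```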

Re: Multiple regression using LINEST.

Postby Lupp » Mon Aug 27, 2018 9:39 am

[BTW
de("row echelon matrix") = "{obere} Dreiecksmatrix"
de("column echelon matrix") = "untere Dreiecksmatrix"
de("transformable to row echelon form") = "trigonalisierbar" ]

Is there evidence that any transformation of matrices based on calculations with limited precision (like with 'Double') can yield a reliable result concerning the RANK? Why shouldn't a Gaussian transformation, based on rounded calculations, judge a column that can actually be nullified not to be so?

If we abandon the unstated but probably intended assumption that all the matrix elements are integers, even the original elements will mostly not be represented exactly. Elimination will in addition require divisions.

My (unproven) assumption was that determinants would at least be calculated based only on multiplication and (generalised) addition, and that this would minimise the risk of relevant deviations. Nonetheless, even a simple example of a definitely non-invertible quadratic (7 x 7) matrix with integer elements was calculated to have the determinant -1.62E-38 (see the attachment to the previous post, aoo94782_LinestSupposedBug_2.ods). I suppose that some of the addends differed by more than 2^52 in dyadic order of magnitude, and that this prevented the exact result 0. (I do not know the algorithm used and had expected otherwise for this specific example.)

Anyway:
-1- Using approximative calculations will always make decisions of the needed kind unreliable.
-2- Using the determinant to decide the invertibility of a quadratic (n x n) matrix should be a rather safe way.
-3- Nonetheless it will require a kind of "guessing by inequations".
-4- It also cannot even tell the RANK if it is <n, and the selection of "any maximal linearly independent subset of these columns" remains a major problem. (Do you know an efficient way to do it?)

Of course this is not an objection to filing a bug report concerning LINEST as it is:
-1- It should be made explicit which one of the choices admitted by 'OpenFormula' were made.
-2- LINEST shouldn't do it wrong even in cases where the determinant criterion told "not independent".
-3- It should explicitly admit that (and in what way) it guesses concerning independence decisions.

Should someone suggest remedy?

Mathematical subroutines needed for everyday software must be developed by mathematicians based on specifications made by mathematicians.
The complete portable wxMaxima (including the used Lisp and GnuPlot and a GUI adaption and everything) is <140 MiB on "my stick". There surely is an API allowing its routines to be called from Calc, and surely the package can be reduced and recompiled, without the many more complicated functions, to very few MiB. It might then be integrated into Calc for the respective kinds of tasks. Same probably with 'R'. I personally am a kind of tinkerer (specifically again since I'm retired) and have much fun with reinventing any kind of wheel. Well-developed software to be widely distributed should probably not be done the tinkerer way.

I feel sure that Maxima didn't obtain the results by 'Double' calculations.
See also: https://math.stackexchange.com/question ... ets#164021 .
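The requirement above — deciding dependence without rounding — can be sketched in a few lines using exact rational arithmetic (Python's fractions module here, standing in for the algebra systems mentioned; an illustration only, not a proposal for Calc's implementation):

```python
from fractions import Fraction

def exact_rank(rows):
    """Rank of an integer/rational matrix via Gaussian elimination over
    exact fractions: zero pivots are detected exactly, with no
    floating-point guessing."""
    m = [[Fraction(v) for v in row] for row in rows]
    if not m:
        return 0
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero pivot.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the column from all other rows, exactly.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# The x1/x2/x3 data from this topic: exact arithmetic finds rank 2.
rows = [(1, 2, 3), (3, 2, 5), (5, 4, 9), (7, 4, 11), (9, 6, 15)]
print(exact_rank(rows))   # 2
```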
Attachments
MentionedExampleWithMaxima.png

Re: Multiple regression using LINEST.

Postby Villeroy » Mon Aug 27, 2018 12:56 pm

Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, no OpenOffice, LibreOffice 6.x
Villeroy
Volunteer
 
Posts: 27105
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany


Re: Multiple regression using LINEST.

Postby jrkrideau » Tue Sep 18, 2018 6:20 pm

My apologies for taking so long to get back to everyone. A combination of some small emergencies and a minor illness—early 'flu perhaps?—tied me up a bit.

I have managed to cobble together a draft that I think explains the problem. It is currently in PDF format as it was easier for me to see what I was doing.

I propose to submit this (after input and editing) along with a .ods file illustrating the problem, an example of a linear equation without multicollinearity, and a somewhat edited version of Lupp's discussion of the problem.

Lupp, I hope I have not done too much damage to your arguments. I have not done anything in matrix algebra in at least 40 years so I may have made some mistakes and I am worried that I left out one matrix that may be more important than I realize.

Mr. Programmer, I have omitted your suggestion about wxMaxima since I was not sure if you were discussing the issue in general or intended it as a specific approach to solving the problem. If the latter, I will include it.

Mr. Programmer, I believe you are correct that if the multicollinearity issue is solved, the degrees of freedom issue goes away. When I thought I had the wrong df's, I had forgotten to include the intercept term.

So far, I have learned far more about doing a linear regression in a spreadsheet than I ever wanted to and I have only scratched the surface.
Attachments
aoo94782---rearragne for pub.ods
(25.28 KiB) Downloaded 328 times
statement_of_problem (Draft1 2018-09-18).pdf
(92.26 KiB) Downloaded 293 times

Re: Multiple regression using LINEST.

Postby MrProgrammer » Wed Sep 26, 2018 1:40 am

jrkrideau wrote:Mr. Programmer, I have omitted your suggestion about wxMaxima since …
This comment was from Lupp.

jrkrideau in statement_of_problem (Draft1 2018-09-18).pdf wrote:The correlation between x1 and x3 is r = 0.993, which indicates a case of multicollinearity.
The difficulty is not due to notable correlation between any pair of variables, as long as it isn't 1, which would indicate the pair isn't independent. LINEST works fine if I drop variable X2 and use X1 and X3 for the analysis.

The difficulty is created because the three variables in the second argument of LINEST are not independent. We can demonstrate the same difficulty when each pair shows poor correlation. Below, X1:X2, X2:X3, and X1:X3 all have correlations of about ±½. But since X3=X1+X2, the linear regression will give large, unreasonable values (about 10¹⁵) for the coefficients unless a variable is excluded from the analysis.
 Y   X1   X2   X3
21    1   -4   -3
33    3    2    5
45    5   -3    2
49    7   -6    1
61    9   -5    4
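Anyone wanting to verify this can do so in a few lines of NumPy (an illustration; the numbers are taken from the table above):

```python
import numpy as np

x1 = np.array([1, 3, 5, 7, 9], dtype=float)
x2 = np.array([-4, 2, -3, -6, -5], dtype=float)
x3 = x1 + x2                     # [-3, 5, 2, 1, 4], as in the table

# Pairwise correlations are all around +/- 0.5 ...
c = np.corrcoef([x1, x2, x3])
print(np.round(c, 3))

# ... yet the variables are exactly dependent: rank 2, not 3.
print(np.linalg.matrix_rank(np.column_stack([x1, x2, x3])))   # 2
```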

jrkrideau wrote:I propose to submit this …
It will not hurt to remind the developers of the following sentence in the ODF standard:
If Xᵀ⋅X is not invertible, then the columns of X are linearly dependent. In this case an evaluator shall return an error or select any maximal linearly independent subset of these columns ….

Re: Multiple regression using LINEST.

Postby Lupp » Thu Nov 22, 2018 6:22 pm

Did somebody meanwhile submit a bug report? Did I miss the link?
Was the LibreOffice bug tracker included?
Did someone already challenge LibreOffice 6.2 (Beta1 or Alpha) concerning https://wiki.documentfoundation.org/Rel ... s/6.2#Calc ?

===Edit 2018-11-22 17:27CET===
Just tried the original example (with the "2" in place) with LibreOffice 6.1.3.2 and with 6.2.0.0+Beta1. Unchanged results.

