As has been noted in other posts, Calc was never designed as a tool for statistical analysis. It can do calculations, but they are based on the assumption that the creator of the spreadsheet knows what they are doing, often not the case. R is designed for statistical analysis and assumes more responsibility for providing meaningful answers even in exceptional situations. Thus it is much safer to use for that type of work. R can detect that the X-variables are not independent; Calc relies on the user to ensure this is true.jrkrideau wrote:I never use Calc for stats and I am really not interested in spending an hour or two to see if the problem still exists in Calc. Is there anyone here who can do a multiple regression in Calc in a couple of minutes and check to see if the problem still exists
-924129013103307 924129013103309 924129013103311 15.8
1728965634269560 1728965634269560 1728965634269560 117376436.486455
0.9861365972 3.6873110427 #N/A #N/A
23.7107876142 1 #N/A #N/A
967.1342934824 13.5962627253 #N/A #N/A
Omitting X3 from the analysis =LINEST(y-data;x-data;1;1) givesjrkrideau wrote:Coefficients:
(Intercept) x1 x2 x3
16.333 4.133 1.333 NA
1.3333333333 4.1333333333 16.3333333333
2.6666666667 1.4110673659 3.9553058598
0.9818594104 2.9211869734 #N/A
54.125 2 #N/A
923.7333333333 17.0666666667 #N/A
It would be interesting to know what R says for the degrees of freedom.
MrProgrammer wrote:As has been noted in other posts, Calc was never designed as a tool for statistical analysis. It can do calculations, but they are based on the assumption that the creator of the spreadsheet knows what they are doing, often not the case.
I would guess that R will say 2, because X3=X1+X2 and thus X3 is not an independent variable.
I did not see any reported problems in Bugzilla with the calculation of degrees of freedom in LINEST, but perhaps no one has noticed this.
Sure. What issue is being reported?jrkrideau wrote:Would I be able to get you to review it before I summit?
Sure. What issue is being reported?
• That Calc misses detecting the X-variables are not independent and gives fallacious results? • That LINEST's calculation for Degrees of Freedom is bogus?
We will want to attach a test case for them with known correct results, say as calculated by R. It won't surprise me if the issue is closed with "We are just doing it the same way Excel does, for compatibility, even though it's wrong."
IMO this is probably the kind of nonsense introduced by:"We are just doing it the same way Excel does, for compatibility, even though it's wrong."But this not only by the implementation but already by the specification mimicking conscientiousness.
I only used Logical because the OpenFormula specification declares the Const and Stats parameters Logical with default TRUE in the first case and FALSE in the second. Even concerning spreadsheet software we may trust that automatic conversion will be applied if so specified.jrkrideau wrote:Lupp's example (MultiLinest3BugLibO6101RC.ods) does seem to be giving us the correct dfs.So I now have an anomalous result.
My LINEST code used 1, 1 in the logic to indicate TRUE. Lupp's code used TRUE(), TRUE() ...
I'm afraid there might be a misundertsnding concerning the abilty of LibO 6 to recognise linear dependence. It depends heavily on the specific examples - and I did no try to get evidence concerning the comparison with AOO, not even only statistical-experimental. Anyway nobody can steal her anything from my posts because everything is CopyLeft 0.jrkrideau wrote:It strikes me that Lupp's examples are a much better illustration of the regression issue in many ways than my bare-bones example, particularly where he points out that...
LibO V 6.1.0.1RC can well recognize that the x-vectors are linearly dependent.
...thus helping isolate the issue. Perhaps he might let me steal some of it.
Well, make sure not to flee to Bavaria. Seehofer and Söder would get you captured and extradite you to the Taliban.jrkrideau wrote:@ Lupp{ who stated:}
I would prefer independent checks by you as mathematicians over a comparison with results gottem from 'R'.
Flee! There is an attack team from R-Core on the way from Wien as I type.
Did you read my statement about comparing-decisions based on IEEE 754 'Double' arithmetic. In what way do I appear to be not enough informed insofar? Take any software for formal mathematics/statistics also capable of switching to approximate calculations for a question of the kind under discussion and it will return unreliable results (or, if clever, no result at all in doubtable cases). Having approximation switched off there will be cases where a comparison cannot be finished due to finiteness of tme and RAM or due to lacking abilities in algebra. 'R' (which I never used) surely also does algebra instead of approximative calculations when making decisions of the mentioned kind. Concerning the "matrix-criterion" you find explanation in my next example.jrkrideau wrote:It would be very helpful to have some informed comment from some mathematicians/statisticians. Even having someone second your point re the non-invertible matrix, alone, would add to the argument.
I'm not quite sure of what example you are talking, don't know anything about Ouija, and can only guess the meaning of "where the blazes". However, my answer is: It will also try to guess, may do it in a slightly different way as compared to AOO or to ExcelWhichOne - and will often be wrong. Most errors will be wrong "independent".jrkrideau wrote:That is a beautiful example of the collinearity problem. Assuming I am reading your comments correctly, if the matrix is not invertible where the blazes is Calc getting the results? A Ouija board?
In conseequence of my already mentioned statements about approximating calculations this may not be of much importance in the given case. Excel cannot do it reliably and so can't Calc.IMO this is probably the kind of nonsense introduced by:"We are just doing it the same way Excel does, for compatibility, even though it's wrong."But this not only by the implementation but already by the specification mimicking conscientiousness.
Serious consulting is always a good idea. If it was useful its result was that Gnumeric would need a complete module for algebraic manipulations.jrkrideau wrote:Yes. Interestingly enough, as I mentioned above Gnumeric seems to perform better in this case. It was rumoured that during Gnumeric's development that there was some serious consulting with statisticians.
Edit: Ouija board a board printed with letters, numbers, and other signs, to which a planchette or movable indicator points, supposedly in answer to questions from people at a seance. |
Thinking more about the situation, my advice is to report one issue, the failure to detect the non-independence of the X variables. Your original post in the topic demonstrates the issue but we should have the first Y value 21, not 2, to match the calculation of the coeffieients by R and/or GNumeric.jrkrideau wrote:Currently I am not sure of the approach but I believe we have to consider two separate bug reports. One for the collinearity issue and one for the Degrees of Freedom issue.
Oops. That's incorrect. If we make the assumption that the variables are independent, there is one degree of freedom, as reported by LINEST. 5 (observations) - 3 (independent variables) - 1 (intercept) is 1. If we were to detect that one of the variables is not independent we have two degrees of freedom. 5 (observations) - 2 (independent variables) - 1 (intercept) is 2.MrProgrammer wrote:I would expect 3 [instead of 1] when using three independent variables.
6 (observations) - 3 (independent variables) - 1 (intercept) is 2, as calculated by LINEST.jrkrideau wrote:I would have expected df = 3 in the attached example. There is no correlation above 0.558 among the variables.
Translations for non-native speakers of English:jrkrideau wrote:if the matrix [from MultiLinest3BugLibO6101RC.ods] is not invertible where the blazes is Calc getting the results? A Ouija board?
This comment was from Lupp.jrkrideau wrote:Mr. Programmer, I have ommited your suggestion about wxMaxima since …
The difficulty is not due to notable correlation between any pairs of variables, as long as it isn't 1 which would indicate the pair isn't independent. LINEST works fine if I drop variable X2 and use X1 and X3 for the analysis.jrkrideau in statement_of_problem (Draft1 2018-09-18).pdf wrote:The correlation between x1 and x3 is r= 0.993 which indicates a case of multicolinearity.
Y X1 X2 X3
21 1 -4 -3
33 3 2 5
45 5 -3 2
49 7 -6 1
61 9 -5 4
It will not hurt to remind the developers of the following sentence in the ODF standard:jrkrideau wrote:I propose to summit this …
Users browsing this forum: No registered users and 12 guests