[Solved] Conditional counting of cases with disjunction

Discuss the spreadsheet application
Post Reply
phl43
Posts: 10
Joined: Tue Jun 06, 2017 7:35 pm

[Solved] Conditional counting of cases with disjunction

Post by phl43 »

I already asked a question about conditional counting of cases (viewtopic.php?f=9&t=89215&p=420316), but I now realize that I need to ask another.

Suppose I have 4 columns: RACE, NATIONALITY, MAJOR and WEIGHT and each row is a student. RACE can take any value in {1,2,3}, NATIONALITY any value in {1,2,3,4,5,6,7}, MAJOR any value in {1,2,3,4,5,6,7,8,9,10} and WEIGHT is a numerical value used to weigh each person in the sample so that the results would reflect the population.

How do I, for example, count the number of rows that have RACE = 1 and (NATIONALITY = 3 or MAJOR = 5) but such that each row that satisfies this condition is counted not as 1 but as the value of WEIGHT? (See my previous question linked to above if what I mean here isn't clear.) I guess I could count the number of rows that have RACE = 1 and NATIONALITY = 3, add the number of rows that have RACE = 1 and MAJOR = 5 and subtract the number of rows that have RACE = 1 and (NATIONALITY = 3 and MAJOR = 5), but that's a pain and I'm sure there is an easier way to do that. (It's even more of a pain because some respondents in the sample may have answered the question for NATIONALITY but not MAJOR or vice versa and, since ultimately my goal is to calculate the proportion of respondents who have RACE = 1 and (NATIONALITY = 3 or MAJOR = 5) among those who have answered the questions for RACE, NATIONALITY and MAJOR, while ignoring the respondents who didn't answer all 3 questions, so I have to deal with non-response.)

In my previous question, I only needed the rows I wanted to count to satisfy a conjunction, but here there is also a disjunction in there. (Of course, as I have noted above, I know that any expression that contains a disjunction can be expressed by one that only contains conjunction and negation, but it would be a huge pain and I'm sure there is an easier way to do what I want to do.)

I hope my question isn't too confused and would really appreciate some help!

Best,
Philippe
Last edited by phl43 on Thu Jun 22, 2017 11:45 pm, edited 1 time in total.
OpenOffice 4.1.1 on MacOSX
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Conditional counting of cases with disjunction and weigh

Post by Villeroy »

https://wiki.openoffice.org/wiki/Docume ... /DataPilot (aka Pivot Table)

Are you really trying to do scientific evaluation with a spreadsheet as the only tool?
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Conditional counting of cases with disjunction and weigh

Post by Lupp »

That's not counting but adding under conditions. You may use SUMIFS() (See help!) or something like

Code: Select all

=SUMPRODUCT(R_range=1;N_range=3;M_range=5;WEIGHT_range)
(Did you already create all the combinations of R, N, and M for the evaluation?)

(Editing)
Sorry. Just found that SUMIFS is not yet included in the help. The hints given by the formula wizard should do.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Conditional counting of cases with disjunction and weigh

Post by Villeroy »

A pivot table can show the sum of weights for all combinations of R, N, M in one table without writing a single formula.
Attachments
RNMW_Pivot.ods
Pivot table from 3 random categories summing up random values
(52.99 KiB) Downloaded 238 times
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
MrProgrammer
Moderator
Posts: 4895
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Conditional counting of cases with disjunction and weigh

Post by MrProgrammer »

phl43 wrote:How do I, for example, count the number of rows that have RACE = 1 and (NATIONALITY = 3 or MAJOR = 5) but such that each row that satisfies this condition is counted not as 1 but as the value of WEIGHT?
Using De Morgan's Laws, use =SUMPRODUCT(R_range=1;NOT((N_range<>3)*(M_range<>5));W_range).

Or use =SUMPRODUCT(R_range=1;IF((N_range=3)+(M_range=5));W_range) and press ⇪⌘Enter (Shift+Command+Enter) on a Mac or Ctrl+Shift+Enter on other platforms; if you press Enter by mistake, use Edit → Undo and try again.
Lupp wrote:=SUMPRODUCT(R_range=1;N_range=3;M_range=5;W_range)
That should consider (NATIONALITY = 3 and MAJOR = 5), not (NATIONALITY = 3 or MAJOR = 5).
Villeroy wrote:A pivot table can show the sum of weights for all combinations of R, N, M in one table without writing a single formula.
Agreed. A pivot table is a better method if more than one combination of variables needs to be considered. My SUMPRODUCT solution is a direct answer to the OP's question, based on the assumption that RACE = 1 and (NATIONALITY = 3 or MAJOR = 5) is the only combination of interest.

If this solved your problem please go to your first post use the Edit button and add [Solved] to the start of the title. You can select the green checkmark icon at the same time.

[Tutorial] Ten concepts that every Calc user should know
Last edited by MrProgrammer on Tue Oct 31, 2017 4:08 am, edited 1 time in total.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
phl43
Posts: 10
Joined: Tue Jun 06, 2017 7:35 pm

Re: Conditional counting of cases with disjunction and weigh

Post by phl43 »

Thanks for the help everyone.

SUMIF won't do and the pivot table didn't work and wouldn't be practical in this case anyway because the dataset is actually a lot more complicated than the toy example I used to explain my problem.

The solution with SUMPRODUCT works, but I'd rather not use De Morgan's law and use IF instead, because it makes my formulas easier to understand. I tried with IF but it doesn't work, whereas the logically equivalent formula with NOT does. I did try to press Shift+Command+Enter (why do you say that I need to do that when using IF?) but it didn't work, I still get an error.

EDIT: To be honest, I kind of hate Calc/Excel and want to learn R to do that kind of things, but I don't have time right now and what I want to do right now is very simple, so I figured it would be faster to just learn how to do it with Calc.
OpenOffice 4.1.1 on MacOSX
RPG
Volunteer
Posts: 2250
Joined: Tue Apr 14, 2009 7:15 pm
Location: Netherlands

Re: Conditional counting of cases with disjunction and weigh

Post by RPG »

Hello

There we have not a good description of your table we can only guess.

When the pivottable is not a good solution for your problem then maybe you can solve your problem with the database functions in Calc. It is also possible you can filter the data then sum the result. In both cases you have to define a database-range of your data.

I think also a good combination of a filter, sorting and subtotals give a fast result. Be sure you have a copy of your data.

Romke
LibreOffice 7.1.4.2 on openSUSE Leap 15.2
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Conditional counting of cases with disjunction and weigh

Post by Villeroy »

If you want to learn R, you need the mind set of a programmer. If something "does not work", it always raises some kind of error or gives an unexpected result. You don't tell us anything except that it "does not work" which is a null information. You dont even paste your actual formula.

An IF formula is totally inadequate in this case.

Strictly speaking, any kind of spreadsheet is inadequate when it comes to aggrrgations of data sets. Because 90% of spreadsheet users confuse spreadsheets and databases, advanced spreadsheet programs have a tool to create pivot tables. This is a quick and easy method to generate so called "aggregation queries" which is a database term. The whole procedure has nothing to do with spreadsheets but it matches the expectations of many users.

Click any cell in your table which should have one row of column labels above the data (as in my example file)
Call Data>Pivot Table>Create...
The selection expands to the current region of adjacent non-empty cells around the current cell.
Confirm that you are going to create a pivot table from the current spreadsheet selection. Of course you could apply this database tool to database records as well.
If this is not the right selection, then cancel, select the correct rectangle of cells and start again.
In the big dialog that follows:
Drag the weight field into the "Data Fields" box.
Drag the 3 other fields into the "Row Fields" box.
Click [More Options] and specify "New Sheet" as target of the new table (best choice).
Click OK.
The result is a table with the sum of weights for each combination of row fields.
There are filter buttons and you can change the layout by means of simple drag&drop operations. Refer to the tutorial I've linked.

Feel free to play with my example file.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
phl43
Posts: 10
Joined: Tue Jun 06, 2017 7:35 pm

Re: Conditional counting of cases with disjunction and weigh

Post by phl43 »

That's okay, I have figured out how to do what I need to do, thanks everyone for your help. I actually have a Bachelor in CS and have been programming since I was 14, so I like to think I know a thing or two about programming :-) I wasn't giving you any information about the error because Calc wasn't really telling me anything, and I couldn't upload my spreadsheet because it's way too big, nor describe it exactly as it was because it actually has hundreds of variables. This is why I just reduced the problem I was trying to a solve to one that involves less variables but otherwise is essentially the same. I agree that Excel/Calc is totally inadequate for the kind of things I want to do, but I needed to do this quickly and didn't have time to learn something more complicated. I bought a book about R and I will learn it when I have a bit more time so I don't have to deal with Calc/Excel again. Thanks again to everyone for the help.
OpenOffice 4.1.1 on MacOSX
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] Conditional counting of cases with disjunction

Post by Villeroy »

I actually have a Bachelor in CS and have been programming since I was 14,
Why do you use a spreadsheet then? You could choose between better tools even if they involve some coding. Spreadsheets are for lamers or when you need very quick and dirty results within minutes.
Your first explanation describes a simple, flat, normalized table and that you don't know how to get Sum(W) for each combination of R,N,M. By the way, this is the most frequently asked question on all spreadsheet lists since the early 90ies.
Someone provided a valid solution =SUMPRODUCT(R_range=1;N_range=3;M_range=5;WEIGHT_range) which returns either the correct value or an incorrect value or some kind of error value. There can not be anything beyond that.
If you were one of these non-techies, unable to paste your actual formula or describe the incorrect result or error, you could simply to provide a small example file (mine includes 1000 rows of data) with some formula or with a pivot table demonstrating problem clearly to everyone.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
phl43
Posts: 10
Joined: Tue Jun 06, 2017 7:35 pm

Re: [Solved] Conditional counting of cases with disjunction

Post by phl43 »

Why do you use a spreadsheet then? You could choose between better tools even if they involve some coding. Spreadsheets are for lamers or when you need very quick and dirty results within minutes.
Because I have never learned R or any other language for data analysis and, since I'm in the midst of finishing my dissertation, I don't have time to do so at the moment. (The dataset I'm looking at has nothing to do with my PhD, which is a mix of math and philosophy but doesn't deal with empirical data, it's just something about the Public-Police Contact Survey I'm looking at out of personal interest.) I also never used Excel/Calc and don't know how to do even the most basic things, but figured that it would be the best way to get a quick and dirty result, which is why I asked. The SUMPRODUCT formula you mention doesn't actually give me what I want, for reasons that MrProgrammer noted, but I was able to figure out how to do what I want by using De Morgan's law. I was just hoping I would find a more natural solution, but since I just need something that works and don't really care how good it looks, it doesn't matter.
OpenOffice 4.1.1 on MacOSX
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: [Solved] Conditional counting of cases with disjunction

Post by Lupp »

phl43 wrote:The SUMPRODUCT formula you mention doesn't actually give me what I want, for reasons that MrProgrammer noted, but I was able to figure out how to do what I want by using De Morgan's law.
Yes. I have to confess that I hadn't read the question thoroughly enough when I posted my first answer. Thanks to MrProgrammer who pointed that out! As we got a hint how to fix the issue using Boolean Algebra, I would like to add that we can get a solution based on either the (reduced) disjunctive or conjunctive normal form of any Boolean function choosing the addends. Often simpler solutions will be available.

Even more I would like to answer (negative) one question that was posted in one of the comments above and one that I didn't find posted though rather obvious - or did I not read thoroughly enough a second time?

Q: For what should a SUMPRODUCT formula need to be entered explicitly for array-evaluation, probably under some conditions about the subformulas?
A: It shouldn't. All the parameters of SUMPRODUCT are specified 'ForceArraay' in the respective ODF document. Thus they should be evaluated in array-mode anyway.

Q: Why didn't someone simplify and clarify by that the above given formulas using the OR function?
A: OR is not applicable in this context because it never can return an array. Its parameters are neither specified as scalars nor as arrays, but as lists. Thus the OR function will simply unite the logical elements passed via any parameter into one list without even checking the dimensions if some of the parameters were originally arrays.

OR(TrueFalse1, TrueFalse2, ...) can be replaced by IF(TrueFalse1 + TrueFalse2 +... >= 1; TRUE(); FALSE()) if array-evaluation is needed.
On the one hand this makes use of the fact that Boolean values are treated as numbers and can be added. (In fact a serious mistake in the development of spreadsheets.)
On the other hand it would be a terrible mistake here to pass arbitrary numbers <> 0 assuming they would be interpreted as TRUE. Allowing TRUE to be represented by something else but 1 was another and even more serious mistake in the history of spreadsheets. (-1 is as TRUE as 1 is in spreadsheets!)

Concerning "subtypes" like 'Logical' (Boolean) or 'Date' ... the specifications are about as precise as bad advertising.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
User avatar
MrProgrammer
Moderator
Posts: 4895
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: [Solved] Conditional counting of cases with disjunction

Post by MrProgrammer »

I've attached a document showing some of the ways to count/sum with conjunction and disjunction. The simplest way to deal with disjunction is to use a helper column containing the OR function. If one wants to avoid a helper column, another fairly strightforward method is to use the addition operator and the SIGN function: =SUMPRODUCT(R_range=1;SIGN((N_range=3)+(M_range=5));W_range). The OP seemed to have trouble with the formula which required an array function (⇑⌘Enter on a Mac, Ctrl+Shift+Enter on other platforms). It works fine as demonstrated by cell H6, but SIGN is better because an array function isn't needed. The DSUM function is another simple method but requires an auxillary range (K1:M3) to hold the selection criteria.
201710302050.ods
(11.8 KiB) Downloaded 233 times
phl43 wrote:I did try to press Shift+Command+Enter (why do you say that I need to do that when using IF?) but it didn't work, I still get an error.
I think you pressed Enter out of habit and then did not follow my instructions which say "if you press Enter by mistake, use Edit → Undo and try again." Since it's too late for Undo, clear the cell (Backspace key) then retype the entire formula. Repeat those two steps until you remember to press Shift+Command+Enter instead of Enter. Shift+Command+Enter tells Calc to evaluate it as an array formula.
Lupp wrote:All the parameters of SUMPRODUCT are specified 'ForceArraay' in the respective ODF document. Thus they should be evaluated in array-mode anyway.
Yes, I know. But on my system the SUMPRODUCT function in A6 of the attachment produces a #VALUE! error unless I use ⇑⌘Enter. This is presumably a bug in ForceArray handling of the IF function, which has perhaps been corrected in later versions of OpenOffice. (I use 3.2).
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
Post Reply