## [Solved] Weighting Average by # Responses

### [Solved] Weighting Average by # Responses

Greetings All,

Thanks to the community for solving a question about weighted averages a week or so ago [SOLVED]. I have another weighted average question (I think) and thought I'd try again with a different challenge.

I work for a nonprofit foundation and we are trying to collate the responses to some survey questions regarding a public transportation utility in South America. If you look at the attached simulation (not the real data) you will see a series of 4 criteria that the respondents were asked to rate and then to assess how important each is. So the respondent scores a criterion on a scale of 1-5 (5 is the best, 1 the worst), then says how important that criterion is to them, again on a scale of 1-5 (5 most important, 1 least). So, for example, one criterion might be, "Do the buses run on time?" and then, "How important is that to you?"

Here is the problem: If you take a straight average of the 4 scores to come up with a cumulative "grade" (3.5), you give all these factors equal weight in the average. You'll notice for example Criteria A was rated the lowest (2.1) but yielded the most responses and received the most votes for being "highly important". On the other hand Criteria B was graded the highest but was the least important and received the second most total responses.

We want to scale or weight the average so that "A" is given more significance than "B". It would be great to average it weighted by the # of "Importance" rankings, but we'd settle for an average based on each criteria's share of the total number of responses, perhaps.
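To make the contrast concrete, here is a minimal sketch of the difference between a straight average and a response-weighted average. Only Criteria A's score (2.1) and its response count (423) come from the figures quoted in this thread; the scores and counts for B, C, and D below are made-up placeholders, since the real numbers are in the attached file.

```python
# Straight vs. response-weighted average.
# Criteria A's score (2.1) and response count (423) are from the post;
# the B, C, D figures below are hypothetical placeholders.
scores = [2.1, 4.5, 4.1, 3.3]      # column B: average grade per criterion
responses = [423, 350, 250, 207]   # responses per criterion

straight = sum(scores) / len(scores)  # every criterion weighs 25%
weighted = sum(s * n for s, n in zip(scores, responses)) / sum(responses)

print(round(straight, 2))  # 3.5
print(round(weighted, 2))  # lower, because low-scoring A has the most responses
```

With these placeholder numbers the weighted figure comes out below the straight 3.5, because the worst-graded criterion carries the most responses, which is exactly the effect being asked for.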

Unfortunately, we don't have access to the underlying data because it was gathered haphazardly - if we could enter the data ballot by ballot it wouldn't be hard to do this. All we have is these aggregates, a gross average by criteria and then # of "votes" on importance.

As always, your thoughts and suggestions are greatly appreciated!
Marcelo
Last edited by marcelovela on Sat Jan 20, 2018 6:58 pm, edited 1 time in total.
Marcelo
OpenOffice 4.1.3 on Windows 10
marcelovela

Posts: 13
Joined: Tue Jan 09, 2018 10:09 pm

### Re: Weighting Average by # Responses

marcelovela wrote:It would be great to average it weighted by the # of "Importance" rankings but we'd settle for an average based on each criteria's share of the total number of responses perhaps.

How is weighting by "the # of Importance rankings" different from "an average based on each criteria's share of the total number of responses"? I see that Criteria A has 423 responses. Is its "# of Importance rankings" different from that?

Does weighting as described above mean ignoring the values in column B?

Taking a step back, treating rankings like this as if they represent numeric data is not legitimate. One can't really describe the distance between rankings, which is an assumption behind averaging them. Is the "distance" between Most Important and Somewhat Important the same as the "distance" from Neither Important nor Unimportant to Not Very Important? More explanation can be found on this blog. That is a web site for a book, but there is also a link to a journal article that is free. It might be more technical than you want.
Windows 10 and Linux Mint, since 2017
FJCC
Moderator

Posts: 7259
Joined: Sat Nov 08, 2008 8:08 pm

### Re: Weighting Average by # Responses

FJCC, thanks for your post - sorry if it wasn't clear.

I took a look at the post you suggested and it is interesting to consider these points, which appear aimed at the validity of such averaging processes. It's all way above my pay grade (and understanding), but I must admit to never thinking through that part of the process.

In our example here, we are more focused on an analysis of someone else's data and to answer your question, let's assume the distance between the "Importance" rankings to be equal.

To answer another question, I didn't explain clearly what the difference was between the Importance rankings and each criteria's share of the total number of responses... here goes...

What I am trying to achieve (I think) is a weighted average of all of the scores in column B (B3:B6). Right now, if you average these 4 scores without any consideration of the # of responses or of the importance of the area to end users, each score contributes 25% to the average, yielding 3.5 (B8). However, Criteria A is responsible for 34.4% of all the responses provided (H3). Criteria A is also ranked as the most important factor; for example, it generates the greatest number in both Column C and Column D. So its relative importance to the overall score should somehow reflect that.

In this business case, a public transportation utility is congratulating itself (3.5 overall ranking out of 5, "hey we're not so bad!") when most of the oomph behind that score comes from doing "OK" in areas that are not all that important as seen in Criteria B & C (e.g., the bus driver always says "Hello" vs the buses don't run on time - the latter being "more important" to more people as seen in the scores in Criteria A).

In the areas that matter to the most people, the utility is not doing very well at all, and the 3.5 in B8 should reflect that; it should be much lower.

So, one solution idea is to simply weight the average of all of the scores, as noted in cell B8 (3.5), so that Criteria A's score contributes 34.4% to that average (H3/H7). I'm stuck on AVERAGE here, but maybe it's a different function? This ignores all the data in columns C thru G.
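Assuming the layout described here (scores in B3:B6, response counts in H3:H6, total in H7), this first idea is a single SUMPRODUCT in Calc rather than AVERAGE. A sketch of the arithmetic, where only A's 2.1 score and 423 responses come from the thread and the other figures are placeholders:

```python
# Calc equivalent (assumed cell layout, semicolon argument separators):
#   =SUMPRODUCT(B3:B6;H3:H6)/SUM(H3:H6)
# which is the same as weighting each score by its share H3/H7, H4/H7, ...
scores = [2.1, 4.5, 4.1, 3.3]    # B3:B6 -- only A's 2.1 is from the post
counts = [423, 350, 250, 207]    # H3:H6 -- only A's 423 is from the post

shares = [n / sum(counts) for n in counts]           # each criterion's share
weighted = sum(s * w for s, w in zip(scores, shares))

print(round(shares[0], 3))  # ~0.344, matching the 34.4% share quoted for A
```

The shares always sum to 1, so no separate normalization step is needed after the SUMPRODUCT.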

A second, more complex solution idea is to do the above but also add weighting for the importance scores in columns C thru G. For example, I was playing around with assigning a weight of "5" to each of the counts in C3:C6; "4" to each of the counts in D3:D6; and so on over to column G at "1". Then, weighting those results against the averages in Column B to come up with a more nuanced score in B8. The scores in B3:B6 remain unchanged throughout, of course. It's great if the bus driver says hello but some work needs to be done to get the buses running on time.

Thanks for your thoughts and input!
Marcelo
OpenOffice 4.1.3 on Windows 10
marcelovela


### Re: Weighting Average by # Responses

My apologies to FJCC, when I said:

"FJCC, thanks for your post - sorry if it wasn't clear."

What I meant was I'm sorry if MY post wasn't clear. FJCC's reply was perfectly understandable.
Marcelo
OpenOffice 4.1.3 on Windows 10
marcelovela


### Re: Weighting Average by # Responses

Thanks for the clear explanation. It really helps when people take the time to lay out the problem.
The attached file shows two ways to weight the average. Neither fixes the basic illegitimacy of such calculations but they do lower the reported number.
Attachments
WeightedAvg.ods
Windows 10 and Linux Mint, since 2017
FJCC
Moderator


### Re: Weighting Average by # Responses

FJCC's solution was simple and elegant, our thanks and appreciation for thinking it through with us. We're marking this case [SOLVED].
Marcelo
OpenOffice 4.1.3 on Windows 10
marcelovela


### Re: Weighting Average by # Responses

FJCC wrote:Thanks for the clear explanation. It really helps when people take the time to lay out the problem. The attached file shows two ways to weight the average. Neither fixes the basic illegitimacy of such calculations but they do lower the reported number.

I am not sure that I completely follow the solution rationales.

I would have thought the (probably) correct approach would be to take the weighted mean for each row, so for Category A we have

saa = sum(231*5, 143*4, 45*3, 3*2, 1*1) and

san = sum(231, 143, 45, 3, 1)

Weighted mean for A = saa/san

Repeat for Categories B, C, D.

Then for the Grand Mean we take the four sums saa, sbb, scc, sdd and divide by san + sbn, etc. That is the sum of H2:H5 in your example.

I suspect that the terminology the OP is using is partly to blame. It looks to me that the 5-point scale should be considered a rating scale, not a ranking scale. In the social sciences this usually allows one to make the (slightly dubious) assumption of interval data.

Sorry for not supplying a working example, but what calculations I did were done in another program. I do think that the weighted means for each category should be:
A : 4.41844
B : 2.1875
C : 2.914634
D : 3.963277
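As a quick check, the Category A figure follows directly from the counts quoted above (the counts for B, C, and D are not given in the thread, so only A is verified here):

```python
# Verify the Category A weighted mean from the counts in the post.
counts = [231, 143, 45, 3, 1]   # votes for importance 5, 4, 3, 2, 1
ratings = [5, 4, 3, 2, 1]

saa = sum(c * r for c, r in zip(counts, ratings))  # weighted vote total
san = sum(counts)                                   # total votes for A

print(round(saa / san, 5))  # 4.41844
```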

It seems very strange that Calc does not offer a weighted.mean function.
OpenOffice 4.1.6; Ubuntu 18.04
jrkrideau
Volunteer

Posts: 3690
Joined: Sun Dec 30, 2007 10:00 pm