[R] What to do with this data?

From: mika03 <carlmika_at_yahoo.de>
Date: Thu, 03 Apr 2008 10:45:18 -0700 (PDT)

Hello,

This is not necessarily a question about R, but more about how we should display our data in general. Once we know what to do, we will use R to do it ;-) I have received good replies about such things on this mailing list in the past, so I am giving it a go.

Here's what we did:
We showed a fairly large number of subjects search engine queries and different possible search engine responses. We assumed that users would like some of our responses better than others and wanted to check this. Subjects could rate a query/response pair on a scale from 0 (very bad response) to 10 (very good response).

Here are all the judgments we received for one particular class of response to queries which we thought users would like:

Predicted-Good-0, 4 
Predicted-Good-1, 1 
Predicted-Good-2, 11 
Predicted-Good-3, 8 
Predicted-Good-4, 25 
Predicted-Good-5, 12 
Predicted-Good-6, 21 
Predicted-Good-7, 25 
Predicted-Good-8, 30
Predicted-Good-9, 52 
Predicted-Good-10, 189

And here are all the judgments we received for one particular class of response to queries which we thought users would NOT like:

Predicted-Bad-0, 34 
Predicted-Bad-1, 23 
Predicted-Bad-2, 45 
Predicted-Bad-3, 60 
Predicted-Bad-4, 42 
Predicted-Bad-5, 50
Predicted-Bad-6, 21
Predicted-Bad-7, 20 
Predicted-Bad-8, 25 
Predicted-Bad-9, 19 
Predicted-Bad-10, 39 
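
In case it helps anyone who wants to try things out, the two lists above can be entered in R roughly like this. This is only a sketch; the object names (rating, good, bad, counts) are our own choice and nothing more.

```r
# Counts of judgments per rating 0..10, copied from the two lists above
rating <- 0:10
good <- c(4, 1, 11, 8, 25, 12, 21, 25, 30, 52, 189)     # Predicted-Good
bad  <- c(34, 23, 45, 60, 42, 50, 21, 20, 25, 19, 39)   # Predicted-Bad

# One row per rating, one column per response class
counts <- data.frame(rating, good, bad)
print(counts)

# Sanity check: total number of judgments in each class
print(colSums(counts[, c("good", "bad")]))
```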

Here's a small table listing number of observations, mean, standard deviation and standard error:

Type, N, Mean, StDev, StErr
Predicted-Good, 378, 8.21693121693122, 2.47110906286224, 0.12710013550711
Predicted-Bad, 378, 4.5978835978836, 3.02059872953413, 0.155362834286119
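
For reference, here is a sketch of how these numbers can be recomputed in R from the counts. The helper name summarise_counts is made up for this example. As far as we can tell, the StDev figures in the table divide by N, whereas R's sd() divides by N - 1 and so gives slightly larger values; the sketch reports both.

```r
rating <- 0:10
good <- c(4, 1, 11, 8, 25, 12, 21, 25, 30, 52, 189)
bad  <- c(34, 23, 45, 60, 42, 50, 21, 20, 25, 19, 39)

# Hypothetical helper: summary statistics from a table of counts
summarise_counts <- function(rating, n) {
  N  <- sum(n)
  m  <- sum(rating * n) / N            # weighted mean
  ss <- sum(n * (rating - m)^2)        # sum of squared deviations
  sd_pop    <- sqrt(ss / N)            # divides by N (matches the table)
  sd_sample <- sqrt(ss / (N - 1))      # divides by N - 1 (what sd() uses)
  c(N = N, Mean = m, StDev = sd_pop,
    StDevSample = sd_sample, StErr = sd_pop / sqrt(N))
}

stats <- rbind("Predicted-Good" = summarise_counts(rating, good),
               "Predicted-Bad"  = summarise_counts(rating, bad))
print(round(stats, 4))
```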

The questions we have are:

  1. It doesn't seem like our data follows a standard distribution. Therefore is it okay to calculate mean, standard deviation and standard error at all?
  2. We initially created a figure plotting the mean and a bar around it indicating standard deviation. Then somebody who knows more about statistics told us we should display the mean and error bars around it "to depict a 95% Confidence Interval, mean +/- 1.96*SE". But if we are doing this, aren't we forgetting to mention vital parts of our data, that is that we indeed get better means for "Good" responses, but that the individual data points are all over the place (especially for "Predicted-Bad")? We would capture this by showing standard deviation.
  3. And finally: What would be the best way to present this data anyway?
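
To make questions 2 and 3 concrete, here is a rough sketch of the two displays we are weighing: the suggested interval, mean +/- 1.96*SE, and a plot of the full rating distributions. The names ci95, x_good and x_bad are ours, and this is meant as a starting point for discussion, not as the right answer.

```r
rating <- 0:10
good <- c(4, 1, 11, 8, 25, 12, 21, 25, 30, 52, 189)
bad  <- c(34, 23, 45, 60, 42, 50, 21, 20, 25, 19, 39)

# Expand the counts back into one value per judgment
x_good <- rep(rating, times = good)
x_bad  <- rep(rating, times = bad)

# Normal-approximation 95% interval for the mean: mean +/- 1.96 * SE
# (sd() divides by N - 1, so SE differs marginally from the table above)
ci95 <- function(x) {
  se <- sd(x) / sqrt(length(x))
  mean(x) + c(lower = -1.96, upper = 1.96) * se
}
ci <- rbind(good = ci95(x_good), bad = ci95(x_bad))
print(round(ci, 2))

# Alternative display: proportion of judgments at each rating,
# side by side, so the spread of each class stays visible
props <- rbind(good = good / sum(good), bad = bad / sum(bad))
colnames(props) <- rating
barplot(props, beside = TRUE, legend.text = TRUE,
        args.legend = list(x = "topleft"),
        xlab = "Rating", ylab = "Proportion of judgments")
```

The interval only describes how precisely each mean is estimated, while the bar plot shows how widely the individual judgments are spread, which is the part we are afraid of losing.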

Thanks a lot!

-- 
View this message in context: http://www.nabble.com/What-to-do-with-this-data--tp16467948p16467948.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 03 Apr 2008 - 18:04:49 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 04 Apr 2008 - 12:30:26 GMT.
