[R] simple intro to cluster analysis using R

From: Donatas G. <dgvirtual_at_akl.lt>
Date: Thu, 10 Apr 2008 02:00:57 +0300


I am looking for a simple introduction to cluster analysis using R that would be understandable to a novice in statistics. Or could someone perhaps help me understand how to proceed with my analysis? I am very new to both statistics and R, but I am trying hard to avoid having to use SPSS like everyone around me...

I have a dataset of people's opinions on different religious communities, coded on a 5-point scale, and I want to see if those communities can be grouped (clustered) in some way that would be illuminating for my research purposes.
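In case the coding matters: a minimal sketch of how text answers like mine can be turned into 1 to 5 codes, assuming the five labels that appear in my data. The vector `answers` is made up for illustration, and the ordering from most negative to most positive is my own choice.

```r
# The five answer labels, ordered from most negative (1) to most positive (5)
lvls <- c("Completely negative", "More negative", "Neutral",
          "More positive", "Completely positive")

# Made-up answers for five respondents, including one missing value
answers <- c("Neutral", "More positive", NA,
             "Completely negative", "More negative")

# factor() fixes the order of the labels; as.integer() returns the position
# of each answer in that order, and NA stays NA
coded <- as.integer(factor(answers, levels = lvls, ordered = TRUE))
coded
```

Treating the codes as numbers assumes the steps between the answers are roughly equal, which I am not sure is justified.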

So, I have data that looks like this:

> describe(R12)

R12

 18 Variables 1035 Observations



R12.1
      n missing  unique
    416     619       5

More negative (51, 12%), More positive (112, 27%), Completely negative (41, 10%), Completely positive (23, 6%), Neutral (189, 45%)

<skip>

R12.12

      n missing  unique
    451     584       5

More negative (111, 25%), More positive (43, 10%), Completely negative (79, 18%), Completely positive (5, 1%), Neutral (213, 47%)

<and so on>

So you can see that there are a lot of NAs in this questionnaire, at times more than half of the observations for a variable.
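To show what I mean by the amount of missing data, here is a rough sketch of how it can be counted. I cannot post my real data, so `toy` is a made-up stand-in with three items and about half the answers missing; the item names and proportions are assumptions.

```r
set.seed(1)

# Made-up stand-in for my data: answers coded 1..5, roughly half missing
make_item <- function(n) {
  sample(c(1:5, NA), n, replace = TRUE, prob = c(rep(0.1, 5), 0.5))
}
toy <- data.frame(Q1 = make_item(200), Q2 = make_item(200), Q3 = make_item(200))

# Share of missing answers per item
na_share <- colMeans(is.na(toy))
round(na_share, 2)

# Number of respondents who answered BOTH items, for every pair of items.
# These are the sample sizes behind each pairwise correlation.
answered <- (!is.na(as.matrix(toy))) + 0   # 1 = answered, 0 = missing
pair_n <- crossprod(answered)
pair_n
```

The diagonal of `pair_n` is the number of answers per item; the off-diagonal entries show how few respondents some pairs share.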

Here is also a correlation matrix (only part is displayed):

> x=cor(R12, use="pairwise.complete.obs")
> round(x,2)

       R12.1 R12.2 R12.3 R12.4 R12.5 R12.6 R12.7 R12.8 R12.9 R12.10 R12.11
R12.1   1.00  0.57  0.57  0.61  0.57  0.48  0.43  0.38  0.52   0.58   0.58
R12.2   0.57  1.00  0.82  0.78  0.73  0.62  0.43  0.49  0.64   0.69   0.75
R12.3   0.57  0.82  1.00  0.89  0.90  0.73  0.54  0.57  0.70   0.77   0.78
R12.4   0.61  0.78  0.89  1.00  0.91  0.68  0.51  0.56  0.65   0.80   0.76
R12.5   0.57  0.73  0.90  0.91  1.00  0.73  0.53  0.55  0.68   0.78   0.74
R12.6   0.48  0.62  0.73  0.68  0.73  1.00  0.59  0.62  0.68   0.79   0.78
R12.7   0.43  0.43  0.54  0.51  0.53  0.59  1.00  0.62  0.55   0.65   0.65
R12.8   0.38  0.49  0.57  0.56  0.55  0.62  0.62  1.00  0.55   0.65   0.62
R12.9   0.52  0.64  0.70  0.65  0.68  0.68  0.55  0.55  1.00   0.79   0.82
R12.10  0.58  0.69  0.77  0.80  0.78  0.79  0.65  0.65  0.79   1.00   0.88
R12.11  0.58  0.75  0.78  0.76  0.74  0.78  0.65  0.62  0.82   0.88   1.00
R12.12  0.47  0.59  0.64  0.65  0.60  0.61  0.56  0.50  0.68   0.77   0.83
R12.13  0.62  0.69  0.77  0.70  0.74  0.76  0.65  0.61  0.78   0.81   0.82
R12.14  0.58  0.70  0.71  0.75  0.70  0.74  0.64  0.62  0.78   0.86   0.86
R12.15  0.58  0.61  0.72  0.72  0.71  0.72  0.64  0.59  0.73   0.83   0.79
R12.16  0.56  0.67  0.77  0.72  0.78  0.75  0.57  0.54  0.75   0.85   0.80
R12.17  0.61  0.69  0.79  0.77  0.75  0.73  0.56  0.57  0.74   0.82   0.80
R12.18  0.63  0.73  0.84  0.82  0.83  0.71  0.54  0.64  0.68   0.71   0.74

So you can see that the opinions are strongly correlated. I doubt clustering would be meaningful, but I still want to try.
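From what I have read so far, one option might be to turn the correlations into distances between the variables and run hierarchical clustering on them. Below is a rough sketch of that idea on made-up data (`toy`, with two groups of items built in on purpose). Using `1 - correlation` as the distance, average linkage, and cutting into two groups are all my assumptions, and I am not sure this is the right approach for my data.

```r
set.seed(42)
n <- 300

# Two made-up underlying attitudes; items A1-A3 follow the first,
# items B1-B3 follow the second
f1 <- rnorm(n)
f2 <- rnorm(n)

# Turn an attitude into a noisy answer coded 1..5
make_item <- function(f) {
  raw <- f + rnorm(length(f), sd = 0.6)
  as.integer(cut(raw, breaks = quantile(raw, 0:5 / 5), include.lowest = TRUE))
}

toy <- data.frame(A1 = make_item(f1), A2 = make_item(f1), A3 = make_item(f1),
                  B1 = make_item(f2), B2 = make_item(f2), B3 = make_item(f2))

# Delete about 40% of the answers at random to mimic the NAs
toy[matrix(runif(n * ncol(toy)) < 0.4, nrow = n)] <- NA

# Same correlation call as above, then correlation -> dissimilarity
x <- cor(toy, use = "pairwise.complete.obs")
d <- as.dist(1 - x)

# Hierarchical clustering of the variables (not of the respondents)
hc <- hclust(d, method = "average")
plot(hc)                      # dendrogram of the items

# Cut the tree into a chosen number of groups (2 is an arbitrary choice here)
groups <- cutree(hc, k = 2)
groups
```

On my real data this would be `cor(R12, use="pairwise.complete.obs")` in place of the toy correlation matrix.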

How do I proceed with this?

-- 
Donatas Glodenis

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 09 Apr 2008 - 23:04:12 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 10 Apr 2008 - 05:30:32 GMT.
