From: Greg Snow <greg.snow_at_ihc.com>

Date: Thu 24 Mar 2005 - 04:52:57 EST

>>> "Jason W. Martinez" <jmartinez5@verizon.net> 03/22/05 04:11PM >>>
>> Dear R-users,
>>
>> I have an outcome variable and I'm unsure about how to treat it. Any
>> advice?

Below are a couple of ideas/suggestions of things to think about.

>>
>> I have spending data for each county in the state of California (N=58).
>> Each county has been allocated money to spend on any one of the
>> following four categories: A, B, C, and D.
>>
>> Each county may spend the money in any way they see fit. This also means
>> that the county need not spend all the money that was allocated to them.
>> The data structure looks something like the one below:

You might want to include a category for the amount of money not spent
(for a total of 5 possibilities).
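
A minimal sketch of that fifth category, using the A-D figures for the three counties quoted in the post. The `allocated` column is a made-up assumption for illustration, since the allocations themselves are not shown.

```r
# Spending in categories A-D for the three counties quoted in the post.
spend <- data.frame(
  county = c("alameda", "alpine", "amador"),
  A = c(2534221, 3174, 0),
  B = c(1555592, 8500, 0),
  C = c(2835475, 0, 0),
  D = c(3063249, 45558, 0)
)

# HYPOTHETICAL allocations (not in the original post), for illustration only.
spend$allocated <- c(12000000, 60000, 25000)

cats <- c("A", "B", "C", "D")
spend$unspent <- spend$allocated - rowSums(spend[, cats])

# Share of each county's allocation going to A, B, C, D, or left unspent.
# Dividing a matrix by a vector of length nrow() divides each row by its
# own allocation.
shares <- as.matrix(spend[, c(cats, "unspent")]) / spend$allocated
rownames(shares) <- spend$county
round(shares, 3)
```

With the unspent share included, a county that spent nothing still has a well-defined composition (everything unspent) rather than 0/0.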

>> COUNTY        A         B         C         D         Total
>> ----------------------------------------------------
>> alameda       2534221   1555592   2835475   3063249   9988537
>> alpine        3174      8500      0         45558     55232
>> amador        0         0         0         0         0
>> ....
>>
>>
>> The goal is to explain variation in spending patterns, which are
>> presumably the result of characteristics for each county.

Do you have data representing these characteristics, i.e. the predictor
values in a regression-type model?

Starting with some good graphics may help determine and show interesting
patterns.

The maptools package can read in shapefiles and plot the maps. You can
download a shapefile with the county boundaries from:
http://www.census.gov/geo/www/cob/co2000.html

Then you could use the symbols function to plot a star in the center of
each county (use get.Pcent from maptools to find the coordinates of the
centers). Then just look for groups of counties with similar looking
stars, or stars that are different from those close by (I would use the
percentage spent in each category for the lengths of the star spokes).
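
A rough sketch of the star idea using only the base graphics symbols function. The county names, centroid coordinates and spending shares below are all hypothetical placeholders; with real data the centers would come from the shapefile and the stars would be drawn over the county map.

```r
# HYPOTHETICAL county centroids (longitude/latitude), for illustration only.
centers <- data.frame(
  county = c("county1", "county2", "county3", "county4"),
  x = c(-122.0, -119.8, -120.7, -117.4),
  y = c(37.6, 38.6, 38.4, 33.9)
)

# HYPOTHETICAL share of spending in each category (each row sums to 1).
shares <- rbind(
  c(A = 0.25, B = 0.16, C = 0.28, D = 0.31),
  c(A = 0.05, B = 0.15, C = 0.00, D = 0.80),
  c(A = 0.40, B = 0.30, C = 0.20, D = 0.10),
  c(A = 0.10, B = 0.10, C = 0.40, D = 0.40)
)
rownames(shares) <- centers$county

# One star per county; spoke lengths are proportional to the shares.
symbols(centers$x, centers$y, stars = shares, inches = 0.4,
        xlab = "longitude", ylab = "latitude")
text(centers$x, centers$y, labels = centers$county, pos = 1, offset = 1.5)
```

To overlay the stars on an already plotted map, symbols also takes add = TRUE.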

Another graph that may prove interesting is the trilinear plot (see the
article in Chance from the summer of 2002). Combine your categories into
3 groups (e.g. A&B vs. C&D vs. not spent; or A vs. B vs. all others) then
plot each county's spending on the trilinear plot (functions to do the
plot are: triangle.plot in ade4, triplot in klaR, or I have some code that
I wrote (not on CRAN yet)).

Look for clusters of counties in these plots.
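
For anyone without those packages installed, here is a bare-bones base-graphics sketch of a trilinear plot. The helper tri_xy and the three-group shares are hypothetical, made up for illustration; this is not the code mentioned above.

```r
# Convert 3-part compositions (one row per county) to x/y coordinates inside
# an equilateral triangle with corners (0,0), (1,0) and (0.5, sqrt(3)/2).
tri_xy <- function(p) {
  p <- p / rowSums(p)                 # re-normalise, in case of rounding
  cbind(x = p[, 2] + p[, 3] / 2,
        y = p[, 3] * sqrt(3) / 2)
}

# HYPOTHETICAL shares for three groups: A&B, C&D, and not spent.
comp <- rbind(
  c(AB = 0.34, CD = 0.49, unspent = 0.17),
  c(AB = 0.20, CD = 0.75, unspent = 0.05),
  c(AB = 0.60, CD = 0.30, unspent = 0.10),
  c(AB = 0.00, CD = 0.00, unspent = 1.00)
)
xy <- tri_xy(comp)

# Draw the triangle, label its corners, then add one point per county.
corners <- tri_xy(diag(3))
plot(corners, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "")
polygon(corners)
text(corners, labels = colnames(comp), pos = c(1, 1, 3), xpd = NA)
points(xy, pch = 19)
```

A county that puts everything into one group lands on that group's corner, so clusters near a corner or an edge are easy to spot.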

>> I may treat the problem like a simple linear regression problem for each
>> category, but by definition, money spent in one category will take away
>> the amount of money that can be spent in any other category---and each
>> county is not allocated the same amount of money to spend.
>>
>> I have constructed proportions of amount spent on each category and have
>> conducted quasibinomial regression, on each dependent outcome but that
>> does not seem very convincing to me.
>>
>> Would anyone have any advice about how to treat an outcome variable of
>> this sort?

Here are a couple of thoughts (there may be better options).

Assuming that you have some predictor (x) variables about each county:
use the multinom function in the nnet package, the idea being that each
dollar spent follows a multinomial with certain probabilities as to which
category it will be spent in, and the predictors tell you what the
probabilities are.
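
A sketch of what the multinom call might look like, on simulated data. The predictor `income`, the simulated spending matrix, and the choice to count money in thousands of dollars are all assumptions made for illustration.

```r
library(nnet)

set.seed(1)
n <- 58

# HYPOTHETICAL county-level predictor, for illustration only.
counties <- data.frame(income = rnorm(n))

# Simulated spending (in thousands of dollars) in categories A-D, with
# category probabilities that depend on the predictor.
lin <- cbind(A = 0, B = 0.5 * counties$income,
             C = -0.5 * counties$income, D = 0.2)
probs <- exp(lin) / rowSums(exp(lin))
spend <- t(sapply(seq_len(n),
                  function(i) rmultinom(1, size = 500, prob = probs[i, ])))
colnames(spend) <- colnames(probs)

# A matrix response is interpreted as counts for each category.
fit <- multinom(spend ~ income, data = counties, trace = FALSE)

coef(fit)                      # log-odds relative to the first category (A)
head(round(fitted(fit), 3))    # fitted spending proportions per county
```

Since dollars within a county are hardly independent draws, the standard errors from such a fit are probably far too small; the fitted proportions are the more trustworthy part.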

Similarly you could use package rpart to do a tree model, use the category
as the outcome and the percentage spent on the category as the weights
(each county would be spread across 4 or 5 lines of the dataset with the
predictors replicated on each line). rpart gives the
probabilities/proportions for each category based on splits of the
predictor variables.
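
A sketch of that long-format layout and the weighted rpart call, again on simulated data. The predictor `income` and the simulated shares are hypothetical.

```r
library(rpart)

set.seed(2)
n <- 58
cats <- c("A", "B", "C", "D", "unspent")

# HYPOTHETICAL predictor and spending shares, for illustration only.
income <- rnorm(n)
raw <- matrix(rexp(n * length(cats)), n, dimnames = list(NULL, cats))
raw[, "A"] <- raw[, "A"] * ifelse(income > 0, 4, 1)  # higher income favours A
shares <- raw / rowSums(raw)

# Long format: one line per county and category, predictor replicated.
# as.vector() unrolls the matrix column by column, matching rep(each = n).
long <- data.frame(
  county   = rep(seq_len(n), times = length(cats)),
  income   = rep(income, times = length(cats)),
  category = factor(rep(cats, each = n), levels = cats),
  share    = as.vector(shares)
)

# Classification tree with the share spent as the case weight.
fit <- rpart(category ~ income, data = long, weights = share,
             method = "class")

# Predicted proportions per category for two made-up predictor values.
pred <- predict(fit, newdata = data.frame(income = c(-1, 1)), type = "prob")
round(pred, 3)
```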

>> Thanks for any hints!
>>
>> Jason
>>
>>
>> --
>> Jason W. Martinez, Graduate Student
>> University of California, Riverside
>> Department of Sociology
>> E-mail: jmartinez5@verizon.net
>>

hope this helps,

Greg Snow, Ph.D.

Statistical Data Center

greg.snow@ihc.com

(801) 408-8111

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Thu Mar 24 05:09:18 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:30:55 EST