[R] How do I read a text (.csv) file to match a matrix/cross tab? (Object confusion??)

From: Farley, Robert <FarleyR_at_metro.net>
Date: Wed, 05 Nov 2008 15:51:58 -0800


I'm having a problem reading data to set control totals for a dataframe. I want to adjust a dataframe based on a 2-d table of values, which I get by using :

> CurrentX1Sums <- as.matrix(xtabs(~tripid_nu+lineon, data=SurveyData))

> CurrentX2Sums <- apply(CurrentX1Sums, 1, sum)
 

I've created a .csv file with new (target) sums that looks like this:

tripid_nu    Warner Center     De Soto      Pierce College      Tampa
Reseda        Balboa          Woodley     Sepulveda   Van Nuys   Woodman
Valley College      Laurel Canyon          North Hollywood

9011880     5        2        2        2        2        2        2
2        2        2        6        4        1

9011890     1        1        1        1        1        1        2
1        1        1        1        2        1

9011960     2        2        2        1        2        2        1
2        3        2        2        1        1

9011970     1        1        1        1        2        1        1
2        6        2        2        2        24

9012040     2        2        2        3        2        7        2
2        2        1        1        1        1

9012050     1        1        1        1        1        1        1
2        2        2        1        1        5

  ...{More}...  

I'm trying to read/process it like this:

> NewTargetData <- read.table("C:/Data/R/NewTarget.csv", header=TRUE,
sep=",", na.strings="NA", dec=".")

> NewTargetX1Sums <- as.matrix(NewTargetData)

> NewTargetX2Sums <- apply(NewTargetX1Sums, 1, sum)
   

The structures of CurrentX1Sums and NewTargetX1Sums are different:  

> str(CurrentX1Sums)

 xtabs [1:55, 1:13] 1 0 1 0 1 0 0 0 0 1 ...

  ..$ tripid_nu: chr [1:55] "9011880" "9011890" "9011960" "9011970" ...

  ..$ lineon : chr [1:13] "Warner Center" "De Soto" "Pierce College" "Tampa" ...

> str(NewTargetX1Sums)

 int [1:55, 1:14] 9011880 9011890 9011960 9011970 9012040 9012050 9012130 9012280 9012290 9012720 ...

  ..$ : NULL   ..$ : chr [1:14] "tripid_nu" "Warner.Center" "De.Soto" "Pierce.College" ...

>
       

Question 1) The structures (CurrentX1Sums , NewTargetX1Sums) are different. One way is in the dimension of the rownames. Instead of line numbers, I want tripid_nu. How do I do that? What's the appropriate "structure" for both?    

Question 2) Why do the labels in NewTargetData have dots in place of spaces? Will that be a problem later when I try to match them with SurveyData?    

Question 3) Ultimately, I want to create a variable in the original dataframe like:

           SurveyData$NewX1 = TargetX1Sums/ CurrentX1Sums { for each tripid_nu, lineon combination}

Am I on the right track to do so? Any hints on what THAT syntax will look like?        

Thanks in advance,          

########################################################################
######################################

#My work to date:

> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav",
use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)

> NewTargetData <- read.table("C:/Data/R/NewTarget.csv", header=TRUE,
sep=",", na.strings="NA", dec=".")

>

#-----------------------------------------------------------------------

> temp <- sub(' +$', '', SurveyData$direction_) # Remove spaces
from variable names

> SurveyData$direction_ <- temp

>

#-----------------------------------------------------------------------

> SurveyData$StnNum=as.numeric(SurveyData$lineon)

> CurrentX1Sums <- as.matrix(xtabs(~tripid_nu+lineon, data=SurveyData))

> CurrentX2Sums <- apply(CurrentX1Sums, 1, sum)

> NewTargetX1Sums <- as.matrix(NewTargetData)

> NewTargetX2Sums <- apply(NewTargetX1Sums, 1, sum)

>

> str(CurrentX1Sums)

 xtabs [1:55, 1:13] 1 0 1 0 1 0 0 0 0 1 ...

  ..$ tripid_nu: chr [1:55] "9011880" "9011890" "9011960" "9011970" ...

  ..$ lineon : chr [1:13] "Warner Center" "De Soto" "Pierce College" "Tampa" ...

> str(NewTargetX1Sums)

 int [1:55, 1:14] 9011880 9011890 9011960 9011970 9012040 9012050 9012130 9012280 9012290 9012720 ...

  ..$ : NULL   ..$ : chr [1:14] "tripid_nu" "Warner.Center" "De.Soto" "Pierce.College" ...

>

> CurrentX1Sums

         lineon

tripid_nu Warner Center De Soto Pierce College Tampa Reseda Balboa Woodley Sepulveda Van Nuys Woodman Valley College Laurel Canyon North Hollywood

  9011880             1       0              2     1      0      2
1         0        0       0              1             0
0
  9011890             0       0              0     0      0      0
1         0        0       0              0             1
0
  9011960             1       1              2     0      1      1
0         1        3       2              1             0
0
  9011970             0       0              0     0      1      0
0         1        6       1              1             1
14

 ...{More}...

> NewTargetX1Sums

      tripid_nu Warner.Center De.Soto Pierce.College Tampa Reseda Balboa Woodley Sepulveda Van.Nuys Woodman Valley.College Laurel.Canyon North.Hollywood

 [1,]   9011880             5       2              2     2      2      2
2         2        2       2              6             4
1
 [2,]   9011890             1       1              1     1      1      1
2         1        1       1              1             2
1
 [3,]   9011960             2       2              2     1      2      2
1         2        3       2              2             1
1
 [4,]   9011970             1       1              1     1      2      1
1         2        6       2              2             2
24

...{More}...

########################################################################
######################################









Robert Farley

Metro

1 Gateway Plaza

Mail Stop 99-23-7

Los Angeles, CA 90012-2952

Voice: (213)922-2532

Fax: (213)922-2868

www.Metro.net

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 05 Nov 2008 - 23:56:15 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 06 Nov 2008 - 01:30:26 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive