Re: [R] Calculating Betweenness - Efficiency problem

From: Senthil Purushothaman <spurushothaman_at_lnxresearch.com>
Date: Thu, 24 Jul 2008 10:51:19 -0700

Dear Gabor,

I am really sorry about the file attachment. As you might have guessed, I am quite new to mailing-list etiquette, and I will keep your suggestion in mind. Thanks for taking the time to test the data I sent you. I found out exactly where the problem is. Surprisingly, you should have hit the same issue unless you handled it somehow. Many rows in the data sheet I sent contain unusual characters, and when R reads the file to create a graph it fails at the first such row. Only the data before that row is used to create the graph.
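
A quick way to check whether a read is being cut short, as a minimal sketch: compare the number of lines in the file with the number of rows read.csv returns, and list the lines that contain quote characters or bytes outside printable ASCII. The file contents and column names below are made up for illustration, and an unbalanced double quote is only one possible culprit; the actual characters in Test.csv are not identified in this thread.

```r
# Hypothetical sample file, written here so the sketch is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("from,to",
             "Paris,Lyon",
             "Rome,\"Milan",     # unbalanced double quote
             "Oslo,Bergen",
             "Bern,Basel"), path)

lines <- readLines(path)
n_expected <- length(lines) - 1   # data rows, excluding the header

# Default quote handling: the open quote can swallow the following lines,
# so fewer rows than expected may come back (with a warning).
tab_default <- suppressWarnings(read.csv(path))
cat("rows read with defaults:", nrow(tab_default), "of", n_expected, "\n")

# Line numbers containing quote characters or non-printable-ASCII bytes.
suspect <- grep("[\"']|[^\\x20-\\x7E]", lines, perl = TRUE, useBytes = TRUE)
print(lines[suspect])

# Treat quote characters as ordinary data so every line becomes one row.
tab_raw <- read.csv(path, quote = "")
cat("rows read with quote = \"\":", nrow(tab_raw), "\n")
```

If the row counts disagree, the first suspect line is a reasonable place to start looking.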

Two issues remain.

  1. How were you able to read in all the information with the above problem still present?
  2. I have a numeric (number-number) version of the same data set, which I used to test whether betweenness works. I know the sheet contains 50251 nodes in total. Since the data is all numbers, read.csv goes through successfully. I then used graph.data.frame to build the graph, and the summary was startling: there were only 50245 vertices. I re-checked all my numbers and they still do not tally. A sample of the input data looks like this:
1-4455
1-34545
2-4657
...
50251-87
50251-11

I have no idea how 50251 nodes in the data sheet shrink to 50245 nodes in the R graph. I tried creating the vectors with c(....) (my earlier method, which takes a lot of copy/paste time) and building a graph from them; the node count then comes out as 50251, which suggests that there are indeed 50251 vertices.
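
One way to narrow down such a gap, sketched under assumptions: a graph built from an edge list alone can only contain IDs that appear in at least one row, so comparing the IDs present in the two columns against the expected range shows which ones are absent. The tiny edge list and the range 1:6 below are hypothetical stand-ins for the real sheet and 1:50251; whether isolated nodes are the actual cause here is not confirmed in this thread.

```r
# Hypothetical edge list standing in for the two-column sheet.
edges <- data.frame(from = c(1L, 1L, 2L, 5L),
                    to   = c(3L, 2L, 3L, 1L))

# Hypothetical full ID range (1:50251 in the real data).
expected_ids <- 1:6

# Every ID that occurs in either column; only these can become vertices
# when the graph is built from the edge list alone.
ids_in_edges <- unique(c(edges$from, edges$to))

# IDs that never occur in any edge row.
missing_ids <- setdiff(expected_ids, ids_in_edges)

cat("distinct IDs in edge list:", length(ids_in_edges), "\n")
cat("expected IDs not present :", missing_ids, "\n")
```

If the missing IDs turn out to be nodes with no connections, recent igraph versions accept a separate vertex table through the `vertices` argument of graph.data.frame (an assumption about the installed version, worth checking in its help page).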

I would really appreciate it if you could take a look at this issue, since I am not sure whether this is a data input issue or an igraph issue. I will send you the number-number sheet in a separate email.

Thank you very much. I respect your time and effort in helping me resolve this interesting challenge.

Best regards,
Senthil
(909) 267-0799

-----Original Message-----
From: Gabor Csardi [mailto:csardi_at_rmki.kfki.hu]
Sent: Wed 7/23/2008 3:43 AM
To: Senthil Purushothaman
Cc: jim holtman; r-help_at_r-project.org
Subject: Re: [R] Calculating Betweenness - Efficiency problem  

Senthil,

sending a 12 MB file to the list is not a good idea. I've run the code from my previous email without any problem, so you need to be more specific about what went wrong for you.

This is what I get:

> library(igraph)
> tab <- read.csv("/tmp/Test.csv")
> dim(tab)
[1] 304711 2
> length(unique(tab))
[1] 2
> g <- graph.data.frame(tab)
> summary(g)
Vertices: 48072
Edges: 304711
Directed: TRUE
No graph attributes.
Vertex attributes: name.
No edge attributes.
> system.time(bet <- betweenness(g))
   user  system elapsed
661.180   0.098 661.716
> length(bet)
[1] 48072
> bet <- data.frame(city=V(g)$name, betweenness=bet)
> dim(bet)
[1] 48072 2
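
A note on reading the transcript, with a small sketch: unique() on a data frame drops duplicate rows, and length() of a data frame counts its columns, so length(unique(tab)) reports 2 regardless of how many cities there are. To count distinct vertex names before building the graph, the two columns can be flattened first. The three-row table below is hypothetical.

```r
# Hypothetical two-column edge table.
tab <- data.frame(from = c("A", "A", "B"),
                  to   = c("B", "C", "C"),
                  stringsAsFactors = FALSE)

# Number of columns of the de-duplicated data frame, not a vertex count.
n_cols <- length(unique(tab))

# Number of distinct edges (rows).
n_edges <- nrow(unique(tab))

# Number of distinct names across both columns, i.e. the vertex count
# a graph built from this table alone would have.
n_vertices <- length(unique(unlist(tab)))

cat("columns:", n_cols, " distinct edges:", n_edges,
    " distinct vertices:", n_vertices, "\n")
```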

Best,
Gabor

On Tue, Jul 22, 2008 at 11:58:37AM -0700, Senthil Purushothaman wrote:
> Dear Gabor,
> Thank you very much for the insights. I have been using the igraph
> package for my computations. But I did not know about
> graph.data.frame(). Thanks again for that. So I did run my data using
> the steps you had provided. Weirdly, even though the .csv file has
> approximately 300,000 records (remember that the file gets truncated to
> 65536 rows when opened in Excel 2003), not all of them are pulled in
> during the operation and the final betweenness list contains only ~1000+
> records but it should be tens of thousands.
>
> I know that you are a busy person. This problem seems to be a very
> different challenge. I am attaching the Test.csv file for your
> experiments. Thank you very much again.
>
> Best regards,
> Senthil
> (909) 267-0799
>
> -----Original Message-----
> From: Gabor Csardi [mailto:csardi_at_rmki.kfki.hu]
> Sent: Monday, July 21, 2008 1:57 AM
> To: Senthil Purushothaman
> Cc: jim holtman; r-help_at_r-project.org
> Subject: Re: [R] Calculating Betweenness - Efficiency problem
>
> Senthil,
>
> you can try the 'igraph' package. Export your two-column Excel file
> as a .csv, use 'read.csv' to read that into R, then 'graph.data.frame'
> to create an igraph graph from it. Finally, call 'betweenness' on
> the graph. It is really just three/four lines, something like this:
>
> tab <- read.csv(...)
> g <- graph.data.frame(tab)
> bet <- betweenness(g)
> bet <- data.frame(city=V(g)$name, betweenness=bet)
>
> The last line creates a two column data frame with the betweenness
> score of each city.
>
> Best,
> Gabor
[...]

-- 
Csardi Gabor <csardi_at_rmki.kfki.hu>    UNIL DGM



______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 24 Jul 2008 - 18:12:44 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 24 Jul 2008 - 18:32:23 GMT.
