Re: [R] Retrieving the 2 row of "dist" computations

From: Jeff08 <jefferyding_at_gmail.com>
Date: Thu, 10 Jun 2010 22:13:55 -0700 (PDT)

Hey Jorge,

I don't know if you received this, but I found something odd in the output.

Edit:
There is something funky about the code. It definitely returns the right column of the "distance" data, but returns an incorrect row.

Code:

NCols=250
NRows=829
myMat<-matrix(runif(NCols*NRows), ncol=NCols)

d<-dist(myMat)
e<-sort.list(d)
e<-e[1:5]  ##Retrieve minimum 5 distances

k <- 5
res <- matrix(NA, ncol = 2, nrow = k)

ds <- sort(d)
for(i in 1:k) res[i, ] <- which(as.matrix(d) == ds[i], arr.ind = TRUE)[1,]
colnames(res) <- c('row','col')
rownames(res) <- 1:k

res
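
As a cross-check that does not go through as.matrix(d): a "dist" object stores the lower triangle column by column, so the (row, col) pair behind each position can be listed in that same order and then indexed by e. A minimal sketch on a small matrix; the seed and the names pairs and res2 are my own choices for illustration, not part of the code above:

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
myMat <- matrix(runif(7 * 5), ncol = 5)

d <- dist(myMat)
n <- attr(d, "Size")                  # number of rows that went into dist()

## (row, col) of every lower-triangle cell, in column-major order,
## which is the order in which dist() stores its values
pairs <- which(lower.tri(matrix(NA, n, n)), arr.ind = TRUE)

e <- sort.list(d)[1:5]                # dist indices of the 5 smallest distances
res2 <- pairs[e, , drop = FALSE]      # the rows/columns behind those indices
res2

## sanity check: looking the pairs up in the full matrix gives back d[e]
all(as.matrix(d)[res2] == as.vector(d)[e])
```

pairs[k, ] is the row/column pair behind d[k], so the same lookup should apply unchanged to the 829-row d.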

I derived a formula for the 829-row case to check whether the returned row and column match the index given by e.

Column # = x, Row # = y, n = 828 - (x - 2)
index = y + (n + 828)(828 - n + 1)/2

Formula in R code:
##Just checking row 1 of res

i<-1
y<-res[i,1]
x<-res[i,2]
n<-(828-(x-2))

index1<-(y+(n+828)*(828-n+1)/2)
index2<-e[i]
##index1 should equal index2, but this is not the case
##you can tell that the column is right because index1 and index2 are close
##(a change in row of 1 shifts the index by 1, but a change in column
## shifts the index by ~400 on average)

You can then compare this index to the one given by e[i]
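
For comparison, ?dist documents the position of the pair (i, j), i < j, in the dist vector as n*(i-1) - i*(i-1)/2 + j - i, where n is the number of rows. Expanding n = 828 - (x - 2), the formula above works out to y + (x - 1)(1658 - x)/2, whereas the ?dist expression with 829 rows gives (x - 1)(1658 - x)/2 + y - x, so unless I have slipped in the algebra the two differ by exactly x. A minimal sketch that checks the documented mapping on a small matrix; the helper name dist_index and the seed are assumptions for illustration:

```r
## Hypothetical helper: position of the pair (row, col) in a "dist" vector
## built from n rows, following the formula in ?dist (requires row > col).
dist_index <- function(row, col, n) {
  n * (col - 1) - col * (col - 1) / 2 + row - col
}

set.seed(1)                           # arbitrary seed, for reproducibility only
NCols <- 5
NRows <- 7
myMat <- matrix(runif(NCols * NRows), ncol = NCols)

d <- dist(myMat)
e <- sort.list(d)[1:5]                # dist indices of the 5 smallest distances

## same lookup as the loop above: first match in the full matrix,
## which is the lower-triangle cell (row > col)
dm <- as.matrix(d)
ds <- sort(d)
res <- t(sapply(1:5, function(i) which(dm == ds[i], arr.ind = TRUE)[1, ]))
colnames(res) <- c("row", "col")

index1 <- dist_index(res[, "row"], res[, "col"], NRows)
all(index1 == e)                      # compare against the indices in e
```

If the mapping holds, the last line should come back TRUE, which would mean the rows returned by the loop are fine and the discrepancy sits in the hand-derived index.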

On Fri, Jun 11, 2010 at 11:06 AM, Jeff08 [via R] <ml-node+2251244-1652160471-274944_at_n4.nabble.com> wrote:

> Jorge Ivan Velez wrote:
> Hi there,
>
> I am sure there is a better way to do it, but here is a suggestion:
>
> res <- matrix(NA, ncol = 2, nrow = 5)
> for(i in 1:5) res[i, ] <- which(as.matrix(d) == sort(d)[i], arr.ind = TRUE)[1,]
> res
>
> HTH,
> Jorge
>
>
> On Wed, Jun 9, 2010 at 11:30 PM, Jeff08 <> wrote:
>
> >
> > Dear R Gurus,
> >
> > As you probably know, dist calculates the distance between every two rows
> > of data. What I am interested in is the actual two rows that have the least
> > distance between them, rather than the numerical value of the distance
> > itself.
> >
> > For example, the minimum distance in the following sample run is d[14],
> > which is 0.3826119, and the rows are 3 & 6. I need to find a generic way to
> > retrieve these rows for a generic matrix of NRows (in this example
> > NRows=7).
> >
> > NCols=5
> > NRows=7
> > myMat<-matrix(runif(NCols*NRows), ncol=NCols)
> >
> > d<-dist(myMat)
> >
> >           1         2         3         4         5         6
> > 2 0.7202138
> > 3 0.7866527 0.9052319
> > 4 0.6105235 1.0754259 0.8897555
> > 5 0.5032729 1.0789359 0.9756421 0.4167131
> > 6 0.6007685 0.6949224 0.3826119 0.7590029 0.7994574
> > 7 0.9751200 1.2218754 1.0547197 0.5681905 0.7795579 0.8291303
> >
> > e<-sort.list(d)
> > e<-e[1:5] ##Retrieve minimum 5 distances
> >
> > [1] 14 16 4 18 5

-- 
Jeffery Ding
Duke University, Class of 2012
(224) 622-3398 | jd116_at_duke.edu

-- 
View this message in context: http://r.789695.n4.nabble.com/Retrieving-the-2-row-of-dist-computations-tp2249844p2251282.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 11 Jun 2010 - 05:39:13 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 11 Jun 2010 - 05:50:28 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
