**From:** A.J. Rossini (*rossini@blindglobe.net*)

**Date:** Fri 09 May 2003 - 12:27:46 EST

**Next message:** Adaikalavan Ramasamy: "RE: [R] Data-mining using R"

**Previous message:** Andy Jacobson: "[R] graphics on a map"

**In reply to:** Fernando Henrique Ferraz Pereira da Rosa: "[R] Data-mining using R"

**Next in thread:** Adaikalavan Ramasamy: "RE: [R] Data-mining using R"

Message-id: <87u1c4zwf1.fsf@jeeves.blindglobe.net>

See www.bioconductor.org for one reasonably full-featured approach.

There are others (Rmaanova, etc.).

Fernando Henrique Ferraz Pereira da Rosa <mentus@gmx.de> writes:

> Is it possible to use R as a data-mining tool? Here's the problem I've
> got. I have a couple of data sets consisting of results from a cDNA
> microarray experiment. The details about the biology don't really matter here; the
> same theory applies to any other data-mining task (that's why I thought it'd
> be more appropriate to post this on r-user). Each of these datasets consists
> of about 30000 rows by 20 to 30 columns. Let's say that each row represents
> (very roughly speaking) a gene, and the columns are details about its level
> of expression, reliability of the measurement, coordinates and so on.
> The main objective here is to identify some genes (rows) according to some
> criteria. In order to do that, I want to be able to selectively
> filter the rows, graph some convenient variables, do some further filtering
> and so on.

> Let me take a more concrete example to make myself clear. Let's say
> that I load a given dataset into a data frame, namely expr1. This data frame would
> have the fields expr1$name, expr1$expression, expr1$reliability, expr1$x,
> expr1$y and so on, containing, for instance, 26000 rows. Now from these 26000 I'd
> like to select only those satisfying expr1$expression > 2000 and
> expr1$reliability = 100, and plot a graph of expr1$x versus expr1$y for them. I'd then have
> a reduced version of the first dataset. Let's say now that I want to narrow my
> filter even more, selecting only (among the ones I have already selected) the
> ones where expr1$x > 20.

> This would be done many times and in different orders. I'd like to be
> able to, among those 26000 rows, take only the 100 whose expr1$x values are the 100
> greatest. And so on, many times, until I find a set of suitable rows.

> What is the proper way to do that using R, if any? I've played a
> little with data frames (I could, for instance, use expr1$name[expr1$x > 20] to get
> the names of those genes whose x > 20) but it seemed a little clumsy. Should
> I keep trying to manipulate the data frame directly, or should I perhaps save
> it in a MySQL database and do the queries using RMySQL? Or maybe there is a
> better option?

> I know that these things I've said are pretty easy to implement using,
> for instance, M$ Excel (I've seen them working on it). You just select
> drop-down menus and filter the rows to your liking. But I really would like to be
> able to accomplish this task using R and other open source tools like MySQL,
> Perl, etc.

> Thank you in advance,
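The filtering steps described in the quoted message can be sketched in base R. This is only an illustration and not part of the original reply: the data frame `expr1` below is simulated with made-up values, and its column names simply follow the ones used in the question.

```r
# Simulated stand-in for the poster's data (all values are invented for illustration).
set.seed(1)
n <- 26000
expr1 <- data.frame(
  name        = paste0("gene", seq_len(n)),
  expression  = runif(n, 0, 5000),
  reliability = sample(c(50, 75, 100), n, replace = TRUE),
  x           = runif(n, 0, 40),
  y           = runif(n, 0, 40)
)

# Step 1: keep rows with expression > 2000 and reliability equal to 100,
# then plot x against y for the selected rows.
sel1 <- subset(expr1, expression > 2000 & reliability == 100)
plot(sel1$x, sel1$y, xlab = "x", ylab = "y")

# Step 2: narrow the previous selection further.
sel2 <- subset(sel1, x > 20)

# The 100 rows with the greatest x among all 26000 rows.
top100 <- expr1[order(expr1$x, decreasing = TRUE)[1:100], ]

# Names of the genes that survived both filters.
head(sel2$name)
```

Because each step returns an ordinary data frame, the filters can be chained in any order and each intermediate result can be plotted or filtered again.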

--

A.J. Rossini rossini@u.washington.edu http://software.biostat.washington.edu/

Biostatistics, U Washington and Fred Hutchinson Cancer Research Center

FHCRC: Tu: 206-667-7025 (fax=4812) | Voicemail is pretty sketchy/use Email

UW: Th: 206-543-1044 (fax=3286) | Change last 4 digits of phone to FAX


______________________________________________ R-help@stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help


*This archive was generated by hypermail 2.1.3: Tue 01 Jul 2003 - 09:11:47 EST*