Re: [R] data mining/text mining?

From: Weiwei Shi <helprhelp_at_gmail.com>
Date: Fri, 08 Jun 2007 11:12:27 -0400

Dear Ruixin:
Among other differences, text mining deals with unstructured data, while data mining mainly focuses on structured data. Many algorithms can be shared between them; however, text mining requires some extra data preprocessing first. There are a lot of online resources on this.
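
To make the preprocessing step concrete, here is a minimal sketch in base R only (no add-on package). The two toy documents, the tiny stop word list and the helper name tokenize() are all assumptions made up for illustration; a real project would likely use a dedicated package instead.

```r
# Toy corpus; the documents are invented for illustration.
docs <- c(d1 = "Text mining deals with unstructured data.",
          d2 = "Data mining focuses on structured data!")

# Preprocessing: lower-case, strip punctuation, split on whitespace.
# tokenize() is a hypothetical helper, not part of any package.
tokenize <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", " ", x)
  tokens <- strsplit(x, "[[:space:]]+")[[1]]
  tokens[nzchar(tokens)]
}

tokens <- lapply(docs, tokenize)

# Drop a tiny hand-picked stop word list (an assumption, not a standard list).
stopwords <- c("with", "on")
tokens <- lapply(tokens, function(tk) tk[!tk %in% stopwords])

# Document-term matrix: one row per document, one column per term.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))
print(dtm)
```

Once the text is in this structured form (a matrix of term counts), the usual data mining algorithms such as clustering or classification can be applied to it.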

As to packages used for text mining in R, esp. for preprocessing, please check the following link:
http://wwwpeople.unil.ch/jean-pierre.mueller/

I used that package a very long time ago and am not sure whether it has been updated for the current version of R; if not, you might need to go back to an old version such as R 1.1.

If you want to do text mining on Chinese text (I guess :), there is additional work needed (i.e. word splitting, since Chinese is written without spaces between words). I remember there is a researcher from Taiwan who does pretty good work on this, and you can google that. I cannot remember the details.
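
As a rough illustration of what word splitting involves, here is a sketch of forward maximum matching, one simple dictionary-based approach. It is not the method of the researcher mentioned above; the function name segment() and the five-entry dictionary are assumptions for illustration, and a real segmenter would need a large lexicon and better handling of ambiguity.

```r
# Toy dictionary; a real segmenter would use a large lexicon.
dict <- c("文本", "挖掘", "数据", "数据挖掘", "和")

# Forward maximum matching: at each position take the longest
# dictionary word that starts there. segment() is a hypothetical helper.
segment <- function(text, dict) {
  max_len <- max(nchar(dict))
  n <- nchar(text)
  words <- character(0)
  i <- 1
  while (i <= n) {
    # Try the longest candidate first, shrinking until a dictionary hit.
    len <- min(max_len, n - i + 1)
    while (len > 1 && !(substr(text, i, i + len - 1) %in% dict)) {
      len <- len - 1
    }
    # If nothing matched, len is 1 and the single character is kept as a word.
    words <- c(words, substr(text, i, i + len - 1))
    i <- i + len
  }
  words
}

segment("文本挖掘和数据挖掘", dict)
```

The resulting vector of words can then go through the same preprocessing as whitespace-separated text. This assumes R is running in a UTF-8 locale so that nchar() and substr() count characters rather than bytes.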

HTH, Weiwei

On 6/8/07, Ruixin ZHU <rxzhu_at_scbit.org> wrote:
> Dear R-user,
>
> Could anybody tell me of the key difference between data mining and text
> mining?
> Please make a list for packages about data/text mining.
> And give me an example of text mining with R (any relating materials
> will be highly appreciated), because a vignette written by Ingo Feinerer
> seems too concise for me.
>
> Thanks
> _____________________________________________
> Dr.Ruixin ZHU
> Shanghai Center for Bioinformation Technology
> rxzhu_at_scbit.org
> zhurx_at_mail.sioc.ac.cn
> 86-21-13040647832
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

Received on Fri 08 Jun 2007 - 15:44:20 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 08 Jun 2007 - 16:31:36 GMT.
