Re: [R] Help to check data before putting it in a database

From: Joshua Wiley <>
Date: Tue, 05 Apr 2011 08:21:41 -0700

Hi Ulisses,

Look at the functions ?match and ?rbind
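
As a rough sketch of the by-hand route (the vectors `known` and `incoming` are made up for illustration, they are not your data), match() gives the position of each new name among the existing ones, or NA when the name is not found, and %in% gives the same test as TRUE/FALSE:

```r
## hypothetical player names, for illustration only
known <- c("A", "B", "C", "D")
incoming <- c("C", "D", "E", "F")

## position of each incoming name in 'known'; NA means not found
pos <- match(incoming, known)

## the same test as a logical vector
found <- incoming %in% known

incoming[found]   # names already in the database
incoming[!found]  # names to check by hand
```

The logical vector can then be used to split the rows of the new data frame before calling rbind().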

If you do not want to do it by hand, you can make a little function as below.

HTH, Josh

d1 <- data.frame(goals = 4:1, players = LETTERS[1:4])
d2 <- data.frame(goals = c(1, 3, 2, 5), players = LETTERS[3:6])

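f <- function(old, new, check) {
  ## rows of 'new' whose 'check' value already occurs in 'old'
  index <- new[, check] %in% old[, check]
  dat <- rbind(old, new[index, ])
  ## rows with unknown values, to be inspected by hand
  tocheck <- new[!index, ]
  list(merged = dat, tocheck = tocheck)
}
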
dmerged <- f(d1, d2, "players")
## check "tocheck" and once it is correct
dfinal <- do.call("rbind", dmerged)
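
For the checking step itself, one possible sketch follows. It defines a compact version of the function so that it runs on its own. The idea that "E" is a typo for "A", and the names `res`, `fixed` and `recheck`, are assumptions for illustration only. Player names are kept as character (stringsAsFactors = FALSE) so they can be edited directly.

```r
d1 <- data.frame(goals = 4:1, players = LETTERS[1:4],
                 stringsAsFactors = FALSE)
d2 <- data.frame(goals = c(1, 3, 2, 5), players = LETTERS[3:6],
                 stringsAsFactors = FALSE)

f <- function(old, new, check) {
  index <- new[, check] %in% old[, check]
  list(merged = rbind(old, new[index, ]), tocheck = new[!index, ])
}

res <- f(d1, d2, "players")
res$tocheck                      # rows for players E and F

## suppose "E" was a typo for "A", while "F" really is a new player
fixed <- res$tocheck
fixed$players[fixed$players == "E"] <- "A"

## run the check again: the corrected row now matches, "F" is still flagged
recheck <- f(res$merged, fixed, "players")
recheck$tocheck                  # only player F is left to confirm

## once satisfied that "F" is a new player, append everything
dfinal <- rbind(recheck$merged, recheck$tocheck)
```

Running the check a second time after the corrections means a name that was fixed wrongly is flagged again rather than slipping into the database.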

On Tue, Apr 5, 2011 at 8:06 AM, Ulisses.Camargo <> wrote:
> The example scene:
> I have a database with stats about each goal made by my soccer team. This
> database (a data frame in R) is organized in lines (goals) with a set of
> columns containing data about these goals (player name, tactic position,
> etc). For now, this database will be called "data.frame1".
> What I need is to feed this "data.frame1" with new information about my team
> goals. I will call this new information "data.frame2". This set of new goals
> is organized in the same way as in "data.frame1" (equal numbers of cols).
> Where help is needed:
> I need help in finding a way to check the player-name column in
> "data.frame2" before feeding "data.frame1" with it. What I need is a way to
> verify the name of the player on each line of "data.frame2" with the names
> of players that already exist on a col in "data.frame1". Moreover, I need R
> to do two main things:
> First, the lines of "data.frame2" with player names that already exist in
> "data.frame1" must be added to "data.frame1".
> Second, the lines of "data.frame2" with player names that do not exist in
> "data.frame1" must be listed in an output to be manually checked and
> corrected.
> After this verification, corrected lines and new-player-names lines must be
> incorporated in "data.frame1".
> What I want is to guarantee that there will be no lines with wrong player
> names in my database.
> At the same time, my script must permit new information to be added (new
> player names).
> Is there somebody who could help me with this?
> Thanks for your attention
> Best wishes
> Ulisses
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles

Received on Tue 05 Apr 2011 - 15:29:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 05 Apr 2011 - 15:50:26 GMT.
