Select records that are NOT in another dataset

Accepted Solution Solved
Reply
Contributor AD
Contributor
Posts: 31
Accepted Solution

Select records that are NOT in another dataset


Hey, I've got a dataset that I need to randomly split into two groups (i.e., datasets) of unequal size.  For example, I have a dataset (RatersAll) with 13 raters (RaterID = 1, 2, 3,..., 13), and I need to randomly select 8 of them to be put into one dataset (Raters1) and the remaining 5 to be put into another (Raters2).  My thoughts were to use Proc Surveyselect to randomly select the first 8 raters.  Then I was going to use Proc SQL to put the remaining raters in the second dataset.  My code is below.

Proc SurveySelect  Data=RatersAll Method=SRS

   Rep=1 Sampsize=8 Seed=0 Out=Raters1;

Run;

Proc SQL;

   Create Table Raters2 As

    Select t1.RaterID

    From RatersAll t1,

             Raters1 t2

    Where t1.RaterID <> t2.RaterID

  ;

The Proc Surveyselect code does exactly what I want.  My SQL code, however, returns way too many records.  I just want it to return the raters that were not selected in the surveyselect code.  I'm also open to other suggestions that make this whole process more efficient.  Thanks,

Andy


Accepted Solutions
Solution
‎02-11-2013 04:43 AM
Respected Advisor
Posts: 3,900

Re: Select records that are NOT in another dataset

Below a SQL syntax which should give you the correct result. I've also added a data step version which should perform more efficiently in case your data set RatersAll contains a lot of observations.

data RatersAll;
  set sashelp.class;
  RaterID=_n_;
run;

Proc SurveySelect  Data=RatersAll Method=SRS
  Rep=1 Sampsize=8 Seed=0 Out=Raters1;
Run;


/* SQL version */
Proc SQL;
  Create Table Raters2 As
    Select t1.*
    From RatersAll t1
    Where t1.RaterID not in (select RaterID from Raters1)
  ;
quit;

/* data step version */
data Raters2_DataStep;
  set RatersAll;
  if _n_=1 then do;
    declare hash h1 (dataset:'Raters1',hashexp:4);
    _rc=h1.defineKey('RaterID');
    _rc=h1.defineDone();
  end;
  if h1.check() then output;
run;

P.S:

For SQL there is also a EXCEPT operator which would work in your case. Here the docu with example SAS(R) 9.3 SQL Procedure User's Guide

View solution in original post


All Replies
Solution
‎02-11-2013 04:43 AM
Respected Advisor
Posts: 3,900

Re: Select records that are NOT in another dataset

Below a SQL syntax which should give you the correct result. I've also added a data step version which should perform more efficiently in case your data set RatersAll contains a lot of observations.

data RatersAll;
  set sashelp.class;
  RaterID=_n_;
run;

Proc SurveySelect  Data=RatersAll Method=SRS
  Rep=1 Sampsize=8 Seed=0 Out=Raters1;
Run;


/* SQL version */
Proc SQL;
  Create Table Raters2 As
    Select t1.*
    From RatersAll t1
    Where t1.RaterID not in (select RaterID from Raters1)
  ;
quit;

/* data step version */
data Raters2_DataStep;
  set RatersAll;
  if _n_=1 then do;
    declare hash h1 (dataset:'Raters1',hashexp:4);
    _rc=h1.defineKey('RaterID');
    _rc=h1.defineDone();
  end;
  if h1.check() then output;
run;

P.S:

For SQL there is also a EXCEPT operator which would work in your case. Here the docu with example SAS(R) 9.3 SQL Procedure User's Guide

Contributor AD
Contributor
Posts: 31

Re: Select records that are NOT in another dataset

Thanks, this is great!  I will mark your answer as "correct" as soon as the option appears, but it doesn't seem to be available right now.  This happened one other time, but after a week, the option became available.

As a side note, I also stumbled across the "Outall" option in Proc Surveyselect, which does everything I'm looking for as well.

Respected Advisor
Posts: 3,900

Re: Select records that are NOT in another dataset

Which "option" are you talking about? I don't know Proc Surveyselect very well. So great that there is already an option which does what you need.

Would be good if you could somehow mark your thread as answered so that it gets "closed".

Respected Advisor
Posts: 3,124

Re: Select records that are NOT in another dataset

I think OP was referring the "Option" of marking your answer as "Correct".

Contributor AD
Contributor
Posts: 31

Re: Select records that are NOT in another dataset

Sometimes on my browser I log onto a discussion that I started and the "Correct Answer" and "Helpful Answer" links are not available.  In other words, there is no way for me to mark your response as correct or to close the discussion.  The only thing I can do is visit the discussion at a later date and hope the "Correct Answer" link has reappeared.

So that is what I did this time.  I visited the discussion on Feb 11 and I couldn't mark Patrick's response as correct.  I came back the next day and it was there, so now I have marked it.

Thanks.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 2176 views
  • 0 likes
  • 3 in conversation