BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AD
Calcite | Level 5 AD
Calcite | Level 5


Hey, I've got a dataset that I need to randomly split into two groups (i.e., datasets) of unequal size.  For example, I have a dataset (RatersAll) with 13 raters (RaterID = 1, 2, 3,..., 13), and I need to randomly select 8 of them to be put into one dataset (Raters1) and the remaining 5 to be put into another (Raters2).  My thoughts were to use Proc Surveyselect to randomly select the first 8 raters.  Then I was going to use Proc SQL to put the remaining raters in the second dataset.  My code is below.

Proc SurveySelect  Data=RatersAll Method=SRS

   Rep=1 Sampsize=8 Seed=0 Out=Raters1;

Run;

Proc SQL;

   Create Table Raters2 As

    Select t1.RaterID

    From RatersAll t1,

             Raters1 t2

    Where t1.RaterID <> t2.RaterID

  ;

The Proc Surveyselect code does exactly what I want.  My SQL code, however, returns way too many records.  I just want it to return the raters that were not selected in the surveyselect code.  I'm also open to other suggestions that make this whole process more efficient.  Thanks,

Andy

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below a SQL syntax which should give you the correct result. I've also added a data step version which should perform more efficiently in case your data set RatersAll contains a lot of observations.

data RatersAll;
  set sashelp.class;
  RaterID=_n_;
run;

Proc SurveySelect  Data=RatersAll Method=SRS
  Rep=1 Sampsize=8 Seed=0 Out=Raters1;
Run;


/* SQL version */
Proc SQL;
  Create Table Raters2 As
    Select t1.*
    From RatersAll t1
    Where t1.RaterID not in (select RaterID from Raters1)
  ;
quit;

/* data step version */
data Raters2_DataStep;
  set RatersAll;
  if _n_=1 then do;
    declare hash h1 (dataset:'Raters1',hashexp:4);
    _rc=h1.defineKey('RaterID');
    _rc=h1.defineDone();
  end;
  if h1.check() then output;
run;

P.S:

For SQL there is also a EXCEPT operator which would work in your case. Here the docu with example SAS(R) 9.3 SQL Procedure User's Guide

View solution in original post

5 REPLIES 5
Patrick
Opal | Level 21

Below a SQL syntax which should give you the correct result. I've also added a data step version which should perform more efficiently in case your data set RatersAll contains a lot of observations.

data RatersAll;
  set sashelp.class;
  RaterID=_n_;
run;

Proc SurveySelect  Data=RatersAll Method=SRS
  Rep=1 Sampsize=8 Seed=0 Out=Raters1;
Run;


/* SQL version */
Proc SQL;
  Create Table Raters2 As
    Select t1.*
    From RatersAll t1
    Where t1.RaterID not in (select RaterID from Raters1)
  ;
quit;

/* data step version */
data Raters2_DataStep;
  set RatersAll;
  if _n_=1 then do;
    declare hash h1 (dataset:'Raters1',hashexp:4);
    _rc=h1.defineKey('RaterID');
    _rc=h1.defineDone();
  end;
  if h1.check() then output;
run;

P.S:

For SQL there is also a EXCEPT operator which would work in your case. Here the docu with example SAS(R) 9.3 SQL Procedure User's Guide

AD
Calcite | Level 5 AD
Calcite | Level 5

Thanks, this is great!  I will mark your answer as "correct" as soon as the option appears, but it doesn't seem to be available right now.  This happened one other time, but after a week, the option became available.

As a side note, I also stumbled across the "Outall" option in Proc Surveyselect, which does everything I'm looking for as well.

Patrick
Opal | Level 21

Which "option" are you talking about? I don't know Proc Surveyselect very well. So great that there is already an option which does what you need.

Would be good if you could somehow mark your thread as answered so that it gets "closed".

Haikuo
Onyx | Level 15

I think OP was referring the "Option" of marking your answer as "Correct".

AD
Calcite | Level 5 AD
Calcite | Level 5

Sometimes on my browser I log onto a discussion that I started and the "Correct Answer" and "Helpful Answer" links are not available.  In other words, there is no way for me to mark your response as correct or to close the discussion.  The only thing I can do is visit the discussion at a later date and hope the "Correct Answer" link has reappeared.

So that is what I did this time.  I visited the discussion on Feb 11 and I couldn't mark Patrick's response as correct.  I came back the next day and it was there, so now I have marked it.

Thanks.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 33835 views
  • 0 likes
  • 3 in conversation