Hey, I've got a dataset that I need to randomly split into two groups (i.e., datasets) of unequal size. For example, I have a dataset (RatersAll) with 13 raters (RaterID = 1, 2, 3,..., 13), and I need to randomly select 8 of them to be put into one dataset (Raters1) and the remaining 5 to be put into another (Raters2). My thoughts were to use Proc Surveyselect to randomly select the first 8 raters. Then I was going to use Proc SQL to put the remaining raters in the second dataset. My code is below.
Proc SurveySelect Data=RatersAll Method=SRS
Rep=1 Sampsize=8 Seed=0 Out=Raters1;
Run;
Proc SQL;
Create Table Raters2 As
Select t1.RaterID
From RatersAll t1,
Raters1 t2
Where t1.RaterID <> t2.RaterID
;
The Proc Surveyselect code does exactly what I want. My SQL code, however, returns way too many records. I just want it to return the raters that were not selected in the surveyselect code. I'm also open to other suggestions that make this whole process more efficient. Thanks,
Andy
Below a SQL syntax which should give you the correct result. I've also added a data step version which should perform more efficiently in case your data set RatersAll contains a lot of observations.
data RatersAll;
set sashelp.class;
RaterID=_n_;
run;
Proc SurveySelect Data=RatersAll Method=SRS
Rep=1 Sampsize=8 Seed=0 Out=Raters1;
Run;
/* SQL version */
Proc SQL;
Create Table Raters2 As
Select t1.*
From RatersAll t1
Where t1.RaterID not in (select RaterID from Raters1)
;
quit;
/* data step version */
data Raters2_DataStep;
set RatersAll;
if _n_=1 then do;
declare hash h1 (dataset:'Raters1',hashexp:4);
_rc=h1.defineKey('RaterID');
_rc=h1.defineDone();
end;
if h1.check() then output;
run;
P.S:
For SQL there is also a EXCEPT operator which would work in your case. Here the docu with example SAS(R) 9.3 SQL Procedure User's Guide
Below a SQL syntax which should give you the correct result. I've also added a data step version which should perform more efficiently in case your data set RatersAll contains a lot of observations.
data RatersAll;
set sashelp.class;
RaterID=_n_;
run;
Proc SurveySelect Data=RatersAll Method=SRS
Rep=1 Sampsize=8 Seed=0 Out=Raters1;
Run;
/* SQL version */
Proc SQL;
Create Table Raters2 As
Select t1.*
From RatersAll t1
Where t1.RaterID not in (select RaterID from Raters1)
;
quit;
/* data step version */
data Raters2_DataStep;
set RatersAll;
if _n_=1 then do;
declare hash h1 (dataset:'Raters1',hashexp:4);
_rc=h1.defineKey('RaterID');
_rc=h1.defineDone();
end;
if h1.check() then output;
run;
P.S:
For SQL there is also a EXCEPT operator which would work in your case. Here the docu with example SAS(R) 9.3 SQL Procedure User's Guide
Thanks, this is great! I will mark your answer as "correct" as soon as the option appears, but it doesn't seem to be available right now. This happened one other time, but after a week, the option became available.
As a side note, I also stumbled across the "Outall" option in Proc Surveyselect, which does everything I'm looking for as well.
Which "option" are you talking about? I don't know Proc Surveyselect very well. So great that there is already an option which does what you need.
Would be good if you could somehow mark your thread as answered so that it gets "closed".
I think OP was referring the "Option" of marking your answer as "Correct".
Sometimes on my browser I log onto a discussion that I started and the "Correct Answer" and "Helpful Answer" links are not available. In other words, there is no way for me to mark your response as correct or to close the discussion. The only thing I can do is visit the discussion at a later date and hope the "Correct Answer" link has reappeared.
So that is what I did this time. I visited the discussion on Feb 11 and I couldn't mark Patrick's response as correct. I came back the next day and it was there, so now I have marked it.
Thanks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.