turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Select records that are NOT in another dataset

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-10-2013 10:34 PM

Hey, I've got a dataset that I need to randomly split into two groups (i.e., datasets) of unequal size. For example, I have a dataset (RatersAll) with 13 raters (RaterID = 1, 2, 3,..., 13), and I need to randomly select 8 of them to be put into one dataset (Raters1) and the remaining 5 to be put into another (Raters2). My thoughts were to use Proc Surveyselect to randomly select the first 8 raters. Then I was going to use Proc SQL to put the remaining raters in the second dataset. My code is below.

Proc SurveySelect Data=RatersAll Method=SRS

Rep=1 Sampsize=8 Seed=0 Out=Raters1;

Run;

Proc SQL;

Create Table Raters2 As

Select t1.RaterID

From RatersAll t1,

Raters1 t2

Where t1.RaterID <> t2.RaterID

;

The Proc Surveyselect code does exactly what I want. My SQL code, however, returns way too many records. I just want it to return the raters that were not selected in the surveyselect code. I'm also open to other suggestions that make this whole process more efficient. Thanks,

Andy

Accepted Solutions

Solution

02-11-2013
04:43 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-11-2013 04:43 AM

Below a SQL syntax which should give you the correct result. I've also added a data step version which should perform more efficiently in case your data set RatersAll contains a lot of observations.

data RatersAll;

set sashelp.class;

RaterID=_n_;

run;

Proc SurveySelect Data=RatersAll Method=SRS

Rep=1 Sampsize=8 Seed=0 Out=Raters1;

Run;

/* SQL version */

Proc SQL;

Create Table Raters2 As

Select t1.*

From RatersAll t1

Where t1.RaterID not in (select RaterID from Raters1)

;

quit;

/* data step version */

data Raters2_DataStep;

set RatersAll;

if _n_=1 then do;

declare hash h1 (dataset:'Raters1',hashexp:4);

_rc=h1.defineKey('RaterID');

_rc=h1.defineDone();

end;

if h1.check() then output;

run;

P.S:

For SQL there is also a EXCEPT operator which would work in your case. Here the docu with example SAS(R) 9.3 SQL Procedure User's Guide

All Replies

Solution

02-11-2013
04:43 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-11-2013 04:43 AM

Below a SQL syntax which should give you the correct result. I've also added a data step version which should perform more efficiently in case your data set RatersAll contains a lot of observations.

data RatersAll;

set sashelp.class;

RaterID=_n_;

run;

Proc SurveySelect Data=RatersAll Method=SRS

Rep=1 Sampsize=8 Seed=0 Out=Raters1;

Run;

/* SQL version */

Proc SQL;

Create Table Raters2 As

Select t1.*

From RatersAll t1

Where t1.RaterID not in (select RaterID from Raters1)

;

quit;

/* data step version */

data Raters2_DataStep;

set RatersAll;

if _n_=1 then do;

declare hash h1 (dataset:'Raters1',hashexp:4);

_rc=h1.defineKey('RaterID');

_rc=h1.defineDone();

end;

if h1.check() then output;

run;

P.S:

For SQL there is also a EXCEPT operator which would work in your case. Here the docu with example SAS(R) 9.3 SQL Procedure User's Guide

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-11-2013 09:59 AM

Thanks, this is great! I will mark your answer as "correct" as soon as the option appears, but it doesn't seem to be available right now. This happened one other time, but after a week, the option became available.

As a side note, I also stumbled across the "Outall" option in Proc Surveyselect, which does everything I'm looking for as well.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-12-2013 06:18 AM

Which "option" are you talking about? I don't know Proc Surveyselect very well. So great that there is already an option which does what you need.

Would be good if you could somehow mark your thread as answered so that it gets "closed".

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-12-2013 09:29 AM

I think OP was referring the "Option" of marking your answer as "Correct".

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-12-2013 02:29 PM

Sometimes on my browser I log onto a discussion that I started and the "Correct Answer" and "Helpful Answer" links are not available. In other words, there is no way for me to mark your response as correct or to close the discussion. The only thing I can do is visit the discussion at a later date and hope the "Correct Answer" link has reappeared.

So that is what I did this time. I visited the discussion on Feb 11 and I couldn't mark Patrick's response as correct. I came back the next day and it was there, so now I have marked it.

Thanks.