Myurathan
Quartz | Level 8

Hi Experts,

 

I have never done sampling in SAS. I have a population table with a million rows, and I need to extract 500 entries that meet the following criterion:

  1. The sample should include all available distinct values from three columns.

How can I do this in SAS?

 

Thanks in advance. 

 

 

1 ACCEPTED SOLUTION

Onizuka
Pyrite | Level 9

Hi, you can use PROC SURVEYSELECT:

 

/* Keep only the three columns of interest */
data have;
   set Yourtable;
   keep var1 var2 var3;
run;

/* Draw a simple random sample of 500 rows */
proc surveyselect data=have
   method=srs n=500 out=SampleSRS;
run;


5 REPLIES
colabear
Obsidian | Level 7

Hello, 

I have a follow-up to this question. I need random samples. However, if a 15% sample would contain fewer than 10 records, then I need the program to select 10 random records instead. Is there any way to do this?

Watts
SAS Employee

You can use the NMIN= option to specify the minimum sample size. 

 

proc surveyselect data=have rate=.15 nmin=10;
run;
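
If the 15% rate is meant to apply within groups, NMIN= sets the minimum per stratum. Below is a minimal sketch, not a definitive implementation. It assumes a hypothetical grouping variable named region and reuses the have data set from earlier in the thread; the seed value and the output names are arbitrary. As far as I understand the documentation, SELECTALL makes the procedure take every row of a stratum that has fewer rows than the requested minimum, instead of stopping with an error.

```sas
/* STRATA requires the input to be sorted by the stratification variable */
proc sort data=have out=have_sorted;
   by region;            /* region is a hypothetical grouping variable */
run;

/* 15% simple random sample per stratum, with at least 10 rows per stratum */
proc surveyselect data=have_sorted
   method=srs samprate=0.15 nmin=10
   seed=12345            /* arbitrary seed so the sample is reproducible */
   selectall             /* take all rows when a stratum has fewer than 10 */
   out=sample;
   strata region;
run;
```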
colabear
Obsidian | Level 7

Thank you very much!

Rick_SAS
SAS Super FREQ

I just want to point out that the question says

   > The sample should include all available distinct values from three columns

The PROC SURVEYSELECT code that you marked as "correct" does not necessarily "include all available distinct values." It simply extracts 500 random observations.
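
One way to check this is to count the distinct combinations in the population and in the sample and compare the two numbers. This is a rough sketch that assumes the data set and variable names from the accepted answer (have, SampleSRS, var1 to var3); the column aliases are made up for illustration.

```sas
proc sql;
   /* Distinct (var1, var2, var3) combinations in the population */
   select count(*) as n_population
      from (select distinct var1, var2, var3 from have);

   /* Distinct combinations that ended up in the random sample */
   select count(*) as n_sample
      from (select distinct var1, var2, var3 from SampleSRS);
quit;
```

If n_sample is smaller than n_population, the sample has missed some combinations. To check each column separately instead of the combinations, count(distinct var1) and so on can be used in the same way.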

 

In general, it might be impossible to satisfy that constraint. For example, if X1 = _N_, then there are 1 million distinct values and no subset of 500 observations can include all distinct values. If you want to include all distinct values, you would have to sort the data, then use the FIRST.VAR technique to extract the distinct combinations:

 

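/* Sort so that BY-group processing can detect changes in x1-x3 */
proc sort data=have;
by x1 - x3;
run;

/* Keep the first row of each distinct (x1, x2, x3) combination */
data distinct;
   set have;
   by x1 - x3;
   if first.x1 | first.x2 | first.x3;
run;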

This method is unlikely to create exactly 500 observations, because the output contains one row per distinct combination of x1, x2, and x3.
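
The FIRST.VAR approach always keeps the first row of each combination in sort order. If a random row per combination is preferred, a possible variation is to treat the three variables as strata and draw one row from each stratum. This is only a sketch under the same assumptions (variables x1 to x3 in a data set named have); the output names and the seed are arbitrary. Like the DATA step version, it returns one row per distinct combination, so the result has 500 rows only if there happen to be exactly 500 combinations.

```sas
/* STRATA requires the input to be sorted by the stratification variables */
proc sort data=have out=have_sorted;
   by x1 x2 x3;
run;

/* Draw one random row from each distinct (x1, x2, x3) combination */
proc surveyselect data=have_sorted
   method=srs n=1
   seed=12345            /* arbitrary seed so the sample is reproducible */
   out=one_per_combo;
   strata x1 x2 x3;
run;
```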
