ursula
Pyrite | Level 9

Hi There,

 

Below is the example data. There will be 4 groups, each sharing the same values of variables B and C.

For example, id 001 (50 20), id 005 (50 20), and id 007 (50 20) are categorized as the same group because they have the same values of B and C, and so on.

I want to draw a random sample of 2 observations from each group, so in this example data I will get 8 samples from 15 observations.

My actual data set has thousands of observations.

I wonder if there is code to run the sampling for all groups at once.

 

Thank you in advance.

 

 

data have;
input ID $ B C;
datalines;
001 50 20
002 50 30
003 40 20
004 70 30
005 50 20 
006 40 20
007 50 20 
008 40 20
009 50 30
010 70 30
011 40 20
012 70 30
013 40 20
014 50 30
015 70 30
;
run;

1 ACCEPTED SOLUTION
PGStats
Opal | Level 21

For your case, @ballardw's proposed method would amount to:

 

data have;
input id $3. B C;
datalines;
001 50 20
002 50 30
003 40 20
004 70 30
005 50 20
006 40 20 
007 50 20
008 40 20 
009 50 30
010 70 30
011 40 20
012 70 30
013 40 20
014 50 30
015 70 30
;

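/* STRATA requires the input to be sorted by the strata variables */
proc sort data=have; by b c; run;

/* Draw 2 observations at random from each (b, c) group */
proc surveyselect data=have out=want sampsize=2;
    strata b c;
run;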
PG
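As a follow-up sketch (not part of the original answer), the same draw can be made reproducible and then checked. It assumes HAVE has already been sorted by b and c as above. The output name WANT_SEEDED and the seed value 20240 are arbitrary choices for illustration.

```sas
/* Same stratified draw, with a fixed seed so reruns give the same sample. */
/* WANT_SEEDED and the seed value are illustrative, not from the thread.   */
proc surveyselect data=have out=want_seeded
        method=srs sampsize=2 seed=20240;
    strata b c;
run;

/* One row per (b, c) combination; each frequency should be 2. */
proc freq data=want_seeded;
    tables b*c / list nocum nopercent;
run;
```

If a group in the real data could have fewer than 2 observations, the SELECTALL option on the PROC SURVEYSELECT statement is worth checking in the documentation. As I understand it, it selects every unit in such a stratum instead of stopping with an error.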


4 REPLIES
ballardw
Super User

I am not following your description, as it is hard to read as pasted into the code box.

 

I would suggest adding a variable that holds a group identifier. Then you can use proc surveyselect with a strata variable to control sampling by "group".
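One way to build such a group identifier, sketched here for the HAVE data set from the question, is BY-group processing in a DATA step. The names HAVE_SORTED, HAVE_GROUPED, GROUP, and SAMPLE_BY_GROUP are made up for illustration.

```sas
/* Sort so that rows with the same (b, c) combination are adjacent. */
proc sort data=have out=have_sorted;
    by b c;
run;

/* Number the distinct (b, c) combinations 1, 2, 3, ... */
data have_grouped;
    set have_sorted;
    by b c;
    /* FIRST.C is 1 on the first row of each (b, c) combination; */
    /* the sum statement retains GROUP across rows.              */
    if first.c then group + 1;
run;

/* Sample 2 rows per group using the single identifier. */
proc surveyselect data=have_grouped out=sample_by_group sampsize=2;
    strata group;
run;
```

Listing both variables on the STRATA statement should give the same grouping without the extra variable, so the identifier is mainly useful if you want to report or merge by group later.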

A brief example that randomly selects 3 females and 5 males from the SASHELP.CLASS data set, using the SEX variable to define the "groups":

proc sort data=sashelp.class out=work.class;
   by sex;
run;

proc surveyselect data=work.class out=work.sample
   sampsize=(3 5);
   strata sex;
run;

The STRATA statement in PROC SURVEYSELECT requires the data to be sorted by the strata variable. SAMPSIZE= tells SAS how many records to select. SAMPSIZE=(3 5) says to take 3 from the first level of the strata variable and 5 from the second level. A single number would request the same number from each stratum. Similarly, you can use SAMPRATE= to request a percentage of each stratum.
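To make the two variations concrete, here is a minimal sketch on SASHELP.CLASS. The output names SAMPLE_N and SAMPLE_RATE and the seed are arbitrary, and the 0.5 rate is just an example value.

```sas
proc sort data=sashelp.class out=work.class;
    by sex;
run;

/* A single number: the same sample size (3) in every stratum. */
proc surveyselect data=work.class out=work.sample_n
        sampsize=3 seed=12345;
    strata sex;
run;

/* A sampling rate: roughly half of each stratum. */
proc surveyselect data=work.class out=work.sample_rate
        samprate=0.5 seed=12345;
    strata sex;
run;
```

With SAMPRATE= the number of rows taken from each stratum depends on the stratum size, so larger groups contribute more rows.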

 

Additional options let you use selection methods other than simple random sampling (the default) and include additional information and variables in the output data set.
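Two such options, sketched on SASHELP.CLASS: OUTALL keeps every input row and flags the chosen ones, and METHOD=URS samples with replacement. The output names FLAGGED and SAMPLE_URS and the seed are illustrative choices.

```sas
proc sort data=sashelp.class out=work.class;
    by sex;
run;

/* OUTALL: output all input rows, with a Selected variable (1 = sampled). */
proc surveyselect data=work.class out=work.flagged
        method=srs sampsize=(3 5) seed=12345 outall;
    strata sex;
run;

/* METHOD=URS: unrestricted random sampling (with replacement).      */
/* OUTHITS writes one row per hit, so a student can appear twice.     */
proc surveyselect data=work.class out=work.sample_urs
        method=urs sampsize=(3 5) seed=12345 outhits;
    strata sex;
run;
```

The OUTALL form is handy when you want to split the data into sampled and unsampled parts afterwards, for example with a WHERE clause on Selected.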

The output data set created above also contains the variables SelectionProb (the selection probability) and SamplingWeight, for use in procedures that accept a weight.
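As a rough illustration of how those variables might be used, the sketch below redraws the 3-and-5 sample from SASHELP.CLASS, prints the two variables, and passes SamplingWeight to PROC SURVEYMEANS. The choice of HEIGHT as the analysis variable and the seed are assumptions made for the example.

```sas
proc sort data=sashelp.class out=work.class;
    by sex;
run;

proc surveyselect data=work.class out=work.sample
        sampsize=(3 5) seed=12345;
    strata sex;
run;

/* Inspect the variables PROC SURVEYSELECT adds to the output. */
proc print data=work.sample;
    var name sex SelectionProb SamplingWeight;
run;

/* Weighted estimate of mean height that respects the stratified design. */
proc surveymeans data=work.sample mean;
    strata sex;
    var height;
    weight SamplingWeight;
run;
```

Since SASHELP.CLASS has 9 girls and 10 boys, the weights here should come out as 3 for each sampled girl and 2 for each sampled boy, so they add up to the 19 students in the full data set.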

ursula
Pyrite | Level 9

Sorry, I know it is hard to read.

Here is the data set again.

 

Below is the example data. There will be 4 groups, each sharing the same values of variables B and C.

For example, id 001 (50 20), id 005 (50 20), and id 007 (50 20) are categorized as the same group because they have the same values of B and C, and so on.

I want to draw a random sample of 2 observations from each group, so in this example data I will get 8 samples from 15 observations.

My actual data set has thousands of observations.

I wonder if there is code to run the sampling for all groups at once.

 

Thank you in advance.

 

 

data have;
input ID $ B C;
datalines;
001 50 20
002 50 30
003 40 20
004 70 30
005 50 20
006 40 20
007 50 20
008 40 20
009 50 30
010 70 30
011 40 20
012 70 30
013 40 20
014 50 30
015 70 30
;
run;

 

 

 

 

ursula
Pyrite | Level 9

Thank you so much, PG.

 

It works!

 

 

