gauravde
Fluorite | Level 6

I am trying to do frequency matching for a case-control study. I have the frequency distributions of three variables from Dataset A (Gender: Male 30%, Female 70% | Age: >=65 40%, <65 60% | Region: West 10%, Northeast 20%, Southwest 40%, Midwest 25%, Southeast 5%). Please note that I do not have access to Dataset A itself, only these frequency distributions for the three variables (Gender, Age, Region).

I do have Dataset B, and I need to create a subset of it with the same frequency distribution for those three variables as Dataset A. That is, when I run PROC FREQ on age, gender, and region in the subset, it should give the same results as listed above for Dataset A.

Could you please suggest the best way to do that?

Thanks.  

5 REPLIES
Reeza
Super User
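Use PROC SURVEYSELECT and specify your sample sizes per stratum, based on the percentages above.
If you don't have the interaction frequencies (the joint distribution of gender, age, and region), I'd assume each marginal distribution holds equally within every combination of the other variables (a dangerous assumption), calculate the count needed per combination, and pass those counts through.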

See example 3 here except you have one more variable:
https://stats.oarc.ucla.edu/sas/faq/how-can-i-take-a-stratified-random-sample-of-my-data/
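
A minimal sketch of that suggestion, not a definitive implementation. It assumes Dataset B is named `datasetB` and holds character variables `gender`, `agegrp`, and `region` coded exactly as in the DATALINES below (all of these names and codings are hypothetical), that a subset of 1000 rows is wanted, and that every gender-by-age-by-region combination occurs in Dataset B with at least as many rows as the allocation asks for. The cell counts are the product of the three marginal proportions, which is the independence assumption described above.

```sas
%let sample_size = 1000;   /* assumed target subset size */

/* Target marginal proportions reported for Dataset A */
data p_gender;
  length gender $6;
  input gender $ p_gender;
  datalines;
Female 0.70
Male 0.30
;

data p_age;
  length agegrp $4;
  input agegrp $ p_age;
  datalines;
<65 0.60
>=65 0.40
;

data p_region;
  length region $9;
  input region $ p_region;
  datalines;
Midwest 0.25
Northeast 0.20
Southeast 0.05
Southwest 0.40
West 0.10
;

/* Cross join: under independence, cell proportion = product of marginals.
   _NSIZE_ is the variable name PROC SURVEYSELECT reads stratum sizes from. */
proc sql;
  create table alloc as
  select g.gender, a.agegrp, r.region,
         round(&sample_size * g.p_gender * a.p_age * r.p_region) as _NSIZE_
  from p_gender as g, p_age as a, p_region as r;
quit;

/* Both data sets must be in the same stratum order. The strata variables
   in ALLOC must also match the type and length of those in datasetB. */
proc sort data=alloc;
  by gender agegrp region;
run;

proc sort data=datasetB out=b_sorted;   /* datasetB is a hypothetical name */
  by gender agegrp region;
run;

/* Simple random sample of _NSIZE_ rows within each stratum */
proc surveyselect data=b_sorted out=subset method=srs
                  sampsize=alloc seed=27371;
  strata gender agegrp region;
run;

/* Check the marginal distributions of the subset */
proc freq data=subset;
  tables gender agegrp region;
run;
```

With a sample size of 1000 and these percentages, every cell count is a whole number (the smallest is 6, for Male, >=65, Southeast), so the marginals are reproduced exactly. For other sample sizes the rounded counts may not add up to the target. If a stratum in Dataset B has fewer rows than its `_NSIZE_`, the procedure stops with an error; lowering `sample_size` is one way around that.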
gauravde
Fluorite | Level 6
Thank you! I will try this out.
Ksharp
Super User
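%let sample_size=1000;

/* Each of the &sample_size. blocks holds a random permutation of 1-10,
   so every value 1-10 occurs in exactly 10% of the output rows. */
proc plan seed=27371;
  factors n=&sample_size. ordered sex=10 / noprint;
  output out=sex;

  factors n=&sample_size. ordered age=10 / noprint;
  output out=age;
run;

/* Values 1-3 (30% of rows) become Male, the rest Female. */
data sex;
  set sex;
  length char_sex $6;
  char_sex=ifc(sex in (1:3),'Male','Female');
  keep char_sex;
run;

/* Values 1-4 (40% of rows) become Age>=65, the rest Age <65. */
data age;
  set age;
  length char_age $7;
  char_age=ifc(age in (1:4),'Age>=65','Age <65');
  keep char_age;
run;

/* One-to-one merge (no BY statement) pairs the rows by position. */
data want;
  merge sex age;
run;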
gauravde
Fluorite | Level 6
Thank you! I will check this out.
Ksharp
Super User
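%let sample_size=1000;

/* One random permutation of 1..&sample_size. per variable. */
proc plan seed=27371;
  factors sex=&sample_size. / noprint;
  output out=sex;

  factors age=&sample_size. / noprint;
  output out=age;

  factors Region=&sample_size. / noprint;
  output out=Region;
quit;

/* One-to-one merge (no BY statement) pairs the rows by position. */
data temp;
  merge sex age region;
run;

/* Bin each permutation into 100 groups numbered 0-99. The groups are
   equal in size when the sample size is a multiple of 100. */
proc rank data=temp out=temp2 groups=100;
  var sex age region;
  ranks r_sex r_age r_region;
run;

/* Map rank ranges to categories so each takes its target percentage. */
data want;
  set temp2;
  length char_sex $6 char_age $7 char_Region $9;
  char_sex=ifc(r_sex in (0:29),'Male','Female');      /* 30% Male   */
  char_age=ifc(r_age in (0:39),'Age>=65','Age <65');  /* 40% >=65   */

  select;
    when(r_region in (0:9))   char_Region='West';       /* 10% */
    when(r_region in (10:29)) char_Region='Northeast';  /* 20% */
    when(r_region in (30:69)) char_Region='Southwest';  /* 40% */
    when(r_region in (70:94)) char_Region='Midwest';    /* 25% */
    when(r_region in (95:99)) char_Region='Southeast';  /*  5% */
    otherwise;
  end;
  keep char_:;
run;

/* Confirm the marginal frequencies. */
proc freq data=want;
  tables char_:;
run;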

