gauravde
Fluorite | Level 6

I am trying to do frequency matching for case-control studies. From Dataset A I have only the frequency distributions of three variables (Gender: Male 30%, Female 70% | Age: >=65 40%, <65 60% | Region: West 10%, Northeast 20%, Southwest 40%, Midwest 25%, Southeast 5%). Please note that I do not have access to Dataset A itself, just these frequency distributions for Gender, Age, and Region.

I have Dataset B, and I need to create a subset of it that has the same frequency distribution for those 3 variables as Dataset A. That is, when I run PROC FREQ on age, gender, and region in the subset of Dataset B, it should give the same results as shown above for Dataset A.

Could you please suggest what is the best way to do that? 

Thanks.  

5 REPLIES
Reeza
Super User
Use PROC SURVEYSELECT and specify your per-stratum sample sizes based on the percentages above.
If you don't have the interaction frequencies, I'd assume each variable is distributed the same way within every level of the others (a dangerous assumption), calculate the count needed for each combination, and pass those counts to the procedure.

See Example 3 here; your case just has one more variable:
https://stats.oarc.ucla.edu/sas/faq/how-can-i-take-a-stratified-random-sample-of-my-data/
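A minimal sketch of that suggestion, under stated assumptions: Dataset B is assumed to be WORK.DATASETB with character variables GENDER, AGE_GROUP, and REGION coded exactly as the labels below, and N_TOTAL is a hypothetical target size. None of these names come from the thread. Cell counts are the product of the three marginal percentages, which is the independence assumption described above.

```sas
/* Assumed names (not from the thread): WORK.DATASETB, GENDER, AGE_GROUP, REGION */
%let n_total = 1000;   /* hypothetical size of the subset to draw */

/* Requested count per Gender x Age x Region cell, assuming independence */
data alloc;
  length gender $6 age_group $7 region $9;
  array g_lbl[2] $6 _temporary_ ('Female', 'Male');
  array g_pct[2]    _temporary_ (0.70, 0.30);
  array a_lbl[2] $7 _temporary_ ('Age <65', 'Age>=65');
  array a_pct[2]    _temporary_ (0.60, 0.40);
  array r_lbl[5] $9 _temporary_ ('West', 'Northeast', 'Southwest', 'Midwest', 'Southeast');
  array r_pct[5]    _temporary_ (0.10, 0.20, 0.40, 0.25, 0.05);
  do i = 1 to 2;
    do j = 1 to 2;
      do k = 1 to 5;
        gender    = g_lbl[i];
        age_group = a_lbl[j];
        region    = r_lbl[k];
        /* _NSIZE_ is the variable name PROC SURVEYSELECT reads from SAMPSIZE= */
        _nsize_   = round(&n_total. * g_pct[i] * a_pct[j] * r_pct[k]);
        output;
      end;
    end;
  end;
  keep gender age_group region _nsize_;
run;

/* Both data sets must be in the same strata order */
proc sort data=alloc;
  by gender age_group region;
run;

proc sort data=datasetB out=b_sorted;
  by gender age_group region;
run;

/* SELECTALL takes every record when a cell in B is smaller than requested */
proc surveyselect data=b_sorted out=subset method=srs
                  sampsize=alloc seed=27371 selectall;
  strata gender age_group region;
run;

/* Check the marginal distributions of the drawn subset */
proc freq data=subset;
  tables gender age_group region;
run;
```

With N_TOTAL=1000 every cell count works out to a whole number; for other sizes the rounding can shift the total by a few records. The sketch also assumes every cell in ALLOC actually occurs in Dataset B. If SELECTALL has to kick in for a sparse cell, the resulting margins will drift from the targets, so a smaller N_TOTAL may be needed.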
gauravde
Fluorite | Level 6
Thank you! I will try this out.
Ksharp
Super User
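%let sample_size=1000 ;

/* Each of the &sample_size. replicates is a random permutation of 1-10, */
/* so every output data set has 10 * &sample_size. rows.                 */
proc plan seed=27371 ;
  factors n=&sample_size. ordered sex=10 / noprint;
  output out=sex ;

  factors n=&sample_size. ordered age=10 / noprint;
  output out=age ;
run;
quit;

/* Values 1-3 of 10 become Male (30%); the rest become Female (70%) */
data sex;
  set sex;
  length char_sex $6;
  char_sex=ifc(sex in (1:3),'Male','Female');
  keep char_sex;
run;

/* Values 1-4 of 10 become Age>=65 (40%); the rest become Age <65 (60%) */
data age;
  set age;
  length char_age $7;
  char_age=ifc(age in (1:4),'Age>=65','Age <65');
  keep char_age;
run;

/* One-to-one merge (no BY statement) pairs the two columns row by row */
data want;
  merge sex age;
run;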
gauravde
Fluorite | Level 6
Thank you! I will check this out.
Ksharp
Super User
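%let sample_size=1000 ;

/* Builds a template data set with the target marginal percentages. */
/* It does not read Dataset B.                                      */
/* Each factor is one random permutation of 1-&sample_size.         */
proc plan seed=27371 ;
  factors sex=&sample_size. / noprint;
  output out=sex ;

  factors age=&sample_size. / noprint;
  output out=age ;

  factors Region=&sample_size. / noprint;
  output out=Region ;
quit;

/* One-to-one merge (no BY statement) pairs the three columns row by row */
data temp;
  merge sex age region;
run;

/* Bin each column into 100 equal groups ranked 0-99, */
/* so each rank represents one percentage point.      */
proc rank data=temp out=temp2 groups=100 ;
  var sex age region;
  ranks r_sex r_age r_region;
run;

data want;
  set temp2;
  length char_sex $6 char_age $7 char_Region $9;
  char_sex=ifc(r_sex in (0:29),'Male','Female');        /* 30% Male    */
  char_age=ifc(r_age in (0:39),'Age>=65','Age <65');    /* 40% Age>=65 */

  select;
    when(r_region in (0:9))   char_Region='West';       /* 10% */
    when(r_region in (10:29)) char_Region='Northeast';  /* 20% */
    when(r_region in (30:69)) char_Region='Southwest';  /* 40% */
    when(r_region in (70:94)) char_Region='Midwest';    /* 25% */
    when(r_region in (95:99)) char_Region='Southeast';  /*  5% */
    otherwise;
  end;
  keep char_:;
run;

/* Confirm the marginal percentages */
proc freq data=want;
  table char_:;
run;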
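As an optional check, the PROC FREQ output can also be compared with the target percentages through the TESTP= option, which runs a chi-square goodness-of-fit test for each one-way table. This is only a sketch: it assumes the WANT data set from the code above and the default ordering of character values (alphabetical in ASCII), so the percentages are listed in that order rather than in the order used in the question.

```sas
/* Target percentages listed in the default (sorted) order of each variable's levels */
proc freq data=want;
  tables char_sex    / testp=(70 30);           /* Female, Male */
  tables char_age    / testp=(60 40);           /* Age <65, Age>=65 */
  tables char_Region / testp=(25 20 5 40 10);   /* Midwest, Northeast, Southeast, Southwest, West */
run;
```

When the observed percentages match the targets exactly, the chi-square statistic for that table is 0.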

