DATA Step, Macro, Functions and more

Creating dataset that matches profile of another dataset

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

Creating dataset that matches profile of another dataset

Hi, I have mailing file which I have profiled on age, gender, ethnicity, occupation.

 

From the main database I want to select as big a sample as possible that has the same characteristics as the mailing file based on age, gender, ethnicity and occupation.

 

I can do this in a very long winded way by matching % of the database that are same in the mailing file but it would be better to if I could find a sas procedure/methodology to do it as I need an element of statistical rigour. 

 

I am using SAS EG v7.

 

Thanks

 

Andrew


Accepted Solutions
Solution
4 weeks ago
Esteemed Advisor
Posts: 5,536

Re: Creating dataset that matches profile of another dataset

[ Edited ]
Posted in reply to andrewjmdata

I think you need two columns: nMail and nPop with the mail sample sizes and the population sizes respectively. From those, you could do:

 

data counts;
input Age $ Sex $ Ethnicity $ Occupation $ nMail nPop; 
datalines;
20-30 Male Black Plumber 350 2000
30-40 Male White Painter 290 5000
40-50 Male White Lawyer 170 1200 
20-30 Female White Teacher 210 5000
30-40 Male Asian Doctor 290 500
40-50 Male White Plumber 170 3500
;

proc sql;
create table sampsize as
select
    *,
    min(nPop/nMail) * nMail as SampleSize
from counts;
select * from sampsize;
quit;

proc surveyselect data=pop out=control sampsize=sampsize;
strata age sex ethnicity occupation;
run;

Note: The population and sample size datasets should be sorted the same way.

 

PG

View solution in original post


All Replies
Regular Contributor
Posts: 162

Re: Creating dataset that matches profile of another dataset

[ Edited ]
Posted in reply to andrewjmdata

could use proc sort nodupkey to reduce the mailing file to all the possible combinations of age/sex/ethnicity/occupation. And then merge this onto the master file.

 

proc sort data=mailingfile out=template (keep=age sex ethnicity occupation) nodupkey;

by age sex ethnicity occupation;

run;

 

proc sort masterfile

by age sex ethnicity occupation;;

run;

 

data new;

merge masterfile template (in=a);

by age sex ethnicity occupation;

if a then output;

run;

 

it deponds a lot on exactly what you're doing ie how they should be matched. You could do something sophisticated ie proc psmatch https://support.sas.com/documentation/onlinedoc/stat/142/psmatch.pdf

--------------
blog: papersandprograms.com
Occasional Contributor
Posts: 14

Re: Creating dataset that matches profile of another dataset

[ Edited ]
Posted in reply to PaulBrownPhD

Mmm! Not sure that will work, I probably haven't explained the guts of this too well.

 

If in the mail file I have the following profile of age sex ethnicity occupation...

 

Age Sex Ethnicity Occupation % of total
20-30 Male Black Plumber 3.5%
30-40 Male White Painter 2.9%
40-50 Male White Lawyer 1.7%
20-30 Female White Teacher 2.1%
30-40 Male Asian Doctor 2.9%
40-50 Male White Plumber 1.7%

 

What I want in the control I select from the main database to have the same profile by age, sex, ethnicity and occupation in terms of the %.

 

Hope this makes more sense now, and thanks for your help so far.

 

Andrew

Solution
4 weeks ago
Esteemed Advisor
Posts: 5,536

Re: Creating dataset that matches profile of another dataset

[ Edited ]
Posted in reply to andrewjmdata

I think you need two columns: nMail and nPop with the mail sample sizes and the population sizes respectively. From those, you could do:

 

data counts;
input Age $ Sex $ Ethnicity $ Occupation $ nMail nPop; 
datalines;
20-30 Male Black Plumber 350 2000
30-40 Male White Painter 290 5000
40-50 Male White Lawyer 170 1200 
20-30 Female White Teacher 210 5000
30-40 Male Asian Doctor 290 500
40-50 Male White Plumber 170 3500
;

proc sql;
create table sampsize as
select
    *,
    min(nPop/nMail) * nMail as SampleSize
from counts;
select * from sampsize;
quit;

proc surveyselect data=pop out=control sampsize=sampsize;
strata age sex ethnicity occupation;
run;

Note: The population and sample size datasets should be sorted the same way.

 

PG
Occasional Contributor
Posts: 14

Re: Creating dataset that matches profile of another dataset

That looks like it might be what I want. I'll have a bash at it in the morning.

 

Many thanks!

 

Andrew

Super User
Posts: 23,770

Re: Creating dataset that matches profile of another dataset

Posted in reply to andrewjmdata

And if you don't have the latest version of SAS to use PSMATCH, see the Mayo Clinic's macro for greedy match algorithms.

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 142 views
  • 3 likes
  • 4 in conversation