Solved: Re: Creating dataset that matches profile of another dataset

andrewjmdata · Posted 06-16-2018 06:45 AM

Hi, I have mailing file which I have profiled on age, gender, ethnicity, occupation.

From the main database I want to select as big a sample as possible that has the same characteristics as the mailing file based on age, gender, ethnicity and occupation.

I can do this in a very long winded way by matching % of the database that are same in the mailing file but it would be better to if I could find a sas procedure/methodology to do it as I need an element of statistical rigour.

I am using SAS EG v7.

Thanks

Andrew

PGStats · Posted 06-17-2018 05:14 PM

I think you need two columns: nMail and nPop with the mail sample sizes and the population sizes respectively. From those, you could do:

data counts;
input Age $ Sex $ Ethnicity $ Occupation $ nMail nPop; 
datalines;
20-30 Male Black Plumber 350 2000
30-40 Male White Painter 290 5000
40-50 Male White Lawyer 170 1200 
20-30 Female White Teacher 210 5000
30-40 Male Asian Doctor 290 500
40-50 Male White Plumber 170 3500
;

proc sql;
create table sampsize as
select
    *,
    min(nPop/nMail) * nMail as SampleSize
from counts;
select * from sampsize;
quit;

proc surveyselect data=pop out=control sampsize=sampsize;
strata age sex ethnicity occupation;
run;

Note: The population and sample size datasets should be sorted the same way.

PG

View solution in original post

pau13rown · Posted 06-16-2018 11:04 AM

could use proc sort nodupkey to reduce the mailing file to all the possible combinations of age/sex/ethnicity/occupation. And then merge this onto the master file.

proc sort data=mailingfile out=template (keep=age sex ethnicity occupation) nodupkey;

by age sex ethnicity occupation;

run;

proc sort masterfile

by age sex ethnicity occupation;;

run;

data new;

merge masterfile template (in=a);

by age sex ethnicity occupation;

if a then output;

run;

it deponds a lot on exactly what you're doing ie how they should be matched. You could do something sophisticated ie proc psmatch https://support.sas.com/documentation/onlinedoc/stat/142/psmatch.pdf

andrewjmdata · Posted 06-17-2018 07:48 AM

Mmm! Not sure that will work, I probably haven't explained the guts of this too well.

If in the mail file I have the following profile of age sex ethnicity occupation...

Age	Sex	Ethnicity	Occupation	% of total
20-30	Male	Black	Plumber	3.5%
30-40	Male	White	Painter	2.9%
40-50	Male	White	Lawyer	1.7%
20-30	Female	White	Teacher	2.1%
30-40	Male	Asian	Doctor	2.9%
40-50	Male	White	Plumber	1.7%

What I want in the control I select from the main database to have the same profile by age, sex, ethnicity and occupation in terms of the %.

Hope this makes more sense now, and thanks for your help so far.

Andrew

PGStats · Posted 06-17-2018 05:14 PM

I think you need two columns: nMail and nPop with the mail sample sizes and the population sizes respectively. From those, you could do:

data counts;
input Age $ Sex $ Ethnicity $ Occupation $ nMail nPop; 
datalines;
20-30 Male Black Plumber 350 2000
30-40 Male White Painter 290 5000
40-50 Male White Lawyer 170 1200 
20-30 Female White Teacher 210 5000
30-40 Male Asian Doctor 290 500
40-50 Male White Plumber 170 3500
;

proc sql;
create table sampsize as
select
    *,
    min(nPop/nMail) * nMail as SampleSize
from counts;
select * from sampsize;
quit;

proc surveyselect data=pop out=control sampsize=sampsize;
strata age sex ethnicity occupation;
run;

Note: The population and sample size datasets should be sorted the same way.

PG

andrewjmdata · Posted 06-18-2018 01:30 PM

That looks like it might be what I want. I'll have a bash at it in the morning.

Many thanks!

Andrew

Reeza · Posted 06-16-2018 07:45 PM

And if you don't have the latest version of SAS to use PSMATCH, see the Mayo Clinic's macro for greedy match algorithms.

Creating dataset that matches profile of another dataset

Re: Creating dataset that matches profile of another dataset

Re: Creating dataset that matches profile of another dataset

Re: Creating dataset that matches profile of another dataset

Re: Creating dataset that matches profile of another dataset

Re: Creating dataset that matches profile of another dataset

Re: Creating dataset that matches profile of another dataset

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away