BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
andrewjmdata
Obsidian | Level 7

Hi, I have mailing file which I have profiled on age, gender, ethnicity, occupation.

 

From the main database I want to select as big a sample as possible that has the same characteristics as the mailing file based on age, gender, ethnicity and occupation.

 

I can do this in a very long winded way by matching % of the database that are same in the mailing file but it would be better to if I could find a sas procedure/methodology to do it as I need an element of statistical rigour. 

 

I am using SAS EG v7.

 

Thanks

 

Andrew

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I think you need two columns: nMail and nPop with the mail sample sizes and the population sizes respectively. From those, you could do:

 

data counts;
input Age $ Sex $ Ethnicity $ Occupation $ nMail nPop; 
datalines;
20-30 Male Black Plumber 350 2000
30-40 Male White Painter 290 5000
40-50 Male White Lawyer 170 1200 
20-30 Female White Teacher 210 5000
30-40 Male Asian Doctor 290 500
40-50 Male White Plumber 170 3500
;

proc sql;
create table sampsize as
select
    *,
    min(nPop/nMail) * nMail as SampleSize
from counts;
select * from sampsize;
quit;

proc surveyselect data=pop out=control sampsize=sampsize;
strata age sex ethnicity occupation;
run;

Note: The population and sample size datasets should be sorted the same way.

 

PG

View solution in original post

5 REPLIES 5
pau13rown
Lapis Lazuli | Level 10

could use proc sort nodupkey to reduce the mailing file to all the possible combinations of age/sex/ethnicity/occupation. And then merge this onto the master file.

 

proc sort data=mailingfile out=template (keep=age sex ethnicity occupation) nodupkey;

by age sex ethnicity occupation;

run;

 

proc sort masterfile

by age sex ethnicity occupation;;

run;

 

data new;

merge masterfile template (in=a);

by age sex ethnicity occupation;

if a then output;

run;

 

it deponds a lot on exactly what you're doing ie how they should be matched. You could do something sophisticated ie proc psmatch https://support.sas.com/documentation/onlinedoc/stat/142/psmatch.pdf

andrewjmdata
Obsidian | Level 7

Mmm! Not sure that will work, I probably haven't explained the guts of this too well.

 

If in the mail file I have the following profile of age sex ethnicity occupation...

 

Age Sex Ethnicity Occupation % of total
20-30 Male Black Plumber 3.5%
30-40 Male White Painter 2.9%
40-50 Male White Lawyer 1.7%
20-30 Female White Teacher 2.1%
30-40 Male Asian Doctor 2.9%
40-50 Male White Plumber 1.7%

 

What I want in the control I select from the main database to have the same profile by age, sex, ethnicity and occupation in terms of the %.

 

Hope this makes more sense now, and thanks for your help so far.

 

Andrew

PGStats
Opal | Level 21

I think you need two columns: nMail and nPop with the mail sample sizes and the population sizes respectively. From those, you could do:

 

data counts;
input Age $ Sex $ Ethnicity $ Occupation $ nMail nPop; 
datalines;
20-30 Male Black Plumber 350 2000
30-40 Male White Painter 290 5000
40-50 Male White Lawyer 170 1200 
20-30 Female White Teacher 210 5000
30-40 Male Asian Doctor 290 500
40-50 Male White Plumber 170 3500
;

proc sql;
create table sampsize as
select
    *,
    min(nPop/nMail) * nMail as SampleSize
from counts;
select * from sampsize;
quit;

proc surveyselect data=pop out=control sampsize=sampsize;
strata age sex ethnicity occupation;
run;

Note: The population and sample size datasets should be sorted the same way.

 

PG
andrewjmdata
Obsidian | Level 7

That looks like it might be what I want. I'll have a bash at it in the morning.

 

Many thanks!

 

Andrew

Reeza
Super User

And if you don't have the latest version of SAS to use PSMATCH, see the Mayo Clinic's macro for greedy match algorithms.

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 846 views
  • 3 likes
  • 4 in conversation