Hello, good morning.
I have a problem that I do not know how to solve. I understand it can be done with PROC SURVEYSELECT, but I don't know how to set it up. I've been thinking about it for days and I can't see the light!
I have a database of 150,000 records and I need to select a subsample of 85,000 records, one per household, according to a set of variables.
The stratum variables are:
Household ID
Sex
Age
The goal is to select 85,000 of the 150,000 cases so that the following conditions are met:
- A single record per household (there are 85,000 households)
- Approximately 50% men and 50% women
- Proportional by age group (10% between 18 and 25 years old, 10% between 25 and 35 years old, etc.)
Can someone help me? I will be eternally grateful.
You may wish to check my work, but I believe you can do it manually as follows. The "problem" with SURVEYSELECT for your purposes, I believe, is twofold: 1) it reproduces the natural representation of each stratum, whereas you want to impose a representation (e.g. 50% male/female, x% per age group), and 2) you also want to draw only one case per household. On the age percentages: you say 10%, but if you want the age groups to be equally represented, by the same rule as sex, then the proportion is determined by the number of age groups you define. 10% corresponds to 10 groups, just as 50% corresponds to two sexes.
/* Small example data set: one row per person, with household, sex and age */
data house;
input id house $ sex $ age;
cards;
1 House1 M 23
2 House1 F 55
3 House2 M 22
4 House2 M 27
5 House3 F 15
6 House3 M 36
7 House3 F 50
8 House3 F 33
9 House3 M 22
10 House3 M 25
11 House3 M 21
12 House4 F 15
13 House4 M 38
14 House5 M 38
15 House5 F 38
16 House6 F 37
;run;
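/* Step 1: assign age groups and attach overall counts to every row.
   There is no GROUP BY, so PROC SQL remerges the counts onto each record. */
proc sql;
create table house2 as
select
id,
count(id) as idct, /* total number of records */
house,
sex,
count(unique sex) as sexgroupct, /* number of sexes */
age,
(case when age >= 15 and age < 25 then '15 to 24'
when age >= 25 and age < 35 then '25 to 34'
when age >= 35 then '>=35' end) as agegroup,
count(unique calculated agegroup) as agegroupct /* number of age groups */
from house
;
quit;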
/* Step 2: build the sex-by-age cell that each record belongs to */
data house3;
set house2;
weightgroup = catx('-', sex, agegroup);
run;
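/* Step 3: weight = target share of the cell / actual share of the cell */
proc sql;
create table house4 as
select
*,
count(weightgroup) as weightgroupct, /* records in this cell */
(calculated weightgroupct / idct) as weightgroupA, /* actual share */
(1 / sexgroupct) as sexgroupweight, /* target share per sex */
(1 / agegroupct) as agegroupweight, /* target share per age group */
(calculated sexgroupweight * calculated agegroupweight) as weightgroupT, /* target share per cell */
(calculated weightgroupT / calculated weightgroupA) as weight
from house3
group by weightgroup;
quit;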
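As a worked check of the weight formula above, here is the arithmetic for two cells of the 16 example records. There are 2 sexes and 3 age groups, so each sex-by-age cell has a target share of 1/2 × 1/3 = 1/6. The symbol n_g (records in cell g) is my own shorthand for `weightgroupct`, and the counts were tallied by hand from the `cards` data, so do compare them against `house4`.

```latex
\[
\text{weight}_g
  = \frac{\text{weightgroupT}}{\text{weightgroupA}_g}
  = \frac{\tfrac{1}{2}\cdot\tfrac{1}{3}}{n_g / 16}
\]
\[
\text{M-15 to 24: } n_g = 4 \;\Rightarrow\; \frac{1/6}{4/16} = \tfrac{2}{3} \approx 0.67
\qquad
\text{F-25 to 34: } n_g = 1 \;\Rightarrow\; \frac{1/6}{1/16} = \tfrac{8}{3} \approx 2.67
\]
```

Cells that are rare in the data get weights above 1 and over-represented cells get weights below 1, which is what pushes the later draw toward the target shares.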
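/* Step 4: draw a random number per record and scale it by the weight */
data house5;
set house4;
call streaminit(123); /* fixed seed so the draw is reproducible */
random = rand('Uniform');
randombyweight = (random * weight);
run;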
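One caveat, offered as an optional refinement rather than a correction: taking the largest `random * weight` favors records with higher weights, but the chance of being picked is not exactly proportional to the weight. For two records with weights 1 and 2, for instance, the first wins 1/4 of the time rather than 1/3. If exact proportionality within a household matters, a known alternative is the key `random ** (1 / weight)` (the weighted random sampling key of Efraimidis and Spirakis). A minimal sketch, where `house5_alt` and `randomkey` are names I made up for illustration:

```sas
/* Alternative key (an assumption, not part of the original answer):
   the record with the largest randomkey in a household is selected
   with probability proportional to its weight. */
data house5_alt;
  set house4;
  call streaminit(123);                         /* fixed seed for reproducibility */
  randomkey = rand('Uniform') ** (1 / weight);  /* u raised to the power 1/w */
run;
```

To use it, run the final selection step against `house5_alt` with `randomkey` in place of `randombyweight`.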
/* Step 5: keep, in each household, the record with the largest weighted draw */
proc sql;
create table house6 as
select
*
from house5
group by house
having randombyweight = max(randombyweight);
quit;
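Since you were invited to check the work, a quick way to do so is to confirm that `house6` has one record per household and to look at the sex and age-group shares that actually came out. This is only a sketch using the data set and variable names from the steps above; the column aliases `n_records` and `n_households` are my own.

```sas
/* The two counts should be equal if every household appears exactly once */
proc sql;
  select count(*)              as n_records,
         count(distinct house) as n_households
  from house6;
quit;

/* Observed shares by sex, by age group, and by sex-by-age cell */
proc freq data=house6;
  tables sex agegroup sex*agegroup / norow nocol;
run;
```

Keep in mind that the one-record-per-household rule limits what the weights can achieve: a household made up only of men can only contribute a man, so the shares will move toward 50/50 and equal age groups without necessarily reaching them.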