I have a problem that I do not know how to solve. I understand it can be done with PROC SURVEYSELECT, but I don't know how to set it up. I've been thinking about it for days and can't see the light!
I have a database of 150,000 records and need to select a subsample of 85,000 records, one per household, according to a set of variables.
The stratification variables are household ID, sex, and age.
The 85,000 cases selected from the 150,000 must meet the following conditions:
- A single record per household (there are 85,000 households)
- Approximately 50% men and 50% women
- Proportional by age group (10% between 18 and 25 years old, 10% between 25 and 35 years old, etc.)
Can someone help me? I would be eternally grateful.
You may wish to check my work, but I believe you can do it manually as shown below. The "problem" with SURVEYSELECT for your purposes, I believe, is twofold: 1) it uses the natural representation of each stratum, whereas you wish to impose a representation (e.g. 50% male/female, x% per age group), and 2) you also wish to draw only one case per household. A note on the age-group percentages: you say 10%, but if you want the age groups equally represented, by the same rule as sex, then the number of age groups defines the proportion. That is, 10% corresponds to 10 groups, just as 50% corresponds to two sexes.
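To illustrate the second point, here is a minimal sketch of a plain one-per-household draw with PROC SURVEYSELECT. It assumes the records are in a data set named house with a household column called house, as in the sample data further down; the names house_sorted and one_per_house and the seed value are arbitrary placeholders. Treating the household as the stratum gives one record per household, but the sex and age mix of the result is left to chance, which is why the manual weighting below is used instead.

```sas
/* STRATA expects the input to be sorted by the stratum variable */
proc sort data=house out=house_sorted;
  by house;
run;

/* Simple random sample of one record within each household */
proc surveyselect data=house_sorted out=one_per_house
                  method=srs n=1 seed=12345;
  strata house;
run;
```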
data house;
input id house $ sex $ age;
datalines;
1 House1 M 23
2 House1 F 55
3 House2 M 22
4 House2 M 27
5 House3 F 15
6 House3 M 36
7 House3 F 50
8 House3 F 33
9 House3 M 22
10 House3 M 25
11 House3 M 21
12 House4 F 15
13 House4 M 38
14 House5 M 38
15 House5 F 38
16 House6 F 37
;
run;
/* Add overall counts and an age group to every record; assumes the input data set is named house */
proc sql; create table house2 as
select *,
count(id) as idct,
count(unique sex) as sexgroupct,
(case when age >= 15 and age < 25 then '15 to 24'
when age >= 25 and age < 35 then '25 to 34'
when age >= 35 then '>=35' end) as agegroup,
count(unique calculated agegroup) as agegroupct
from house;
quit;
/* Combine sex and age group into one weighting cell, e.g. M-15 to 24 */
data house3; set house2;
weightgroup = catx('-', sex, agegroup);
run;
/* weightgroupA = actual share of each cell, weightgroupT = target share, weight = target / actual */
proc sql; create table house4 as
select *,
count(weightgroup) as weightgroupct,
(calculated weightgroupct / idct) as weightgroupA,
(1 / sexgroupct) as sexgroupweight,
(1 / agegroupct) as agegroupweight,
(calculated sexgroupweight * calculated agegroupweight) as weightgroupT,
(calculated weightgroupT / calculated weightgroupA) as weight
from house3
group by weightgroup;
quit;
/* Scale a uniform random draw by the weight so under-represented cells are favored */
data house5; set house4;
random = rand('Uniform');
randombyweight = (random * weight);
run;
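/* Keep the record with the largest weighted draw in each household */
proc sql; create table house6 as
select *
from house5
group by house
having randombyweight = max(randombyweight);
quit;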
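As a quick sanity check on the steps above, something like the following could be run afterwards. It is only a sketch and assumes the table and column names used above (house4, house6, house, sex, agegroup). On the 16 sample records there are 2 sexes and 3 age groups, so each sex-by-age cell has a target share of 1/6; the M-15 to 24 cell holds 4 of the 16 records (0.25), which gives it a weight of about 0.67.

```sas
/* Each sex-by-age cell with its actual share, target share, and weight */
proc sql;
  select distinct weightgroup, weightgroupct, weightgroupA, weightgroupT, weight
  from house4
  order by weightgroup;
quit;

/* One row per household: the two counts should be equal */
proc sql;
  select count(*) as n_rows, count(distinct house) as n_houses
  from house6;
quit;

/* Achieved sex and age-group shares in the selected sample */
proc freq data=house6;
  tables sex agegroup sex*agegroup / norow nocol;
run;
```

With only six households the achieved shares will be noisy. The weighting only tilts each household's pick toward under-represented cells, so the shares on the full data should be compared against the targets rather than assumed to match them.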