dcalbet01
Calcite | Level 5

Hello, good morning,

I have a problem that I do not know how to solve. I understand it can be done with SurveySelect, but I don't know how to set it up. I've been thinking about it for days and I can't see the light!

 

I have a database of 150,000 records and I need to select a subsample of 85,000 records, one for each household, according to a series of variables.

 

The stratum variables are:
Household ID
Sex
Age

 

The goal is to select 85,000 cases out of 150,000 that meet the following conditions:


- A single record per household (there are 85,000 households)
- 50% men and 50% women approximately
- Proportional for age groups (10% between 18 and 25 years old, 10% between 25 and 35 years old, etc.)

 

Can someone help me? I will be eternally grateful.

awesome_opossum
Obsidian | Level 7

You may wish to check my work, but I believe you can do it manually, as follows. The "problem" with SURVEYSELECT for your purposes, I believe, is twofold: 1) it uses the natural representation of each stratum, whereas you wish to impose a representation (e.g. 50% male/female, x% per age group), and 2) you also wish to draw only one case per household.

On the age groups: you say 10%, but if you want them equally represented, by the same rule as sex, then it is the number of age groups that defines the proportion. That is, 10% corresponds to 10 groups, just as 50% corresponds to two sexes.
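To make the proportion argument concrete, here is a sketch of the arithmetic, assuming every sex-by-age-group cell is meant to be equally represented. The symbols $S$ (number of sexes), $A$ (number of age groups), $n_c$ (records in cell $c$), $N$ (total records), $t_c$, $a_c$ and $w_c$ are notation introduced here for illustration; they appear to correspond to weightgroupT, weightgroupA and weight in the code.

```latex
\[
t_c = \frac{1}{S}\cdot\frac{1}{A}, \qquad
a_c = \frac{n_c}{N}, \qquad
w_c = \frac{t_c}{a_c}
\]
% Example: S = 2 sexes and A = 10 age groups give a target share
% t_c = 1/20 = 5% per cell. A cell holding 10% of the records
% (a_c = 0.10) then gets w_c = 0.05 / 0.10 = 0.5.
```

A weight below 1 marks an over-represented cell and a weight above 1 an under-represented one.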

 



data house;
 input id house $ sex $ age;
cards;
1 House1 M 23 
2 House1 F 55
3 House2 M 22
4 House2 M 27
5 House3 F 15
6 House3 M 36
7 House3 F 50
8 House3 F 33
9 House3 M 22
10 House3 M 25
11 House3 M 21
12 House4 F 15
13 House4 M 38
14 House5 M 38
15 House5 F 38
16 House6 F 37
;run; 

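/* Tag each record with its age group. The counts are computed over the
   whole table and remerged onto every row. Ages under 15 get a missing
   age group. */
proc sql;
create table house2 as
select
id,
count(id) as idct,                                 /* total records */
house,
sex,
count(distinct sex) as sexgroupct,                 /* number of sexes */
age,
(case when age >= 15 and age < 25 then '15 to 24'
	when age >= 25 and age < 35 then '25 to 34'
	when age >= 35 then '>=35' end) as agegroup,
count(distinct calculated agegroup) as agegroupct  /* number of age groups */
from house
;
quit;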

/* Build one key per sex-by-age-group cell, e.g. M-15 to 24 */
data house3;
set house2;
weightgroup = catx('-', sex, agegroup);
run;

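/* Weight = target share of the cell / observed share of the cell */
proc sql;
create table house4 as
select
*,
count(weightgroup) as weightgroupct,                /* records in this cell */
(calculated weightgroupct / idct) as weightgroupA,  /* observed share */
(1 / sexgroupct) as sexgroupweight,                 /* target share per sex */
(1 / agegroupct) as agegroupweight,                 /* target share per age group */
(calculated sexgroupweight * calculated agegroupweight) as weightgroupT, /* target share of the cell */
(calculated weightgroupT / calculated weightgroupA) as weight
from house3
group by weightgroup;
quit;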

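/* Scale a uniform draw by the weight, so records from under-represented
   cells are more likely to get the largest value in their household */
data house5;
set house4;
if _n_ = 1 then call streaminit(123);  /* fixed seed for a reproducible draw */
random = rand('Uniform');
randombyweight = (random * weight);
run;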

/* Keep, for each household, the record with the largest scaled draw */
proc sql;
create table house6 as
select
*
from house5
group by house
having randombyweight = max(randombyweight);
quit;
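A quick way to check the result, as a minimal sketch: it assumes the house6 table produced above and uses only standard PROC SQL and PROC FREQ. The first query should report equal row and household counts (a tie on randombyweight within a household would yield an extra row, which is unlikely with continuous draws). The frequency tables show how close the sex and age-group shares come to the targets.

```sas
/* One record per household: n_rows should equal n_houses */
proc sql;
select count(*) as n_rows,
       count(distinct house) as n_houses
from house6;
quit;

/* Realized shares by sex and by age group in the selected sample */
proc freq data=house6;
tables sex agegroup / nocum;
run;
```

The weighting only tilts the selection toward under-represented cells, so expect the shares to be approximate rather than exact.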

 
