Calcite | Level 5

Hello, good morning,

I have a problem and I do not know how to solve it. I understand that it can be done with PROC SURVEYSELECT, but I don't know how to set it up. I've been thinking about it for days and I can't see the way forward!


I have a database of 150,000 records and I have to select a subsample of 85,000 records, one per household, according to a series of variables.


The stratum variables are
Household ID


The task is to select 85,000 cases out of the 150,000 so that the sample meets the following conditions:

- A single record per household (there are 85,000 households)
- Approximately 50% men and 50% women
- Proportional by age group (10% between 18 and 25 years old, 10% between 25 and 35 years old, etc.)


Can someone help me? I will be eternally grateful.

Obsidian | Level 7

You may wish to check my work, but I believe you can do it manually as follows. The "problem" with PROC SURVEYSELECT for your purposes, I believe, is twofold: 1) it uses the natural representation of each stratum, whereas you wish to impose a representation (e.g. 50% male / 50% female, x% per age group), and 2) you also wish to draw only one case per household.

A note on the age groups: you say 10%, but if you want them equally represented, by the same rule as sex, the proportion is determined by the number of age groups. 10% implies 10 groups, just as 50% implies two sexes.
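
Written out, the weighting in the code below amounts to the following. The symbols $N$, $n_g$, $S$, $A$ and $w_g$ are shorthand introduced here for illustration, not names from the code, and this is a sketch of the intent rather than a guarantee that the draw hits the targets exactly. For a sex-by-age-group cell $g$ containing $n_g$ of the $N$ records, with $S$ sexes and $A$ age groups:

$$
w_g = \frac{\text{target share}}{\text{actual share}} = \frac{1/(S \cdot A)}{n_g / N}
$$

In the sample data below, $N = 16$, $S = 2$, and the code defines $A = 3$ age groups. Men aged 15 to 24 account for $n_g = 4$ records, so

$$
w_g = \frac{1/6}{4/16} = \frac{2}{3}
$$

Each record's uniform random number is then multiplied by $w_g$, so records from under-represented cells are more likely to hold the largest value within their household.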


/* Sample data: one row per person */
data house;
 input id house $ sex $ age;
 datalines;
1 House1 M 23
2 House1 F 55
3 House2 M 22
4 House2 M 27
5 House3 F 15
6 House3 M 36
7 House3 F 50
8 House3 F 33
9 House3 M 22
10 House3 M 25
11 House3 M 21
12 House4 F 15
13 House4 M 38
14 House5 M 38
15 House5 F 38
16 House6 F 37
;
run;

/* Keep every record; add the total record count, the number of sexes, */
/* an age group, and the number of age groups */
proc sql; create table house2
as select *,
count(id) as idct,
count(distinct sex) as sexgroupct,
(case when age >= 15 and age < 25 then '15 to 24'
	when age >= 25 and age < 35 then '25 to 34'
	when age >= 35 then '>=35' end) as agegroup, 
count(distinct calculated agegroup) as agegroupct
from house;
quit;

/* Combine sex and age group into a single weighting cell */
data house3; set house2; 
weightgroup = catx('-', sex, agegroup); 
run;

/* weightgroupA = actual share of each cell, weightgroupT = target share, */
/* weight = target share / actual share */
proc sql; create table house4
as select *,
count(weightgroup) as weightgroupct,
(calculated weightgroupct / idct) as weightgroupA, 
(1 / sexgroupct) as sexgroupweight,
(1 / agegroupct) as agegroupweight, 
(calculated sexgroupweight * calculated agegroupweight) as weightgroupT,
(calculated weightgroupT / calculated weightgroupA) as weight
from house3
group by weightgroup; 
quit;

/* Draw a uniform random number per record and scale it by the weight */
data house5; set house4; 
call streaminit(123); 
random = rand('Uniform'); 
randombyweight = (random * weight); 
run;

/* Keep, for each household, the record with the largest weighted random number */
proc sql; create table house6
as select *
from house5
group by house
having randombyweight = max(randombyweight); 
quit;
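
To see how close the draw comes to the targets, the selected records can be tabulated. This is a minimal sketch, assuming the house6 table from the step above with its house, sex and agegroup columns. The column name n_selected is made up for illustration.

```sas
/* List any household that ended up with more than one selected record */
/* (this would only happen if two records tied on randombyweight)      */
proc sql;
  select house, count(*) as n_selected
  from house6
  group by house
  having count(*) > 1;
quit;

/* Distribution of the selected sample by sex and by age group */
proc freq data=house6;
  tables sex agegroup;
run;
```

If the first query returns no rows, each household contributed exactly one record. The PROC FREQ percentages can then be compared with the intended shares (50% per sex, equal shares per age group).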



