☑ This topic is solved.
UPRETIGOPI
Obsidian | Level 7

I need to subset 100 households from a large SAS dataset to prepare input data for executing test cases.
A household in the data file can be identified by SERIALNO, because all members of a household share the same SERIALNO. All members of each selected household must be included in the subset. The dataset has 50 variables and tens of thousands of SERIALNO values, but I need to subset it based on two variables, SERIALNO and Household_member: each SERIALNO represents one household, and Household_member identifies the members within it. I just need a subset of 100 households (SERIALNO values) that includes all of their Household_member records along with the rest of the variables in the dataset.

In the example below, SERIALNO 20161 has Household_member values 1, 2 and 3, SERIALNO 20162 has one household member, SERIALNO 20164 has Household_member values 1, 2 and 3, and so on; some households have up to 15 members.
How do I subset 100 households by SERIALNO so that each selected household includes all of its household members, as shown below? Please help with the SAS program code to subset this dataset.

SERIALNO   Household_member
20161      1
20161      2
20161      3
20162      1
20164      1
20164      2
20164      3

Accepted Solution
FreelanceReinh
Jade | Level 19

Hello @UPRETIGOPI,

 

I think a random sample that uses households as sampling units matches your description, except that only the variable SERIALNO, not Household_member, plays a special role in the sampling process: it defines the clusters (households) that are sampled.

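proc surveyselect data=have
                  method=srs n=100 seed=2718 out=want; /* simple random sample of 100 units; SEED= makes it reproducible */
  cluster serialno; /* the sampling unit is the household: all records of each selected SERIALNO are kept */
run;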
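To try this approach end to end, here is a minimal sketch that assumes a small, hypothetical HAVE data set (300 made-up households with 1 to 3 members each) in place of the real data. It runs the same PROC SURVEYSELECT step and then uses PROC SQL to check that WANT contains 100 distinct SERIALNO values and that every selected household kept all of its records. The demo data, the macro variables N_HH and N_INCOMPLETE, and the checks themselves are illustrative additions, not part of the original reply.

```sas
/* Hypothetical demo data standing in for the real HAVE:       */
/* 300 households (SERIALNO 20161-20460) with 1 to 3 members.  */
data have;
  do serialno = 20161 to 20460;
    do household_member = 1 to mod(serialno, 3) + 1;
      output;
    end;
  end;
run;

/* Select 100 whole households at random. The CLUSTER statement */
/* makes SERIALNO the sampling unit, so every record of a       */
/* selected household is written to WANT.                       */
proc surveyselect data=have method=srs n=100 seed=2718 out=want;
  cluster serialno;
run;

/* Illustrative checks (not from the original reply):            */
/*   N_HH         = number of distinct households in WANT          */
/*   N_INCOMPLETE = selected households whose record count in WANT */
/*                  differs from their record count in HAVE        */
proc sql noprint;
  select count(distinct serialno) into :n_hh trimmed
    from want;

  select count(*) into :n_incomplete trimmed
    from (select serialno, count(*) as n_want
            from want
            group by serialno) as w
         inner join
         (select serialno, count(*) as n_have
            from have
            group by serialno) as h
         on w.serialno = h.serialno
    where w.n_want ne h.n_have;
quit;

%put NOTE: households selected = &n_hh, incomplete households = &n_incomplete;
```

With real data, drop the demo DATA step and point DATA= at your own data set; the %PUT line in the log should then report 100 households selected and 0 incomplete households.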
