Solved: Random draw out of a household

lizzy28 · Posted 12-15-2019 11:27 PM

Hi everyone,

I have a dataset with two identifiers: one (person_id) for each person, and the other (house_id) for each household. Multiple members of a household have different person_ids but share one house_id. Is there a quick way to randomly draw one member from each household?

Below is my sample data:

person_id	house_id	Gender	Category
12345611	123456	F	1
12345612	123456	M	2
12345613	123456	M	2
23456711	234567	M	1
23456712	234567	F	3
45678911	456789	M	2
45678912	456789	F	2
65432111	654321	M	3
65432112	654321	F	1

The target dataset could be:

person_id	house_id	Gender	Category
12345611	123456	F	1
23456712	234567	F	3
45678911	456789	M	2
65432112	654321	F	1

Thank you!

Lizi

ed_sas_member · Posted 12-16-2019 03:58 AM

Hi @lizzy28

You can also use PROC SURVEYSELECT as follows:

data have;
	infile datalines dlm="09"x;
	input person_id	house_id Gender $ Category;
	datalines;
12345611	123456	F	1
12345612	123456	M	2
12345613	123456	M	2
23456711	234567	M	1
23456712	234567	F	3
45678911	456789	M	2
45678912	456789	F	2
65432111	654321	M	3
65432112	654321	F	1
;
run;

proc surveyselect data=have method=srs n=1 out=want (drop=selectionprob samplingweight) noprint;
	strata house_id;
run;

The STRATA statement names the stratification variable, HOUSE_ID, so each household is treated as its own stratum. In the PROC SURVEYSELECT statement, the METHOD=SRS option specifies simple random sampling, and the N=1 option specifies a sample size of one observation per stratum.
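If the draw needs to be repeatable, the same call can be given a seed. The following is only a minimal sketch, assuming the HAVE data set created above; the seed value 2019 and the data set names HAVE_SORTED and WANT_SEEDED are arbitrary choices for illustration and not part of the original answer. Because the STRATA statement expects the input to be sorted by the strata variable, the sketch sorts by house_id first.

```sas
/* Sort first: PROC SURVEYSELECT expects the input ordered by the STRATA variable. */
/* HAVE_SORTED is an illustrative name, not from the original post.               */
proc sort data=have out=have_sorted;
	by house_id;
run;

/* SEED= fixes the random number stream, so the same member is drawn */
/* from each household on every run (2019 is an arbitrary value).    */
proc surveyselect data=have_sorted method=srs n=1 seed=2019
		out=want_seeded (drop=selectionprob samplingweight) noprint;
	strata house_id;
run;
```

With the sample data, WANT_SEEDED should contain one row per household (four rows). Changing the SEED= value gives a different draw.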


mkeintz · Posted 12-16-2019 12:35 AM

If your data are sorted by house_id, then you can:

Read and count the consecutive person records in one household, setting _N_PERS to the count.
Generate a random integer _RAND_DRAW between 1 and _N_PERS.
Re-read the same household and output the record whose sequence number matches _RAND_DRAW.
Repeat the same sequence for the next household.

KachiM · Posted 12-16-2019 12:49 AM

Hi @lizzy28 ,

The code implementation of @mkeintz's approach is given below:

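data want;
   call streaminit(123);   /* fixed seed so the draw is repeatable */
   /* First pass: count the members of the current household */
   /* (household is the input data set, sorted by house_id) */
   do count = 1 by 1 until(last.house_id);
      set household;
      by house_id;
   end;
   /* Random integer between 1 and the household size */
   rnd = ceil(rand('UNIFORM') * count);
   /* Second pass: re-read the same household and keep the selected member */
   do count = 1 by 1 until(last.house_id);
      set household;
      by house_id;
      if rnd = count then output;
   end;
   drop count rnd;
run;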

lizzy28 · Posted 02-06-2020 11:20 AM

Thank you!

I tested it on my data. The only issue is that the output data set turns out to contain only records with Gender 'M'. I believe this is related to how the data are sorted.

