I would like to create a variable called 'household composition' using survey data. It is based on four indicators:
1) household ID (HID)
2) father ID (FID)
3) mother ID (MID)
4) spouse ID (SID)
Here is an example of the data. PID is participant ID.
PID HID FID MID SID
101 1 . . 102
102 1 . . 101
103 1 101 102 .
201 2 . . .
202 2 201 . .
301 3 . . 302
302 3 . . 301
401 4 . . .
501 5 . . 502
502 5 . . 501
I would like to say:
- the household composition of household 1 (hid=1) is a two parent family (mom is 102, dad is 101, child is 103)
- the household composition of household 2 (hid=2) is a single parent family (dad is 201, kid is 202)
- the household composition of household 3 (hid=3) is a couple (301 and 302 are together, no kids). Household 5 is also a couple.
- the household composition of household 4 (hid=4) is a single person (id=401)
How do I do this and keep data in a long format?
Thanks!!!
Will you need this code to be applied to all records of each household or only one? If the former there will be a step to summarize the data and a second to merge it back to the original.
What should your composition variable look like? A single character or digit? Do you need to distinguish between single parent families with only a father and only a mother or just single parent?
It appears that father IDs always end in 1, mother IDs always end in 2, and anything else is a child. Is that correct?
And I don't want to use survey software that outputs data that way if the household is a single response to a survey...
The code needs to be applied to all individuals in the house, not just the household itself.
The composition variable will be a digit; it does not matter if a single parent household is headed by a father or a mother, just that it's a single parent household.
Unfortunately, fathers' codes do not always end in 01.
So how do you tell that a PID is a parent and not a child?
Is the data sorted this way?
Yes - the data are sorted as I wrote them. Thanks!
This step writes one row per household, so you'll have to merge the result back in with the original data.
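data want;
    length composition $20;
    set have;
    by hid notsorted;
    retain flag_child flag_married;

    /* Reset the household-level flags at the start of each household */
    if first.hid then call missing(flag_child, flag_married);

    /* Anyone with a father or mother ID means the household has a child */
    if (not missing(fid)) or (not missing(mid)) then flag_child = 1;
    /* Anyone with a spouse ID means the household contains a couple */
    if not missing(sid) then flag_married = 1;

    /* Classify once the last member of the household has been read */
    if first.hid and last.hid then composition = 'Single Person';
    else if last.hid and flag_child = 1 and flag_married = . then composition = 'Single Parent';
    else if last.hid and flag_child = 1 and flag_married = 1 then composition = 'Two Parent Family';
    else if last.hid and flag_child = . and flag_married = 1 then composition = 'Couple';

    /* Write one row per household */
    if last.hid then output;
    *keep hid composition;
run;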
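The merge-back step could look something like the sketch below. It assumes HAVE is sorted by HID (as in the posted example) and that WANT comes from the step above with one row per household. The output data set name want_long, the variable comp_code, and the digit values 1 to 4 are all made up for illustration, since the thread only says the composition should be a digit.

```sas
/* Sketch only: attach the household-level composition to every person. */
/* Assumes HAVE is sorted by HID and WANT has one row per HID.           */
data want_long;
    merge have
          want(keep=hid composition);
    by hid;

    /* Hypothetical numeric coding; the digit values are an assumption */
    select (composition);
        when ('Single Person')     comp_code = 1;
        when ('Couple')            comp_code = 2;
        when ('Single Parent')     comp_code = 3;
        when ('Two Parent Family') comp_code = 4;
        otherwise                  comp_code = .;
    end;
run;
```

This keeps the data in long format: every PID row in a household gets the same composition and comp_code. Households that match none of the four rules (for example, two unrelated adults) end up with a blank composition and a missing comp_code.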
What is your final output?