I am a relatively new SAS user working with a very large dataset that emphasizes the relationships between people living together. As a result, the data for each person (observation) include variables listing the person-level id numbers for everyone else who lives with them. Below is code that can illustrate what the data look like. In this example, there are two households - one with 7 people and the other with only 2 people.
data example;
input hhid personid age sex $ person1-person7;
datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
;
run;
I want to create variables that list the ages for each of the other people living in the household - see below for an illustration (I leave out some person columns for simplicity). Note that the variable creation has to be done by hhid because the personid's are not unique, but instead repeated across households.
Obs | hhid | personid | age | sex | person1 | person2 | person3 | person1_age | person2_age | person3_age |
1 | 38675 | 101 | 40 | M | 101 | 102 | 103 | 40 | 38 | 15 |
2 | 38675 | 102 | 38 | F | 101 | 102 | 103 | 40 | 38 | 15 |
3 | 38675 | 103 | 15 | M | 101 | 102 | 103 | 40 | 38 | 15 |
4 | 38675 | 104 | 12 | M | 101 | 102 | 103 | 40 | 38 | 15 |
5 | 38675 | 201 | 35 | F | 101 | 102 | 103 | 40 | 38 | 15 |
6 | 38675 | 202 | 10 | M | 101 | 102 | 103 | 40 | 38 | 15 |
7 | 38675 | 301 | 70 | F | 101 | 102 | 103 | 40 | 38 | 15 |
8 | 38679 | 101 | 25 | F | 101 | 102 | . | 25 | 24 | . |
9 | 38679 | 102 | 24 | M | 101 | 102 | . | 25 | 24 | . |
Any suggestions?
Here is one way that does what you ask with some details to consider:
data example; input hhid personid age sex $ person1-person7; datalines; 38675 101 40 M 101 102 103 104 201 202 301 38675 102 38 F 101 102 103 104 201 202 301 38675 103 15 M 101 102 103 104 201 202 301 38675 104 12 M 101 102 103 104 201 202 301 38675 201 35 F 101 102 103 104 201 202 301 38675 202 10 M 101 102 103 104 201 202 301 38675 301 70 F 101 102 103 104 201 202 301 38679 101 25 F 101 102 . . . . . 38679 102 24 M 101 102 . . . . . ; run; proc sort data=example; by hhid personid; run; proc transpose data=example out=trans (drop=_name_) prefix=person_age; by hhid; var age; run; data want; merge example trans; by hhid; run;
Instead of naming variables with with person2_age I use person_age2. That way you can reference all of the ages with list short cuts like: person_age1 - person_age7 or person_age: . I also intentionaly create all 7 instead of trying to "name around" the original age.
You will appreciate the different naming convention in the long run believe me.
I would actually be tempted to drop the original age variable at the merge but that is up to you.
The proc sort step is needed if your data is not sorted at some point.
Here is one way that does what you ask with some details to consider:
data example; input hhid personid age sex $ person1-person7; datalines; 38675 101 40 M 101 102 103 104 201 202 301 38675 102 38 F 101 102 103 104 201 202 301 38675 103 15 M 101 102 103 104 201 202 301 38675 104 12 M 101 102 103 104 201 202 301 38675 201 35 F 101 102 103 104 201 202 301 38675 202 10 M 101 102 103 104 201 202 301 38675 301 70 F 101 102 103 104 201 202 301 38679 101 25 F 101 102 . . . . . 38679 102 24 M 101 102 . . . . . ; run; proc sort data=example; by hhid personid; run; proc transpose data=example out=trans (drop=_name_) prefix=person_age; by hhid; var age; run; data want; merge example trans; by hhid; run;
Instead of naming variables with with person2_age I use person_age2. That way you can reference all of the ages with list short cuts like: person_age1 - person_age7 or person_age: . I also intentionaly create all 7 instead of trying to "name around" the original age.
You will appreciate the different naming convention in the long run believe me.
I would actually be tempted to drop the original age variable at the merge but that is up to you.
The proc sort step is needed if your data is not sorted at some point.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.