BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
palwonk
Calcite | Level 5

I am a relatively new SAS user working with a very large dataset that emphasizes the relationships between people living together. As a result, the data for each person (observation) include variables listing the person-level id numbers for everyone else who lives with them. Below is code that can illustrate what the data look like. In this example, there are two households - one with 7 people and the other with only 2 people.

 

data example;
	input hhid personid age sex $ person1-person7;
	datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
	;
run; 

I want to create variables that list the ages for each of the other people living in the household - see below for an illustration (I leave out some person columns for simplicity). Note that the variable creation has to be done by hhid because the personid's are not unique, but instead repeated across households.

 

Obshhidpersonidagesexperson1person2person3person1_ageperson2_ageperson3_age
13867510140M101102103403815
23867510238F101102103403815
33867510315M101102103403815
43867510412M101102103403815
53867520135F101102103403815
63867520210M101102103403815
73867530170F101102103403815
83867910125F101102.2524.
93867910224M101102.2524.

 

Any suggestions?

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Here is one way that does what you ask with some details to consider:

data example;
	input hhid personid age sex $ person1-person7;
	datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
	;
run; 

proc sort data=example;
   by hhid personid;
run;

proc transpose data=example out=trans (drop=_name_)
  prefix=person_age;
by hhid;
var age;
run;

data want;
   merge example trans;
   by hhid;
run;

Instead of naming variables with with person2_age I use person_age2. That way you can reference all of the ages with list short cuts like: person_age1 - person_age7 or person_age: . I also intentionaly create all 7 instead of trying to "name around" the original age.

You will appreciate the different naming convention in the long run believe me.

 

I would actually be tempted to drop the original age variable at the merge but that is up to you.

 

The proc sort step is needed if your data is not sorted at some point.

View solution in original post

2 REPLIES 2
ballardw
Super User

Here is one way that does what you ask with some details to consider:

data example;
	input hhid personid age sex $ person1-person7;
	datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
	;
run; 

proc sort data=example;
   by hhid personid;
run;

proc transpose data=example out=trans (drop=_name_)
  prefix=person_age;
by hhid;
var age;
run;

data want;
   merge example trans;
   by hhid;
run;

Instead of naming variables with with person2_age I use person_age2. That way you can reference all of the ages with list short cuts like: person_age1 - person_age7 or person_age: . I also intentionaly create all 7 instead of trying to "name around" the original age.

You will appreciate the different naming convention in the long run believe me.

 

I would actually be tempted to drop the original age variable at the merge but that is up to you.

 

The proc sort step is needed if your data is not sorted at some point.

palwonk
Calcite | Level 5
Your solution worked like a charm - thank you! (And good pointer about variable naming convention.)

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 374 views
  • 1 like
  • 2 in conversation