🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Calcite | Level 5

## How do I create variables using data from other observations, by group

I am a relatively new SAS user working with a very large dataset that emphasizes the relationships between people living together. As a result, the data for each person (observation) include variables listing the person-level id numbers for everyone else who lives with them. Below is code that can illustrate what the data look like. In this example, there are two households - one with 7 people and the other with only 2 people.

data example;
input hhid personid age sex \$ person1-person7;
datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
;
run;

I want to create variables that list the ages for each of the other people living in the household - see below for an illustration (I leave out some person columns for simplicity). Note that the variable creation has to be done by hhid because the personid's are not unique, but instead repeated across households.

 Obs hhid personid age sex person1 person2 person3 person1_age person2_age person3_age 1 38675 101 40 M 101 102 103 40 38 15 2 38675 102 38 F 101 102 103 40 38 15 3 38675 103 15 M 101 102 103 40 38 15 4 38675 104 12 M 101 102 103 40 38 15 5 38675 201 35 F 101 102 103 40 38 15 6 38675 202 10 M 101 102 103 40 38 15 7 38675 301 70 F 101 102 103 40 38 15 8 38679 101 25 F 101 102 . 25 24 . 9 38679 102 24 M 101 102 . 25 24 .

Any suggestions?

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: How do I create variables using data from other observations, by group

Here is one way that does what you ask with some details to consider:

data example;
input hhid personid age sex \$ person1-person7;
datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
;
run;

proc sort data=example;
by hhid personid;
run;

proc transpose data=example out=trans (drop=_name_)
prefix=person_age;
by hhid;
var age;
run;

data want;
merge example trans;
by hhid;
run;

Instead of naming variables with with person2_age I use person_age2. That way you can reference all of the ages with list short cuts like: person_age1 - person_age7 or person_age: . I also intentionaly create all 7 instead of trying to "name around" the original age.

You will appreciate the different naming convention in the long run believe me.

I would actually be tempted to drop the original age variable at the merge but that is up to you.

The proc sort step is needed if your data is not sorted at some point.

2 REPLIES 2
Super User

## Re: How do I create variables using data from other observations, by group

Here is one way that does what you ask with some details to consider:

data example;
input hhid personid age sex \$ person1-person7;
datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
;
run;

proc sort data=example;
by hhid personid;
run;

proc transpose data=example out=trans (drop=_name_)
prefix=person_age;
by hhid;
var age;
run;

data want;
merge example trans;
by hhid;
run;

Instead of naming variables with with person2_age I use person_age2. That way you can reference all of the ages with list short cuts like: person_age1 - person_age7 or person_age: . I also intentionaly create all 7 instead of trying to "name around" the original age.

You will appreciate the different naming convention in the long run believe me.

I would actually be tempted to drop the original age variable at the merge but that is up to you.

The proc sort step is needed if your data is not sorted at some point.

Calcite | Level 5

## Re: How do I create variables using data from other observations, by group

Your solution worked like a charm - thank you! (And good pointer about variable naming convention.)
Discussion stats
• 2 replies
• 375 views
• 1 like
• 2 in conversation