Solved: Re: How do I create variables using data from other observations, by g...

palwonk · Posted 09-11-2019 06:20 PM

I am a relatively new SAS user working with a very large dataset that emphasizes the relationships between people living together. As a result, the data for each person (observation) include variables listing the person-level id numbers for everyone else who lives with them. Below is code that can illustrate what the data look like. In this example, there are two households - one with 7 people and the other with only 2 people.

data example;
	input hhid personid age sex $ person1-person7;
	datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
	;
run;

I want to create variables that list the ages for each of the other people living in the household - see below for an illustration (I leave out some person columns for simplicity). Note that the variable creation has to be done by hhid because the personid's are not unique, but instead repeated across households.

Obs	hhid	personid	age	sex	person1	person2	person3	person1_age	person2_age	person3_age
1	38675	101	40	M	101	102	103	40	38	15
2	38675	102	38	F	101	102	103	40	38	15
3	38675	103	15	M	101	102	103	40	38	15
4	38675	104	12	M	101	102	103	40	38	15
5	38675	201	35	F	101	102	103	40	38	15
6	38675	202	10	M	101	102	103	40	38	15
7	38675	301	70	F	101	102	103	40	38	15
8	38679	101	25	F	101	102	.	25	24	.
9	38679	102	24	M	101	102	.	25	24	.

Any suggestions?

ballardw · Posted 09-12-2019 11:15 AM

Here is one way that does what you ask with some details to consider:

data example;
	input hhid personid age sex $ person1-person7;
	datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
	;
run; 

proc sort data=example;
   by hhid personid;
run;

proc transpose data=example out=trans (drop=_name_)
  prefix=person_age;
by hhid;
var age;
run;

data want;
   merge example trans;
   by hhid;
run;

Instead of naming variables with with person2_age I use person_age2. That way you can reference all of the ages with list short cuts like: person_age1 - person_age7 or person_age: . I also intentionaly create all 7 instead of trying to "name around" the original age.

You will appreciate the different naming convention in the long run believe me.

I would actually be tempted to drop the original age variable at the merge but that is up to you.

The proc sort step is needed if your data is not sorted at some point.

View solution in original post

ballardw · Posted 09-12-2019 11:15 AM

Here is one way that does what you ask with some details to consider:

data example;
	input hhid personid age sex $ person1-person7;
	datalines;
38675 101 40 M 101 102 103 104 201 202 301
38675 102 38 F 101 102 103 104 201 202 301
38675 103 15 M 101 102 103 104 201 202 301
38675 104 12 M 101 102 103 104 201 202 301
38675 201 35 F 101 102 103 104 201 202 301
38675 202 10 M 101 102 103 104 201 202 301
38675 301 70 F 101 102 103 104 201 202 301
38679 101 25 F 101 102 . . . . .
38679 102 24 M 101 102 . . . . .
	;
run; 

proc sort data=example;
   by hhid personid;
run;

proc transpose data=example out=trans (drop=_name_)
  prefix=person_age;
by hhid;
var age;
run;

data want;
   merge example trans;
   by hhid;
run;

Instead of naming variables with with person2_age I use person_age2. That way you can reference all of the ages with list short cuts like: person_age1 - person_age7 or person_age: . I also intentionaly create all 7 instead of trying to "name around" the original age.

You will appreciate the different naming convention in the long run believe me.

I would actually be tempted to drop the original age variable at the merge but that is up to you.

The proc sort step is needed if your data is not sorted at some point.

palwonk · Posted 09-12-2019 11:46 AM

Your solution worked like a charm - thank you! (And good pointer about variable naming convention.)

How do I create variables using data from other observations, by group

Re: How do I create variables using data from other observations, by group

Re: How do I create variables using data from other observations, by group

Re: How do I create variables using data from other observations, by group

How do I create variables using data from other observations, by group

Re: How do I create variables using data from other observations, by group

Re: How do I create variables using data from other observations, by group

Re: How do I create variables using data from other observations, by group

Registration is open

SAS Training: Just a Click Away