BookmarkSubscribeRSS Feed
rg25
Calcite | Level 5

The dataset I received contains data from several household members on a single line, for instance each member's age will be under age_1, age_2, etc. All members of the same family are linked together by a household ID. Is there any way to separate this single observation (containing all of the family members), into separate observations for each member?

6 REPLIES 6
Sonywell
Fluorite | Level 6

You could potentially perform a transpose depending on your data, or you could use a data step with multiple output statements:

data output(keep=householdID, age);

     set input;

     age = age_1;  output;

     age = age_2;  output;

     age = age_3;  output;

     ...

run;

The output statements forces an observation to be written to the output set each time it is encountered.  Just one idea Smiley Happy

ballardw
Super User

One would hope there are variables for the member id as well. They and any other specific individual data should be all split out at that time. Just extending the idea Sonywell posted.

data output(drop= age_1-age_3 personid_1 - personid_3;); /* drop the repeated variables will keep all of the other variables*/

     set input;

     age = age_1;  personId = personid_1; output;

     /* and to avoid creating blank records in the case where there are not more folks*/

     if not missing (age_2) then do; age = age_2;  personId = personid_2; output; end;

     if not missing (age_3) then do;age = age_3;  personId = personid_2; output; end;

     ...

run;

If you have a larger number of potential additional household members then the approach would be better using array processing. Suppose your data has up to 8 household members:

data output(drop= age_1-age_8 personid_1 - personid_8;); /* drop the repeated variables will keep all of the other variables*/

     set input;

      array a age_1 - age_8;

     array id personid_1 - personid_8;

     <continue to define additional arrays for additional person variables*/

     do j=1 to dim(a);   /* make sure that all of the arrays have the same number of elements for this to work*/

     /* and to avoid creating blank records in the case where there are not more folks*/

          if not missing (a) then do;

               age = age;

               personId = id;

               <other array processes go here>

               output;

          end;/* the IF*/

     end; /* the do j= loop*/

    ;

run;

rg25
Calcite | Level 5

unfortunately there is no specific person ID.

when I run this code:

data hh_split_2(drop= age_1-age_8 edulvl_1-edulvl_8 sex_1-sex_8 res_rdp_1-res_rdp_8 pregnnt_1-pregnnt_8);

     set hh_split;

     array a age_1 - age_8;

     array s sex_1 - sex_8;

     array r res_rdp_1-res_rdp_8;

  array p pregnnt_1-pregnnt_8;

     do j=1 to dim(a);

  do j=1 to dim(s);

  do j=1 to dim(r);

  do j=1 to dim(p);

if not missing (a) then do;

               age = a;

  output; end;

if not missing (s) then do;

               sex= s;

  output; end;

if not missing (r) then do;

  res_rdp= r; output; end;

if not missing (p) then do;

  pregnnt=p; output; end;

run;

I get the following error: ERROR 117-185: There were 4 unclosed DO blocks.

Ksharp
Super User

Post your sample data. I think double transpose can do that.

rg25
Calcite | Level 5
INT_NAMEHH_AreaHH_ADDRREF_PTHHIDHH_NAME_1PERS_1AGE_YRS_1SEX_1DISABLE_1RES_RDP_1PREGNNT_1EDU_LVL_1EMPLYMNT_1SELECTED_1HH_NAME_2PERS_2SEX_2AGE_YRS_2DISABLE_2RES_RDP_2PREGNNT_2EDU_LVL_2EMPLYMNT_2SELECTED_2HH_NAME_3PERS_3SEX_3AGE_YRS_3DISABLE_3RES_RDP_3PREGNNT_3EDU_LVL_3EMPLYMNT_3SELECTED_3HH_NAME_4PERS_4SEX_4AGE_YRS_4DISABLE_4RES_RDP_4PREGNNT_4EDU_LVL_4EMPLYMNT_4SELECTED_4HH_NAME_5PERS_5AGE_YRS_5SEX_5DISABLE_5RES_RDP_5PREGNNT_5EDU_LVL_5EMPLYMNT_5SELECTED_5HH_NAME_6PERS_6SEX_6AGE_YRS_6DISABLE_6RES_RDP_6PREGNNT_6EDU_LVL_6EMPLYMNT_6SELECTED_6HH_NAME_7PERS_7SEX_7AGE_YRS_7DISABLE_7RES_RDP_7PREGNNT_7EDU_LVL_7EMPLYMNT_7SELECTED_7HH_NAME_8PERS_8SEX_8AGE_YRS_8DISABLE_8RES_RDP_8PREGNNT_8EDU_LVL_8EMPLYMNT_8SELECTED_8HH_NAME_9PERS_9SEX_9AGE_YRS_9DISABLE_9RES_RDP_9PREGNNT_9EDU_LVL_9EMPLYMNT_9SELECTED_9HH_NAME_10PERS_10AGE_YRS_10SEX_10DISABLE_10RES_RDP_10PREGNNT_10EDU_LVL_10EMPLYMNT_10SELECTED_10HH_NAME_11PERS_11AGE_YRS_11SEX_11DISABLE_11RES_RDP_11PREGNNT_11EDU_LVL_11EMPLYMNT_11SELECTED_11HH_NAME_12PERS_12AGE_YRS_12SEX_12DISABLE_12RES_RDP_12PREGNNT_12EDU_LVL_12EMPLYMNT_12SELECTED_12HH_NAME_13PERS_13AGE_YRS_13SEX_13DISABLE_13RES_RDP_13PREGNNT_13EDU_LVL_13EMPLYMNT_13SELECTED_13HH_NAME_14PERS_14AGE_YRS_14SEX_14DISABLE_14RES_RDP_14PREGNNT_14EDU_LVL_14EMPLYMNT_14SELECTED_14

This is how the variables are listed in the dataset, all the information before HH_NAME_1 is the same for all members of the household. The variables after HHID (household ID) are specific to each household member (depending on the ending 1, 2, 3...)

Ksharp
Super User

You didn't post some DATA yet .

Code: Program

data have;
input INT_NAME HH_Area HH_ADDR REF_PT HHID HH_NAME_1 SEX_1 HH_NAME_2 SEX_2;
n+1;
cards;
1 1 1 1 1 11 0 12 1
1 1 1 1 1 22 0 22 1
1 1 1 1 2 11 0 12 1
1 1 1 1 2 22 0 22 1
1 1 1 1 2 33 0 32 1
;
run;
proc transpose data=have out=temp;
by INT_NAME HH_Area HH_ADDR REF_PT HHID n;
var HH_NAME_: SEX_: ;
run;
data temp;
set temp;
id=input(scan(_name_,-1,'_'),best32.);
name=prxchange('s/_\d+$//',1,strip(_name_));
run;
proc sort data=temp(drop=_name_) out=x ;
by INT_NAME HH_Area HH_ADDR REF_PT HHID id;
run;
data x;
set x;
by INT_NAME HH_Area HH_ADDR REF_PT HHID n notsorted;
group+first.n;
run;

proc transpose data=x out=want;
by INT_NAME HH_Area HH_ADDR REF_PT HHID group;
var col1;
id name;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2243 views
  • 0 likes
  • 4 in conversation