BookmarkSubscribeRSS Feed
rg25
Calcite | Level 5

The dataset I received contains data from several household members on a single line, for instance each member's age will be under age_1, age_2, etc. All members of the same family are linked together by a household ID. Is there any way to separate this single observation (containing all of the family members), into separate observations for each member?

6 REPLIES 6
Sonywell
Fluorite | Level 6

You could potentially perform a transpose depending on your data, or you could use a data step with multiple output statements:

data output(keep=householdID, age);

     set input;

     age = age_1;  output;

     age = age_2;  output;

     age = age_3;  output;

     ...

run;

The output statements forces an observation to be written to the output set each time it is encountered.  Just one idea Smiley Happy

ballardw
Super User

One would hope there are variables for the member id as well. They and any other specific individual data should be all split out at that time. Just extending the idea Sonywell posted.

data output(drop= age_1-age_3 personid_1 - personid_3;); /* drop the repeated variables will keep all of the other variables*/

     set input;

     age = age_1;  personId = personid_1; output;

     /* and to avoid creating blank records in the case where there are not more folks*/

     if not missing (age_2) then do; age = age_2;  personId = personid_2; output; end;

     if not missing (age_3) then do;age = age_3;  personId = personid_2; output; end;

     ...

run;

If you have a larger number of potential additional household members then the approach would be better using array processing. Suppose your data has up to 8 household members:

data output(drop= age_1-age_8 personid_1 - personid_8;); /* drop the repeated variables will keep all of the other variables*/

     set input;

      array a age_1 - age_8;

     array id personid_1 - personid_8;

     <continue to define additional arrays for additional person variables*/

     do j=1 to dim(a);   /* make sure that all of the arrays have the same number of elements for this to work*/

     /* and to avoid creating blank records in the case where there are not more folks*/

          if not missing (a) then do;

               age = age;

               personId = id;

               <other array processes go here>

               output;

          end;/* the IF*/

     end; /* the do j= loop*/

    ;

run;

rg25
Calcite | Level 5

unfortunately there is no specific person ID.

when I run this code:

data hh_split_2(drop= age_1-age_8 edulvl_1-edulvl_8 sex_1-sex_8 res_rdp_1-res_rdp_8 pregnnt_1-pregnnt_8);

     set hh_split;

     array a age_1 - age_8;

     array s sex_1 - sex_8;

     array r res_rdp_1-res_rdp_8;

  array p pregnnt_1-pregnnt_8;

     do j=1 to dim(a);

  do j=1 to dim(s);

  do j=1 to dim(r);

  do j=1 to dim(p);

if not missing (a) then do;

               age = a;

  output; end;

if not missing (s) then do;

               sex= s;

  output; end;

if not missing (r) then do;

  res_rdp= r; output; end;

if not missing (p) then do;

  pregnnt=p; output; end;

run;

I get the following error: ERROR 117-185: There were 4 unclosed DO blocks.

Ksharp
Super User

Post your sample data. I think double transpose can do that.

rg25
Calcite | Level 5
INT_NAMEHH_AreaHH_ADDRREF_PTHHIDHH_NAME_1PERS_1AGE_YRS_1SEX_1DISABLE_1RES_RDP_1PREGNNT_1EDU_LVL_1EMPLYMNT_1SELECTED_1HH_NAME_2PERS_2SEX_2AGE_YRS_2DISABLE_2RES_RDP_2PREGNNT_2EDU_LVL_2EMPLYMNT_2SELECTED_2HH_NAME_3PERS_3SEX_3AGE_YRS_3DISABLE_3RES_RDP_3PREGNNT_3EDU_LVL_3EMPLYMNT_3SELECTED_3HH_NAME_4PERS_4SEX_4AGE_YRS_4DISABLE_4RES_RDP_4PREGNNT_4EDU_LVL_4EMPLYMNT_4SELECTED_4HH_NAME_5PERS_5AGE_YRS_5SEX_5DISABLE_5RES_RDP_5PREGNNT_5EDU_LVL_5EMPLYMNT_5SELECTED_5HH_NAME_6PERS_6SEX_6AGE_YRS_6DISABLE_6RES_RDP_6PREGNNT_6EDU_LVL_6EMPLYMNT_6SELECTED_6HH_NAME_7PERS_7SEX_7AGE_YRS_7DISABLE_7RES_RDP_7PREGNNT_7EDU_LVL_7EMPLYMNT_7SELECTED_7HH_NAME_8PERS_8SEX_8AGE_YRS_8DISABLE_8RES_RDP_8PREGNNT_8EDU_LVL_8EMPLYMNT_8SELECTED_8HH_NAME_9PERS_9SEX_9AGE_YRS_9DISABLE_9RES_RDP_9PREGNNT_9EDU_LVL_9EMPLYMNT_9SELECTED_9HH_NAME_10PERS_10AGE_YRS_10SEX_10DISABLE_10RES_RDP_10PREGNNT_10EDU_LVL_10EMPLYMNT_10SELECTED_10HH_NAME_11PERS_11AGE_YRS_11SEX_11DISABLE_11RES_RDP_11PREGNNT_11EDU_LVL_11EMPLYMNT_11SELECTED_11HH_NAME_12PERS_12AGE_YRS_12SEX_12DISABLE_12RES_RDP_12PREGNNT_12EDU_LVL_12EMPLYMNT_12SELECTED_12HH_NAME_13PERS_13AGE_YRS_13SEX_13DISABLE_13RES_RDP_13PREGNNT_13EDU_LVL_13EMPLYMNT_13SELECTED_13HH_NAME_14PERS_14AGE_YRS_14SEX_14DISABLE_14RES_RDP_14PREGNNT_14EDU_LVL_14EMPLYMNT_14SELECTED_14

This is how the variables are listed in the dataset, all the information before HH_NAME_1 is the same for all members of the household. The variables after HHID (household ID) are specific to each household member (depending on the ending 1, 2, 3...)

Ksharp
Super User

You didn't post some DATA yet .

Code: Program

data have;
input INT_NAME HH_Area HH_ADDR REF_PT HHID HH_NAME_1 SEX_1 HH_NAME_2 SEX_2;
n+1;
cards;
1 1 1 1 1 11 0 12 1
1 1 1 1 1 22 0 22 1
1 1 1 1 2 11 0 12 1
1 1 1 1 2 22 0 22 1
1 1 1 1 2 33 0 32 1
;
run;
proc transpose data=have out=temp;
by INT_NAME HH_Area HH_ADDR REF_PT HHID n;
var HH_NAME_: SEX_: ;
run;
data temp;
set temp;
id=input(scan(_name_,-1,'_'),best32.);
name=prxchange('s/_\d+$//',1,strip(_name_));
run;
proc sort data=temp(drop=_name_) out=x ;
by INT_NAME HH_Area HH_ADDR REF_PT HHID id;
run;
data x;
set x;
by INT_NAME HH_Area HH_ADDR REF_PT HHID n notsorted;
group+first.n;
run;

proc transpose data=x out=want;
by INT_NAME HH_Area HH_ADDR REF_PT HHID group;
var col1;
id name;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1882 views
  • 0 likes
  • 4 in conversation