Creating multiple observations from one

Reply
New Contributor
Posts: 3

Creating multiple observations from one

The dataset I received contains data from several household members on a single line, for instance each member's age will be under age_1, age_2, etc. All members of the same family are linked together by a household ID. Is there any way to separate this single observation (containing all of the family members), into separate observations for each member?

Occasional Contributor
Posts: 13

Re: Creating multiple observations from one

You could potentially perform a transpose depending on your data, or you could use a data step with multiple output statements:

data output(keep=householdID, age);

     set input;

     age = age_1;  output;

     age = age_2;  output;

     age = age_3;  output;

     ...

run;

The output statements forces an observation to be written to the output set each time it is encountered.  Just one idea Smiley Happy

Super User
Posts: 10,538

Re: Creating multiple observations from one

One would hope there are variables for the member id as well. They and any other specific individual data should be all split out at that time. Just extending the idea Sonywell posted.

data output(drop= age_1-age_3 personid_1 - personid_3Smiley Wink; /* drop the repeated variables will keep all of the other variables*/

     set input;

     age = age_1;  personId = personid_1; output;

     /* and to avoid creating blank records in the case where there are not more folks*/

     if not missing (age_2) then do; age = age_2;  personId = personid_2; output; end;

     if not missing (age_3) then do;age = age_3;  personId = personid_2; output; end;

     ...

run;

If you have a larger number of potential additional household members then the approach would be better using array processing. Suppose your data has up to 8 household members:

data output(drop= age_1-age_8 personid_1 - personid_8Smiley Wink; /* drop the repeated variables will keep all of the other variables*/

     set input;

      array a age_1 - age_8;

     array id personid_1 - personid_8;

     <continue to define additional arrays for additional person variables*/

     do j=1 to dim(a);   /* make sure that all of the arrays have the same number of elements for this to work*/

     /* and to avoid creating blank records in the case where there are not more folks*/

          if not missing (a) then do;

               age = age;

               personId = id;

               <other array processes go here>

               output;

          end;/* the IF*/

     end; /* the do j= loop*/

    ;

run;

New Contributor
Posts: 3

Re: Creating multiple observations from one

unfortunately there is no specific person ID.

when I run this code:

data hh_split_2(drop= age_1-age_8 edulvl_1-edulvl_8 sex_1-sex_8 res_rdp_1-res_rdp_8 pregnnt_1-pregnnt_8);

     set hh_split;

     array a age_1 - age_8;

     array s sex_1 - sex_8;

     array r res_rdp_1-res_rdp_8;

  array p pregnnt_1-pregnnt_8;

     do j=1 to dim(a);

  do j=1 to dim(s);

  do j=1 to dim(r);

  do j=1 to dim(p);

if not missing (a) then do;

               age = a;

  output; end;

if not missing (s) then do;

               sex= s;

  output; end;

if not missing (r) then do;

  res_rdp= r; output; end;

if not missing (p) then do;

  pregnnt=p; output; end;

run;

I get the following error: ERROR 117-185: There were 4 unclosed DO blocks.

Super User
Posts: 9,687

Re: Creating multiple observations from one

Post your sample data. I think double transpose can do that.

New Contributor
Posts: 3

Re: Creating multiple observations from one

INT_NAMEHH_AreaHH_ADDRREF_PTHHIDHH_NAME_1PERS_1AGE_YRS_1SEX_1DISABLE_1RES_RDP_1PREGNNT_1EDU_LVL_1EMPLYMNT_1SELECTED_1HH_NAME_2PERS_2SEX_2AGE_YRS_2DISABLE_2RES_RDP_2PREGNNT_2EDU_LVL_2EMPLYMNT_2SELECTED_2HH_NAME_3PERS_3SEX_3AGE_YRS_3DISABLE_3RES_RDP_3PREGNNT_3EDU_LVL_3EMPLYMNT_3SELECTED_3HH_NAME_4PERS_4SEX_4AGE_YRS_4DISABLE_4RES_RDP_4PREGNNT_4EDU_LVL_4EMPLYMNT_4SELECTED_4HH_NAME_5PERS_5AGE_YRS_5SEX_5DISABLE_5RES_RDP_5PREGNNT_5EDU_LVL_5EMPLYMNT_5SELECTED_5HH_NAME_6PERS_6SEX_6AGE_YRS_6DISABLE_6RES_RDP_6PREGNNT_6EDU_LVL_6EMPLYMNT_6SELECTED_6HH_NAME_7PERS_7SEX_7AGE_YRS_7DISABLE_7RES_RDP_7PREGNNT_7EDU_LVL_7EMPLYMNT_7SELECTED_7HH_NAME_8PERS_8SEX_8AGE_YRS_8DISABLE_8RES_RDP_8PREGNNT_8EDU_LVL_8EMPLYMNT_8SELECTED_8HH_NAME_9PERS_9SEX_9AGE_YRS_9DISABLE_9RES_RDP_9PREGNNT_9EDU_LVL_9EMPLYMNT_9SELECTED_9HH_NAME_10PERS_10AGE_YRS_10SEX_10DISABLE_10RES_RDP_10PREGNNT_10EDU_LVL_10EMPLYMNT_10SELECTED_10HH_NAME_11PERS_11AGE_YRS_11SEX_11DISABLE_11RES_RDP_11PREGNNT_11EDU_LVL_11EMPLYMNT_11SELECTED_11HH_NAME_12PERS_12AGE_YRS_12SEX_12DISABLE_12RES_RDP_12PREGNNT_12EDU_LVL_12EMPLYMNT_12SELECTED_12HH_NAME_13PERS_13AGE_YRS_13SEX_13DISABLE_13RES_RDP_13PREGNNT_13EDU_LVL_13EMPLYMNT_13SELECTED_13HH_NAME_14PERS_14AGE_YRS_14SEX_14DISABLE_14RES_RDP_14PREGNNT_14EDU_LVL_14EMPLYMNT_14SELECTED_14

This is how the variables are listed in the dataset, all the information before HH_NAME_1 is the same for all members of the household. The variables after HHID (household ID) are specific to each household member (depending on the ending 1, 2, 3...)

Super User
Posts: 9,687

Re: Creating multiple observations from one

You didn't post some DATA yet .

Code: Program

data have;
input INT_NAME HH_Area HH_ADDR REF_PT HHID HH_NAME_1 SEX_1 HH_NAME_2 SEX_2;
n+1;
cards;
1 1 1 1 1 11 0 12 1
1 1 1 1 1 22 0 22 1
1 1 1 1 2 11 0 12 1
1 1 1 1 2 22 0 22 1
1 1 1 1 2 33 0 32 1
;
run;
proc transpose data=have out=temp;
by INT_NAME HH_Area HH_ADDR REF_PT HHID n;
var HH_NAME_: SEX_: ;
run;
data temp;
set temp;
id=input(scan(_name_,-1,'_'),best32.);
name=prxchange('s/_\d+$//',1,strip(_name_));
run;
proc sort data=temp(drop=_name_) out=x ;
by INT_NAME HH_Area HH_ADDR REF_PT HHID id;
run;
data x;
set x;
by INT_NAME HH_Area HH_ADDR REF_PT HHID n notsorted;
group+first.n;
run;

proc transpose data=x out=want;
by INT_NAME HH_Area HH_ADDR REF_PT HHID group;
var col1;
id name;
run;
Ask a Question
Discussion stats
  • 6 replies
  • 272 views
  • 0 likes
  • 4 in conversation