SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Transforming variables and creating multiple observations from single obs

Accepted Solution Solved
Reply
Contributor
Posts: 34
Accepted Solution

Transforming variables and creating multiple observations from single obs

[ Edited ]

I have two datasets and need to manipulate some of the data. The first one is for an outcome I'm interested in and has different county IDs, age groups, gender, and cases. An example is as follows:

 

County_ID                        Age_Group      Gender      Cases

County1                                 1                     F               15

County1                                 1                    M                3

County1                                 3                    F                 2

County1                                 3                    M               20

County1                                 4                    F                 7

County1                                 4                    M                8

County2                                 1                    F                13

County2                                 1                    M               12

 

The second dataset contains population data by census tract (~70k rows) and I need to get the data so that I can have it arranged like above (each CT_ID repeated to cover all age groups and genders). It is currently organised as follows because of the original file:

 

CT_ID      Geography      M0_4    M5-9   M10_14  M15_19....through M85up (plus same age group variables for females(F0_4, etc.))

111111      CT1_AL           40          56          68           85

111112      CT2_AL           46          60          74           91

111113      CT3_AL           30          42          55           62

111114      CT4_AL           50          65          79           93

 

So I want to transform the population data (second example) to look like this so I can later these aggregate census tracts into the counties and then merge so the population would be an additional column in the outcome table (first example):

 

CT_ID      Geography  Age_Group  Gender  Pop

111111      CT1_AL             1               M        40        

111111      CT1_AL             2               M        56

111111      CT1_AL             3               M        68

111111      CT1_AL             4               M        85

111112      CT2_AL             1               M        46

111112      CT2_AL             2               M        60        

111112      CT1_AL             3               M        74

111112      CT1_AL             4               M        91

111113      CT3_AL             1               M        30

111113      CT3_AL             2               M        42        

111113      CT3_AL             3               M        55

111113      CT3_AL             4               M        62

111114      CT4_AL             1               M        50

111114      CT4_AL             2               M        66        

111114      CT4_AL             3               M        79

111114      CT4_AL             4               M        93

 

I'm not sure if this is something an array would be good for or if I need to make use of @ or @ @  (or something totally different). Any help is appreciated! Thanks Smiley Happy


Accepted Solutions
Solution
‎06-05-2017 10:14 AM
Respected Advisor
Posts: 3,895

Re: Transforming variables and creating multiple observations from single obs

[ Edited ]

@wernie

Here some code demonstrating how you could go about transforming your 2nd table.

data have;
  input CT_ID $ Geography $ M0_4 M5_9 M10_14 M15_19;
  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

data want;
  set have;

  array males {*} M:;
  gender='M';
  do age_group=1 to dim(males);
    pop=males[age_group];
    output;
  end;
  drop M:;

/*  array females {*} F:;*/
/*  gender='F';*/
/*  do age_group=1 to dim(females);*/
/*    pop=females[age_group];*/
/*    output;*/
/*  end;*/
/*  drop F:;*/

run;

 

I normally prefer to read external data as unchanged as possible into SAS because this allows me to better verify if data got read correctly. But you could also read the data directly into the structure you want by using code like below.

data sample(drop=_i);
  infile datalines dlm=' ' dsd truncover;
  input CT_ID $ Geography $ @;

  gender='M';
  do _i=1 to 4;
    input pop @;
    output;
  end;

  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

To later on merge your population data to your first table you need to have a common key in both tables so you need to somehow translate/map your Geography codes to County_IDs.

View solution in original post


All Replies
Solution
‎06-05-2017 10:14 AM
Respected Advisor
Posts: 3,895

Re: Transforming variables and creating multiple observations from single obs

[ Edited ]

@wernie

Here some code demonstrating how you could go about transforming your 2nd table.

data have;
  input CT_ID $ Geography $ M0_4 M5_9 M10_14 M15_19;
  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

data want;
  set have;

  array males {*} M:;
  gender='M';
  do age_group=1 to dim(males);
    pop=males[age_group];
    output;
  end;
  drop M:;

/*  array females {*} F:;*/
/*  gender='F';*/
/*  do age_group=1 to dim(females);*/
/*    pop=females[age_group];*/
/*    output;*/
/*  end;*/
/*  drop F:;*/

run;

 

I normally prefer to read external data as unchanged as possible into SAS because this allows me to better verify if data got read correctly. But you could also read the data directly into the structure you want by using code like below.

data sample(drop=_i);
  infile datalines dlm=' ' dsd truncover;
  input CT_ID $ Geography $ @;

  gender='M';
  do _i=1 to 4;
    input pop @;
    output;
  end;

  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

To later on merge your population data to your first table you need to have a common key in both tables so you need to somehow translate/map your Geography codes to County_IDs.

Contributor
Posts: 34

Re: Transforming variables and creating multiple observations from single obs

Great, thank you! @Patrick I used the option with the arrays and that worked well. Can you briefly explain how what you posted worked to create the table in that way? Just trying to understand things better. Thanks again! Smiley Happy

Respected Advisor
Posts: 3,895

Re: Transforming variables and creating multiple observations from single obs

@wernie

For males the code creates an array with all variables starting with 'M'. That's what the 'M:' syntax does. 

This works of course only if there are no other variables in your code starting with 'M' and the variables are in the right order. This was the case in your sample data. Else: You would have to spell out the variables in the array definition.

 

The rest is a simple do loop over the array with a loop iteration per element in the array. Here the docu link if that's not clear to you.

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#p1i7zlvukj8arhn1ihiv...

 

 

Contributor
Posts: 34

Re: Transforming variables and creating multiple observations from single obs

Thank you!
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 161 views
  • 1 like
  • 2 in conversation