Solved: Re: Transforming variables and creating multiple observations from sin...

wernie · Posted 06-02-2017 03:12 PM

I have two datasets and need to manipulate some of the data. The first one is for an outcome I'm interested in and has different county IDs, age groups, gender, and cases. An example is as follows:

County_ID Age_Group Gender Cases

County1 1 F 15

County1 1 M 3

County1 3 F 2

County1 3 M 20

County1 4 F 7

County1 4 M 8

County2 1 F 13

County2 1 M 12

The second dataset contains population data by census tract (~70k rows) and I need to get the data so that I can have it arranged like above (each CT_ID repeated to cover all age groups and genders). It is currently organised as follows because of the original file:

CT_ID Geography M0_4 M5-9 M10_14 M15_19....through M85up (plus same age group variables for females(F0_4, etc.))

111111 CT1_AL 40 56 68 85

111112 CT2_AL 46 60 74 91

111113 CT3_AL 30 42 55 62

111114 CT4_AL 50 65 79 93

So I want to transform the population data (second example) to look like this so I can later these aggregate census tracts into the counties and then merge so the population would be an additional column in the outcome table (first example):

CT_ID Geography Age_Group Gender Pop

111111 CT1_AL 1 M 40

111111 CT1_AL 2 M 56

111111 CT1_AL 3 M 68

111111 CT1_AL 4 M 85

111112 CT2_AL 1 M 46

111112 CT2_AL 2 M 60

111112 CT1_AL 3 M 74

111112 CT1_AL 4 M 91

111113 CT3_AL 1 M 30

111113 CT3_AL 2 M 42

111113 CT3_AL 3 M 55

111113 CT3_AL 4 M 62

111114 CT4_AL 1 M 50

111114 CT4_AL 2 M 66

111114 CT4_AL 3 M 79

111114 CT4_AL 4 M 93

I'm not sure if this is something an array would be good for or if I need to make use of @ or @ @ (or something totally different). Any help is appreciated! Thanks 🙂

Patrick · Posted 06-02-2017 09:45 PM

@wernie

Here some code demonstrating how you could go about transforming your 2nd table.

data have;
  input CT_ID $ Geography $ M0_4 M5_9 M10_14 M15_19;
  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

data want;
  set have;

  array males {*} M:;
  gender='M';
  do age_group=1 to dim(males);
    pop=males[age_group];
    output;
  end;
  drop M:;

/*  array females {*} F:;*/
/*  gender='F';*/
/*  do age_group=1 to dim(females);*/
/*    pop=females[age_group];*/
/*    output;*/
/*  end;*/
/*  drop F:;*/

run;

I normally prefer to read external data as unchanged as possible into SAS because this allows me to better verify if data got read correctly. But you could also read the data directly into the structure you want by using code like below.

data sample(drop=_i);
  infile datalines dlm=' ' dsd truncover;
  input CT_ID $ Geography $ @;

  gender='M';
  do _i=1 to 4;
    input pop @;
    output;
  end;

  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

To later on merge your population data to your first table you need to have a common key in both tables so you need to somehow translate/map your Geography codes to County_IDs.

View solution in original post

Patrick · Posted 06-02-2017 09:45 PM

@wernie

Here some code demonstrating how you could go about transforming your 2nd table.

data have;
  input CT_ID $ Geography $ M0_4 M5_9 M10_14 M15_19;
  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

data want;
  set have;

  array males {*} M:;
  gender='M';
  do age_group=1 to dim(males);
    pop=males[age_group];
    output;
  end;
  drop M:;

/*  array females {*} F:;*/
/*  gender='F';*/
/*  do age_group=1 to dim(females);*/
/*    pop=females[age_group];*/
/*    output;*/
/*  end;*/
/*  drop F:;*/

run;

I normally prefer to read external data as unchanged as possible into SAS because this allows me to better verify if data got read correctly. But you could also read the data directly into the structure you want by using code like below.

data sample(drop=_i);
  infile datalines dlm=' ' dsd truncover;
  input CT_ID $ Geography $ @;

  gender='M';
  do _i=1 to 4;
    input pop @;
    output;
  end;

  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

To later on merge your population data to your first table you need to have a common key in both tables so you need to somehow translate/map your Geography codes to County_IDs.

wernie · Posted 06-05-2017 10:15 AM

Great, thank you! @Patrick I used the option with the arrays and that worked well. Can you briefly explain how what you posted worked to create the table in that way? Just trying to understand things better. Thanks again!

Patrick · Posted 06-05-2017 06:08 PM

@wernie

For males the code creates an array with all variables starting with 'M'. That's what the 'M:' syntax does.

This works of course only if there are no other variables in your code starting with 'M' and the variables are in the right order. This was the case in your sample data. Else: You would have to spell out the variables in the array definition.

The rest is a simple do loop over the array with a loop iteration per element in the array. Here the docu link if that's not clear to you.

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#p1i7zlvukj8arhn1ihiv...

wernie · Posted 06-06-2017 08:08 AM

Thank you!

Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

Re: Transforming variables and creating multiple observations from single obs

SAS Innovate 2025: Call for Content