BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wernie
Quartz | Level 8

I have two datasets and need to manipulate some of the data. The first one is for an outcome I'm interested in and has different county IDs, age groups, gender, and cases. An example is as follows:

 

County_ID                        Age_Group      Gender      Cases

County1                                 1                     F               15

County1                                 1                    M                3

County1                                 3                    F                 2

County1                                 3                    M               20

County1                                 4                    F                 7

County1                                 4                    M                8

County2                                 1                    F                13

County2                                 1                    M               12

 

The second dataset contains population data by census tract (~70k rows) and I need to get the data so that I can have it arranged like above (each CT_ID repeated to cover all age groups and genders). It is currently organised as follows because of the original file:

 

CT_ID      Geography      M0_4    M5-9   M10_14  M15_19....through M85up (plus same age group variables for females(F0_4, etc.))

111111      CT1_AL           40          56          68           85

111112      CT2_AL           46          60          74           91

111113      CT3_AL           30          42          55           62

111114      CT4_AL           50          65          79           93

 

So I want to transform the population data (second example) to look like this so I can later these aggregate census tracts into the counties and then merge so the population would be an additional column in the outcome table (first example):

 

CT_ID      Geography  Age_Group  Gender  Pop

111111      CT1_AL             1               M        40        

111111      CT1_AL             2               M        56

111111      CT1_AL             3               M        68

111111      CT1_AL             4               M        85

111112      CT2_AL             1               M        46

111112      CT2_AL             2               M        60        

111112      CT1_AL             3               M        74

111112      CT1_AL             4               M        91

111113      CT3_AL             1               M        30

111113      CT3_AL             2               M        42        

111113      CT3_AL             3               M        55

111113      CT3_AL             4               M        62

111114      CT4_AL             1               M        50

111114      CT4_AL             2               M        66        

111114      CT4_AL             3               M        79

111114      CT4_AL             4               M        93

 

I'm not sure if this is something an array would be good for or if I need to make use of @ or @ @  (or something totally different). Any help is appreciated! Thanks 🙂

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@wernie

Here some code demonstrating how you could go about transforming your 2nd table.

data have;
  input CT_ID $ Geography $ M0_4 M5_9 M10_14 M15_19;
  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

data want;
  set have;

  array males {*} M:;
  gender='M';
  do age_group=1 to dim(males);
    pop=males[age_group];
    output;
  end;
  drop M:;

/*  array females {*} F:;*/
/*  gender='F';*/
/*  do age_group=1 to dim(females);*/
/*    pop=females[age_group];*/
/*    output;*/
/*  end;*/
/*  drop F:;*/

run;

 

I normally prefer to read external data as unchanged as possible into SAS because this allows me to better verify if data got read correctly. But you could also read the data directly into the structure you want by using code like below.

data sample(drop=_i);
  infile datalines dlm=' ' dsd truncover;
  input CT_ID $ Geography $ @;

  gender='M';
  do _i=1 to 4;
    input pop @;
    output;
  end;

  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

To later on merge your population data to your first table you need to have a common key in both tables so you need to somehow translate/map your Geography codes to County_IDs.

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

@wernie

Here some code demonstrating how you could go about transforming your 2nd table.

data have;
  input CT_ID $ Geography $ M0_4 M5_9 M10_14 M15_19;
  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

data want;
  set have;

  array males {*} M:;
  gender='M';
  do age_group=1 to dim(males);
    pop=males[age_group];
    output;
  end;
  drop M:;

/*  array females {*} F:;*/
/*  gender='F';*/
/*  do age_group=1 to dim(females);*/
/*    pop=females[age_group];*/
/*    output;*/
/*  end;*/
/*  drop F:;*/

run;

 

I normally prefer to read external data as unchanged as possible into SAS because this allows me to better verify if data got read correctly. But you could also read the data directly into the structure you want by using code like below.

data sample(drop=_i);
  infile datalines dlm=' ' dsd truncover;
  input CT_ID $ Geography $ @;

  gender='M';
  do _i=1 to 4;
    input pop @;
    output;
  end;

  datalines;
111111 CT1_AL 40 56 68 85
111112 CT2_AL 46 60 74 91
111113 CT3_AL 30 42 55 62
111114 CT4_AL 50 65 79 93
;
run;

To later on merge your population data to your first table you need to have a common key in both tables so you need to somehow translate/map your Geography codes to County_IDs.

wernie
Quartz | Level 8

Great, thank you! @Patrick I used the option with the arrays and that worked well. Can you briefly explain how what you posted worked to create the table in that way? Just trying to understand things better. Thanks again! Smiley Happy

Patrick
Opal | Level 21

@wernie

For males the code creates an array with all variables starting with 'M'. That's what the 'M:' syntax does. 

This works of course only if there are no other variables in your code starting with 'M' and the variables are in the right order. This was the case in your sample data. Else: You would have to spell out the variables in the array definition.

 

The rest is a simple do loop over the array with a loop iteration per element in the array. Here the docu link if that's not clear to you.

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#p1i7zlvukj8arhn1ihiv...

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 774 views
  • 1 like
  • 2 in conversation