Solved
Contributor
Posts: 23

# Data transformation

Hi all!  I have a dataset with county voting records for presidential elections.  For example,

County     State     Democrat_1992     Republican_1992          Democrat_1996     Republican_1996

Autauga     AL          4819                         8715                             5015                    9519

Baldwin     AL           12195                       26270                           12776                  29476

I want to transform this data so I can work with it more easily by having years as a column.  I actually will need to linearly interpolate the data between the years, so ideally I need:

County          State                Year               Democrat           Republican

Autauga         AL                    1992               4819                    8715

Autauga         AL                    1993                  .                         .

Autauga         AL                    1994                  .                         .

Autauga         AL                    1995                  .                         .

Autauga         AL                    1996               5015                    9519

Baldwin          AL                    1992              12195                  26270

Baldwin          AL                    1993                 .                          .

Baldwin          AL                    1994                 .                          .

Baldwin          AL                    1995                 .                          .

Baldwin          AL                    1996              12776                  29476

So that I can then linearly interpolate for the years with missing data.  I would really appreciate any help with this!! Thank you!

Accepted Solutions
Solution
‎10-04-2013 12:57 PM
Posts: 3,852

## Re: Data transformation

Something like that below I reckon.  You have two "problems" as your data stands

1. data is contained in the meta data.  Party and year.
2. you need to create records for the missing years.

data results;
input County \$ State \$ Democrat_1992 Republican_1992 Democrat_1996 Republican_1996;
cards;
Autauga AL  4819   8715   5015   9519
Baldwin AL 12195  26270  12776  29476
;;;;
run;
proc transpose data=results out=results2(rename=(col1=result));
by state county;
run;
data results2;
set results2;
length party \$3;
party = _name_;
year = input(scan(_name_,-
1,'_'),4.);
run;
proc sort data=results2;
by state county party year;
run;
proc summary data=results2;
output out=years(drop=_ min(year)=min max(year)=max;

run;
data classdata;
set years;
do year = min to max;
output;

end;

run;
proc summary data=results2 classdata=classdata nway;

by state county party;
class year;
output out=filled(drop=_ idgroup(out(result)=);
run;
proc print;

run;
proc reg data=filled noprint;

by state county party;
model result=year;
output out=int p=interpolated;
run;

quit;
proc print;

run;
proc sort data=int;
by state county year party;
run;
proc transpose data=int out=int2;
by state county year;
id party;
var interpolated;
run;
proc print;

run;

All Replies
Super Contributor
Posts: 339

## Re: Data transformation

step1

proc transpose by county/state

step2 data step

year=input(substr(_NAME_, -1, 4) best32,)); /* to convert Democrat_1992 into numeric year 1992 */

step 3 - retranspose to get democrat and republican columns

step 4 data step iteration to create the missing years if they're really needed for your interpolation calculation.

Vince

Solution
‎10-04-2013 12:57 PM
Posts: 3,852

## Re: Data transformation

Something like that below I reckon.  You have two "problems" as your data stands

1. data is contained in the meta data.  Party and year.
2. you need to create records for the missing years.

data results;
input County \$ State \$ Democrat_1992 Republican_1992 Democrat_1996 Republican_1996;
cards;
Autauga AL  4819   8715   5015   9519
Baldwin AL 12195  26270  12776  29476
;;;;
run;
proc transpose data=results out=results2(rename=(col1=result));
by state county;
run;
data results2;
set results2;
length party \$3;
party = _name_;
year = input(scan(_name_,-
1,'_'),4.);
run;
proc sort data=results2;
by state county party year;
run;
proc summary data=results2;
output out=years(drop=_ min(year)=min max(year)=max;

run;
data classdata;
set years;
do year = min to max;
output;

end;

run;
proc summary data=results2 classdata=classdata nway;

by state county party;
class year;
output out=filled(drop=_ idgroup(out(result)=);
run;
proc print;

run;
proc reg data=filled noprint;

by state county party;
model result=year;
output out=int p=interpolated;
run;

quit;
proc print;

run;
proc sort data=int;
by state county year party;
run;
proc transpose data=int out=int2;
by state county year;
id party;
var interpolated;
run;
proc print;

run;
Contributor
Posts: 23

## Re: Data transformation

That worked perfectly! Thank you so, so much!

🔒 This topic is solved and locked.

Discussion stats
• 3 replies
• 195 views
• 3 likes
• 3 in conversation