BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
trich12
Calcite | Level 5

Hi all!  I have a dataset with county voting records for presidential elections.  For example,

County     State     Democrat_1992     Republican_1992          Democrat_1996     Republican_1996

Autauga     AL          4819                         8715                             5015                    9519

Baldwin     AL           12195                       26270                           12776                  29476

I want to transform this data so I can work with it more easily by having years as a column.  I actually will need to linearly interpolate the data between the years, so ideally I need:

County          State                Year               Democrat           Republican

Autauga         AL                    1992               4819                    8715

Autauga         AL                    1993                  .                         .

Autauga         AL                    1994                  .                         .

Autauga         AL                    1995                  .                         .

Autauga         AL                    1996               5015                    9519

Baldwin          AL                    1992              12195                  26270

Baldwin          AL                    1993                 .                          .

Baldwin          AL                    1994                 .                          .

Baldwin          AL                    1995                 .                          .

Baldwin          AL                    1996              12776                  29476

So that I can then linearly interpolate for the years with missing data.  I would really appreciate any help with this!! Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

Something like that below I reckon.  You have two "problems" as your data stands

  1. data is contained in the meta data.  Party and year.
  2. you need to create records for the missing years.

data results;
   input County $ State $ Democrat_1992 Republican_1992 Democrat_1996 Republican_1996;
   cards;
Autauga AL  4819   8715   5015   9519
Baldwin AL 12195  26270  12776  29476
;;;;
   run;
proc transpose data=results out=results2(rename=(col1=result));
   by state county;
   run;
data results2;
   set results2;
   length party $3;
   party = _name_;
   year = input(scan(_name_,-
1,'_'),4.);
   run;
proc sort data=results2;
   by state county party year;
   run;
proc summary data=results2;
   output out=years(drop=_:) min(year)=min max(year)=max;
  
run;
data classdata;
   set years;
   do year = min to max;
      output;
     
end;
  
run;
proc summary data=results2 classdata=classdata nway;
  
by state county party;
   class year;
   output out=filled(drop=_:) idgroup(out(result)=);
   run;
proc print;
  
run;
proc reg data=filled noprint;
  
by state county party;
   model result=year;
   output out=int p=interpolated;
   run;
  
quit;
proc print;
  
run;
proc sort data=int;
   by state county year party;
   run;
proc transpose data=int out=int2;
   by state county year;
   id party;
   var interpolated;
   run;
proc print;
  
run;

View solution in original post

3 REPLIES 3
Vince28_Statcan
Quartz | Level 8

step1

proc transpose by county/state

step2 data step

year=input(substr(_NAME_, -1, 4) best32,)); /* to convert Democrat_1992 into numeric year 1992 */

step 3 - retranspose to get democrat and republican columns

step 4 data step iteration to create the missing years if they're really needed for your interpolation calculation.

Vince

data_null__
Jade | Level 19

Something like that below I reckon.  You have two "problems" as your data stands

  1. data is contained in the meta data.  Party and year.
  2. you need to create records for the missing years.

data results;
   input County $ State $ Democrat_1992 Republican_1992 Democrat_1996 Republican_1996;
   cards;
Autauga AL  4819   8715   5015   9519
Baldwin AL 12195  26270  12776  29476
;;;;
   run;
proc transpose data=results out=results2(rename=(col1=result));
   by state county;
   run;
data results2;
   set results2;
   length party $3;
   party = _name_;
   year = input(scan(_name_,-
1,'_'),4.);
   run;
proc sort data=results2;
   by state county party year;
   run;
proc summary data=results2;
   output out=years(drop=_:) min(year)=min max(year)=max;
  
run;
data classdata;
   set years;
   do year = min to max;
      output;
     
end;
  
run;
proc summary data=results2 classdata=classdata nway;
  
by state county party;
   class year;
   output out=filled(drop=_:) idgroup(out(result)=);
   run;
proc print;
  
run;
proc reg data=filled noprint;
  
by state county party;
   model result=year;
   output out=int p=interpolated;
   run;
  
quit;
proc print;
  
run;
proc sort data=int;
   by state county year party;
   run;
proc transpose data=int out=int2;
   by state county year;
   id party;
   var interpolated;
   run;
proc print;
  
run;
trich12
Calcite | Level 5

That worked perfectly! Thank you so, so much!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 799 views
  • 3 likes
  • 3 in conversation