Statistical programming, matrix languages, and more

Unstacking columns

Reply
Occasional Contributor
Posts: 13

Unstacking columns


Hi, I have two different kind of datasets. First dataset look like-

YEARACTUAL YIELD
2004216
2004212
2004240
2004210
2004210
2004215

and second dataset look like-

YEARCropACTUAL YIELD
2004Corn216
2004Corn212
2004Corn240
2004Soybean210
2004Soybean210
2004Soybean215

I want to convert first and second dataset as given below.


CornY_2004CornY_2005CornY_2006CornY_2007
216212240210
210215212215

CornY_2004CornY_2005SoYY_2004SoYY_2005
216212216212
210215210215

Number of values in each year for each category are different.


Thanks for help,

Bhupinder

Esteemed Advisor
Esteemed Advisor
Posts: 6,693

Re: Unstacking columns

Hi,

Concatenate the two variables, then transpose by them:

data temp (keep=tran_var yield);

     set have;

     attrib tran_var format=$20.;

     tran_var=strip(crop)||"_"||strip(year);

run;

proc transpose data=temp out=want;

     by tran_var;

     var yield;

     id tran_var;

     idlabel tran_var;

run;

Note you may need to fiddle with it slightly and maybe add a sort as typed this quickly as leaving.

Occasional Contributor
Posts: 13

Re: Unstacking columns

Hi,

I got the following error.

ERROR: The ID value "Corn_2004" occurs twice in the same BY group.

ERROR: The ID value "Corn_2004" occurs twice in the same BY group.

ERROR: The ID value "Corn_2004" occurs twice in the same BY group.

NOTE: The above message was for the following BY group:

      tran_var=Corn_2004

ERROR: The ID value "Corn_2005" occurs twice in the same BY group.

ERROR: The ID value "Corn_2005" occurs twice in the same BY group.

ERROR: Too many bad BY groups.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 4628 observations read from the data set WORK.TEMP.

WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0

         observations and 0 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           1.25 seconds

My revised code is

Data temp (keep=tran_var Irr);

     set sample;

     attrib tran_var format=$20.;

     tran_var=strip(PREVIOUSCROP)||"_"||strip(year);

run;

Proc Sort data=temp;

By tran_var; Run;

proc transpose data=temp out=want;

     by tran_var;

     var Irr;

     id tran_var;

     idlabel tran_var;

run;

Thanks,

Bhupinder

Esteemed Advisor
Esteemed Advisor
Posts: 6,693

Re: Unstacking columns

Hi,

Yes, too quick at posting the response. I note in your test data that you have CROP 2004, three times, but in the sample out data you have crop 2004, crop 2005 etc.  The transpose needs unique columns for this to work, so either your test data is wrong and should read:

2004Corn216
2005Corn212
2006Corn240

Or you need to add a unique value to it

Also I had used the tran_var as by group in the proc transpose.  Do:

data temp (keep=tran_var yield);

     set have;

     attrib tran_var format=$20.;

     tran_var=strip(crop)||"_"||strip(year);

/* if the years are all the same then maybe use _n_ to make unique

     tran_var=strip(crop)||"_"||strip(year)||"_"||strip(put(i,best.)); */

run;

proc transpose data=temp out=want;

     by crop;

     var yield;

     id tran_var;

     idlabel tran_var;

run;

So the idea is to get to a stage where you have:

By Group Var          Crop              Value

Corn                       Corn_2004     xx

Corn                       Corn_2005     xx

etc.

Or if using n:

Corn                     Corn_2004_1          xx

Corn                     Corn_2004_2          xxx

Occasional Contributor
Posts: 13

Re: Unstacking columns

Thanks for quick response. I believe I have to add create a unique identifier for each value which I don't have at the moment. Dataset look like

2004       Corn      33

2004       Corn      9.04

2004       Soy         32.5

2004       Soy         16.8

2004       Soy         16.9

2005       Corn      33

2005       Corn      9.04

2005       Soy         32.5

2005       Soy         16.8

2006       Corn         16.9

....

....

....

The outputs will have columns with labels as following:

2004_Corn 2004_Soybean 2005_corn 2005_Soybean 2006_corn 2006_Soybean

Each column will have multiple but most likely different number of rows as per the data points falling under each label.

Thanks, Bhupinder

Esteemed Advisor
Esteemed Advisor
Posts: 6,693

Re: Unstacking columns

Hi,

Yes, so the logic for a transpose is to identify a grouping variable, a variable you want to transpose, and optionally a label for those columns.  In the below code I create a label variable - idlab - which will be used to name and label the transposed columns.  In the second step I have a small retain statement to assign a group variable based on wether the idlab is the first second, third etc. in the list, resetting when I get to a new idlab.  The inter table is then transposed based on the bygroup.

data have;
  infile datalines;
  attrib idlab format=$20.;
  input year $ crop $ yield;
  idlab=strip(year)||"_"||strip(crop);
datalines;
2004       Corn      33
2004       Corn      9.04
2004       Soy         32.5
2004       Soy         16.8
2004       Soy         16.9
2005       Corn      33
2005       Corn      9.04
2005       Soy         32.5
2005       Soy         16.8
2006       Corn         16.9
;
run;

data inter (drop=year crop lstbygrp);
  set have;
  attrib bygroup format=8.;
  retain lstbygrp bygroup;
  if _n_=1 or lstbygrp ne idlab then do;
    lstbygrp=idlab;
    bygroup=1;
  end;
  else bygroup=bygroup+1;
run;

proc sort data=inter;
  by bygroup idlab;
run;

proc transpose data=inter out=want;
  by bygroup;
  var yield;
  id idlab;
  idlabel idlab;
run;

Occasional Contributor
Posts: 13

Re: Unstacking columns

Hi, thanks for helping me out. I am still not able to run the code successfully. Year is numeric variable so I believe the input should not have $ sign after year. Am I correct?

  input year $ crop $ yield;

The error message is

ERROR: The ID value "_2011_Corn" occurs twice in the same BY group.

ERROR: The ID value "_2011_Corn" occurs twice in the same BY group.

ERROR: The ID value "_2011_Corn" occurs twice in the same BY group.

ERROR: The ID value "_2011_Corn" occurs twice in the same BY group.

NOTE: The above message was for the following BY group:

      bygroup=9

ERROR: The ID value "_2004_BEANS" occurs twice in the same BY group.

ERROR: The ID value "_2004_BEANS" occurs twice in the same BY group.

ERROR: Too many bad BY groups.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 7626 observations read from the data set WORK.INTER.

WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0

         observations and 0 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           1.73 seconds

      cpu time            1.72 seconds

Esteemed Advisor
Esteemed Advisor
Posts: 6,693

Re: Unstacking columns

It shouldn't matter, if the year is character than it will appear as 2014_xxx, if you put it as numeric SAS will add an underscore _2014_xx as first character has to be not number.  The error above is telling you the what you are grouping by is not distinct.  You need to examine your full table, identify which are the groups to transpose by.  I have added code to the inter datastep to assign a unique number to the groups I could see, however you will need to look at your data and change for your purposes.  So my table prior to transposing looks like this:

IDLAB               YIELD               BYGROUP

2004_Corn          33                    1

2004_Soy            32.5                 1

2005_Corn          33                     1

2005_Soy            32.5                 1

2006_Corn          16.9                   1

2004_Corn          9.04                    2

2004_Soy            16.8                   2

...

As you can see the 2004_Corn appears only once in Group 1, and Once in Group 2.  If it appears more than once then you get the above error.  So identify your groupings and make sure they are distinct.

Post a Question
Discussion Stats
  • 7 replies
  • 872 views
  • 3 likes
  • 2 in conversation