Unstacking columns

Bhupi · Posted 04-03-2014 11:18 AM

Hi, I have two different kind of datasets. First dataset look like-

YEAR	ACTUAL YIELD
2004	216
2004	212
2004	240
2004	210
2004	210
2004	215

and second dataset look like-

YEAR	Crop	ACTUAL YIELD
2004	Corn	216
2004	Corn	212
2004	Corn	240
2004	Soybean	210
2004	Soybean	210
2004	Soybean	215

I want to convert first and second dataset as given below.

CornY_2004	CornY_2005	CornY_2006	CornY_2007
216	212	240	210
210	215	212	215

CornY_2004	CornY_2005	SoYY_2004	SoYY_2005
216	212	216	212
210	215	210	215

Number of values in each year for each category are different.

Thanks for help,

Bhupinder

RW9 · Posted 04-03-2014 11:28 AM

Hi,

Concatenate the two variables, then transpose by them:

data temp (keep=tran_var yield);

set have;

attrib tran_var format=$20.;

tran_var=strip(crop)||"_"||strip(year);

run;

proc transpose data=temp out=want;

by tran_var;

var yield;

id tran_var;

idlabel tran_var;

run;

Note you may need to fiddle with it slightly and maybe add a sort as typed this quickly as leaving.

Bhupi · Posted 04-03-2014 11:51 AM

Hi,

I got the following error.

ERROR: The ID value "Corn_2004" occurs twice in the same BY group.

NOTE: The above message was for the following BY group:

tran_var=Corn_2004

ERROR: The ID value "Corn_2005" occurs twice in the same BY group.

ERROR: Too many bad BY groups.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 4628 observations read from the data set WORK.TEMP.

WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0

observations and 0 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

real time 1.25 seconds

My revised code is

Data temp (keep=tran_var Irr);

set sample;

attrib tran_var format=$20.;

tran_var=strip(PREVIOUSCROP)||"_"||strip(year);

run;

Proc Sort data=temp;

By tran_var; Run;

proc transpose data=temp out=want;

by tran_var;

var Irr;

id tran_var;

idlabel tran_var;

run;

Thanks,

Bhupinder

RW9 · Posted 04-03-2014 01:10 PM

Hi,

Yes, too quick at posting the response. I note in your test data that you have CROP 2004, three times, but in the sample out data you have crop 2004, crop 2005 etc. The transpose needs unique columns for this to work, so either your test data is wrong and should read:

2004	Corn	216
2005	Corn	212
2006	Corn	240

Or you need to add a unique value to it

Also I had used the tran_var as by group in the proc transpose. Do:

data temp (keep=tran_var yield);

set have;

attrib tran_var format=$20.;

tran_var=strip(crop)||"_"||strip(year);

/* if the years are all the same then maybe use _n_ to make unique

tran_var=strip(crop)||"_"||strip(year)||"_"||strip(put(i,best.)); */

run;

proc transpose data=temp out=want;

by crop;

var yield;

id tran_var;

idlabel tran_var;

run;

So the idea is to get to a stage where you have:

By Group Var Crop Value

Corn Corn_2004 xx

Corn Corn_2005 xx

etc.

Or if using n:

Corn Corn_2004_1 xx

Corn Corn_2004_2 xxx

Bhupi · Posted 04-03-2014 01:28 PM

Thanks for quick response. I believe I have to add create a unique identifier for each value which I don't have at the moment. Dataset look like

2004 Corn 33

2004 Corn 9.04

2004 Soy 32.5

2004 Soy 16.8

2004 Soy 16.9

2005 Corn 33

2005 Corn 9.04

2005 Soy 32.5

2005 Soy 16.8

2006 Corn 16.9

....

The outputs will have columns with labels as following:

2004_Corn 2004_Soybean 2005_corn 2005_Soybean 2006_corn 2006_Soybean

Each column will have multiple but most likely different number of rows as per the data points falling under each label.

Thanks, Bhupinder

RW9 · Posted 04-04-2014 04:34 AM

Hi,

Yes, so the logic for a transpose is to identify a grouping variable, a variable you want to transpose, and optionally a label for those columns. In the below code I create a label variable - idlab - which will be used to name and label the transposed columns. In the second step I have a small retain statement to assign a group variable based on wether the idlab is the first second, third etc. in the list, resetting when I get to a new idlab. The inter table is then transposed based on the bygroup.

data have;
infile datalines;
attrib idlab format=$20.;
input year $ crop $ yield;
idlab=strip(year)||"_"||strip(crop);
datalines;
2004       Corn      33
2004       Corn      9.04
2004       Soy         32.5
2004       Soy         16.8
2004       Soy         16.9
2005       Corn      33
2005       Corn      9.04
2005       Soy         32.5
2005       Soy         16.8
2006       Corn         16.9
;
run;

data inter (drop=year crop lstbygrp);
set have;
attrib bygroup format=8.;
retain lstbygrp bygroup;
if _n_=1 or lstbygrp ne idlab then do;
lstbygrp=idlab;
bygroup=1;
end;
else bygroup=bygroup+1;
run;

proc sort data=inter;
by bygroup idlab;
run;

proc transpose data=inter out=want;
by bygroup;
var yield;
id idlab;
idlabel idlab;
run;

Bhupi · Posted 04-04-2014 10:12 AM

Hi, thanks for helping me out. I am still not able to run the code successfully. Year is numeric variable so I believe the input should not have $ sign after year. Am I correct?

input year $ crop $ yield;

The error message is

ERROR: The ID value "_2011_Corn" occurs twice in the same BY group.

NOTE: The above message was for the following BY group:

bygroup=9

ERROR: The ID value "_2004_BEANS" occurs twice in the same BY group.

ERROR: Too many bad BY groups.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 7626 observations read from the data set WORK.INTER.

WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0

observations and 0 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

real time 1.73 seconds

cpu time 1.72 seconds

RW9 · Posted 04-04-2014 10:26 AM

It shouldn't matter, if the year is character than it will appear as 2014_xxx, if you put it as numeric SAS will add an underscore _2014_xx as first character has to be not number. The error above is telling you the what you are grouping by is not distinct. You need to examine your full table, identify which are the groups to transpose by. I have added code to the inter datastep to assign a unique number to the groups I could see, however you will need to look at your data and change for your purposes. So my table prior to transposing looks like this:

IDLAB YIELD BYGROUP

2004_Corn 33 1

2004_Soy 32.5 1

2005_Corn 33 1

2005_Soy 32.5 1

2006_Corn 16.9 1

2004_Corn 9.04 2

2004_Soy 16.8 2

...

As you can see the 2004_Corn appears only once in Group 1, and Once in Group 2. If it appears more than once then you get the above error. So identify your groupings and make sure they are distinct.

Unstacking columns

Re: Unstacking columns

Re: Unstacking columns

Re: Unstacking columns

Re: Unstacking columns

Re: Unstacking columns

Re: Unstacking columns

Re: Unstacking columns

Registration is open