Hi, I have two different kind of datasets. First dataset look like-
YEAR | ACTUAL YIELD |
2004 | 216 |
2004 | 212 |
2004 | 240 |
2004 | 210 |
2004 | 210 |
2004 | 215 |
and second dataset look like-
YEAR | Crop | ACTUAL YIELD |
2004 | Corn | 216 |
2004 | Corn | 212 |
2004 | Corn | 240 |
2004 | Soybean | 210 |
2004 | Soybean | 210 |
2004 | Soybean | 215 |
I want to convert first and second dataset as given below.
CornY_2004 | CornY_2005 | CornY_2006 | CornY_2007 |
216 | 212 | 240 | 210 |
210 | 215 | 212 | 215 |
CornY_2004 | CornY_2005 | SoYY_2004 | SoYY_2005 |
216 | 212 | 216 | 212 |
210 | 215 | 210 | 215 |
Number of values in each year for each category are different.
Thanks for help,
Bhupinder
Hi,
Concatenate the two variables, then transpose by them:
data temp (keep=tran_var yield);
set have;
attrib tran_var format=$20.;
tran_var=strip(crop)||"_"||strip(year);
run;
proc transpose data=temp out=want;
by tran_var;
var yield;
id tran_var;
idlabel tran_var;
run;
Note you may need to fiddle with it slightly and maybe add a sort as typed this quickly as leaving.
Hi,
I got the following error.
ERROR: The ID value "Corn_2004" occurs twice in the same BY group.
ERROR: The ID value "Corn_2004" occurs twice in the same BY group.
ERROR: The ID value "Corn_2004" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
tran_var=Corn_2004
ERROR: The ID value "Corn_2005" occurs twice in the same BY group.
ERROR: The ID value "Corn_2005" occurs twice in the same BY group.
ERROR: Too many bad BY groups.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 4628 observations read from the data set WORK.TEMP.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0
observations and 0 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 1.25 seconds
My revised code is
Data temp (keep=tran_var Irr);
set sample;
attrib tran_var format=$20.;
tran_var=strip(PREVIOUSCROP)||"_"||strip(year);
run;
Proc Sort data=temp;
By tran_var; Run;
proc transpose data=temp out=want;
by tran_var;
var Irr;
id tran_var;
idlabel tran_var;
run;
Thanks,
Bhupinder
Hi,
Yes, too quick at posting the response. I note in your test data that you have CROP 2004, three times, but in the sample out data you have crop 2004, crop 2005 etc. The transpose needs unique columns for this to work, so either your test data is wrong and should read:
2004 | Corn | 216 |
2005 | Corn | 212 |
2006 | Corn | 240 |
Or you need to add a unique value to it
Also I had used the tran_var as by group in the proc transpose. Do:
data temp (keep=tran_var yield);
set have;
attrib tran_var format=$20.;
tran_var=strip(crop)||"_"||strip(year);
/* if the years are all the same then maybe use _n_ to make unique
tran_var=strip(crop)||"_"||strip(year)||"_"||strip(put(i,best.)); */
run;
proc transpose data=temp out=want;
by crop;
var yield;
id tran_var;
idlabel tran_var;
run;
So the idea is to get to a stage where you have:
By Group Var Crop Value
Corn Corn_2004 xx
Corn Corn_2005 xx
etc.
Or if using n:
Corn Corn_2004_1 xx
Corn Corn_2004_2 xxx
Thanks for quick response. I believe I have to add create a unique identifier for each value which I don't have at the moment. Dataset look like
2004 Corn 33
2004 Corn 9.04
2004 Soy 32.5
2004 Soy 16.8
2004 Soy 16.9
2005 Corn 33
2005 Corn 9.04
2005 Soy 32.5
2005 Soy 16.8
2006 Corn 16.9
....
....
....
The outputs will have columns with labels as following:
2004_Corn 2004_Soybean 2005_corn 2005_Soybean 2006_corn 2006_Soybean
Each column will have multiple but most likely different number of rows as per the data points falling under each label.
Thanks, Bhupinder
Hi,
Yes, so the logic for a transpose is to identify a grouping variable, a variable you want to transpose, and optionally a label for those columns. In the below code I create a label variable - idlab - which will be used to name and label the transposed columns. In the second step I have a small retain statement to assign a group variable based on wether the idlab is the first second, third etc. in the list, resetting when I get to a new idlab. The inter table is then transposed based on the bygroup.
data have;
infile datalines;
attrib idlab format=$20.;
input year $ crop $ yield;
idlab=strip(year)||"_"||strip(crop);
datalines;
2004 Corn 33
2004 Corn 9.04
2004 Soy 32.5
2004 Soy 16.8
2004 Soy 16.9
2005 Corn 33
2005 Corn 9.04
2005 Soy 32.5
2005 Soy 16.8
2006 Corn 16.9
;
run;
data inter (drop=year crop lstbygrp);
set have;
attrib bygroup format=8.;
retain lstbygrp bygroup;
if _n_=1 or lstbygrp ne idlab then do;
lstbygrp=idlab;
bygroup=1;
end;
else bygroup=bygroup+1;
run;
proc sort data=inter;
by bygroup idlab;
run;
proc transpose data=inter out=want;
by bygroup;
var yield;
id idlab;
idlabel idlab;
run;
Hi, thanks for helping me out. I am still not able to run the code successfully. Year is numeric variable so I believe the input should not have $ sign after year. Am I correct?
input year $ crop $ yield;
The error message is
ERROR: The ID value "_2011_Corn" occurs twice in the same BY group.
ERROR: The ID value "_2011_Corn" occurs twice in the same BY group.
ERROR: The ID value "_2011_Corn" occurs twice in the same BY group.
ERROR: The ID value "_2011_Corn" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
bygroup=9
ERROR: The ID value "_2004_BEANS" occurs twice in the same BY group.
ERROR: The ID value "_2004_BEANS" occurs twice in the same BY group.
ERROR: Too many bad BY groups.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 7626 observations read from the data set WORK.INTER.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0
observations and 0 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 1.73 seconds
cpu time 1.72 seconds
It shouldn't matter, if the year is character than it will appear as 2014_xxx, if you put it as numeric SAS will add an underscore _2014_xx as first character has to be not number. The error above is telling you the what you are grouping by is not distinct. You need to examine your full table, identify which are the groups to transpose by. I have added code to the inter datastep to assign a unique number to the groups I could see, however you will need to look at your data and change for your purposes. So my table prior to transposing looks like this:
IDLAB YIELD BYGROUP
2004_Corn 33 1
2004_Soy 32.5 1
2005_Corn 33 1
2005_Soy 32.5 1
2006_Corn 16.9 1
2004_Corn 9.04 2
2004_Soy 16.8 2
...
As you can see the 2004_Corn appears only once in Group 1, and Once in Group 2. If it appears more than once then you get the above error. So identify your groupings and make sure they are distinct.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.