I have a large dataset in SAS that I need to change to wide from long.
Basically, the data is sorted with multiple observations for each property for about 10 different years with around 150 variables. I would like to make it so each variable is a different year and have one observation per property. Each variable is uniquely named and I want to keep the name in it.
What I have:
Property Year Var1 ... Var150
Prop1 2010 100 ... ABC
Prop1 2011 101 ... DEF
.
.
Prop1 2017 138 ... XYZ
Prop2 ...
What I want:
Property Var1_2010 ... Var1_2017 ... Var150_2010 ... Var150_2017
Prop1 100 ... 138 ... ABC ... XYZ
Prop2 ...
I tried using Proc Transpose, but I cannot find a way to concatenate the year at the end of each original variable name.
I also tried:
proc sort data=hotels;
by propertyID year;
run;
proc transpose data=hotels out=hotels_wide;
by propertyID year;
var _all_;
run;
proc transpose data=hotels_wide delimiter=_ out=hotels_wide_full(drop=_name_);
by propertyID;
var col1;
id _name_ year;
run;
But the last block gave me an error telling me there are multiple occurrences of each variable for each propertyID: ERROR: The ID value "RmsServiceChargeDistribution_2E3" occurs twice in the same BY group.
A shortcoming of this would also be losing the formats of each variable.
Converting this long data set to wide will give you a programming nightmare. There's no easy way to handle such a wide data set, other than hard coding all the variable names when you want to use them. You'd be much better off leaving the data as long, and then built in SAS methods, such as BY variables and PROC REPORT and many others, make handling such a long data set very easy. Why do you need this data set as a wide data set?
Hi:
The formats are the least of your worries. When you do a transpose with VAR _ALL_; if you have a mix of numeric and character variables, all your numeric variables will be converted to character.
As an example, consider this:
I think the result after the second transpose is what you want, but the challenge is that I only had 2 properties and 3 years for each property and then 4 variables (VAR1-VAR4) for each row. With 150 variables on each row, this would quickly become unmanageable and all your numeric variables would now be character too. The program actually does 2 transposes. The first transpose gets you a variable for year and then _NAME_ holds the value of the original variable name. Next, with a DATA step, you make a variable called NEWNAME that can be used in a second transpose,
But still I think your original structure is more useful for most SAS procedures.
Cynthia
here's the program I used to make the fake data:
data fakedata;
infile datalines dlm=',';
input propertyid $ year var1 var2 $ var3 var4 $;
return;
datalines;
Prop1,2010,101,ABC,102,DEF
Prop1,2011,201,GHI,202,JKL
Prop1,2012,301,MNO,302,PQR
Prop2,2010,401,KJI,402,NML
Prop2,2011,501,OPQ,502,TSR
Prop2,2012,601,WUV,602,ZYX
;
run;
The code is fine. But the example does not reflect the actual data. The error messages makes if very clear what the issue is.
The original dataset has a variable with a name that is already waaaaaaay too long.
1234567890123456789012345678012 RmsServiceChargeDistribution
Since there is not room to add 5 more characters to the end of the existing name the year was reduced to 3 characters. But that means that the new name is no longer unique.
Either use more reasonable lengths for the variable names to start with, or you will need to add a step to create unique names for the new variables.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.