BookmarkSubscribeRSS Feed
jss539
Fluorite | Level 6

I have a large dataset in SAS that I need to change to wide from long.

Basically, the data is sorted with multiple observations for each property for about 10 different years with around 150 variables. I would like to make it so each variable is a different year and have one observation per property. Each variable is uniquely named and I want to keep the name in it.

 

What I have:

Property  Year  Var1 ...  Var150

Prop1       2010  100 ...  ABC

Prop1       2011  101 ...  DEF

.

.

Prop1       2017  138 ... XYZ

Prop2 ...

 

What I want:

Property Var1_2010 ... Var1_2017 ... Var150_2010 ... Var150_2017

Prop1           100        ...          138     ...         ABC         ...      XYZ

Prop2 ...

 

I tried using Proc Transpose, but I cannot find a way to concatenate the year at the end of each original variable name.

I also tried:

proc sort data=hotels;
by propertyID year;
run;

proc transpose data=hotels out=hotels_wide;
by propertyID year;
var _all_;
run;

proc transpose data=hotels_wide delimiter=_ out=hotels_wide_full(drop=_name_);
by propertyID;
var col1;
id _name_ year;
run;

But the last block gave me an error telling me there are multiple occurrences of each variable for each propertyID: ERROR: The ID value "RmsServiceChargeDistribution_2E3" occurs twice in the same BY group.

 

A shortcoming of this would also be losing the formats of each variable.

4 REPLIES 4
Cynthia_sas
SAS Super FREQ
Hi:
I don't see how this is going from WIDE to LONG. Right now, you have multiple rows for PROP1 and multiple rows for PROP2, one for each year. I would consider that already LONG. If you change it so that you have 1 value for Property and up to 150 separate variables for each year, that seems to me you are going from sort of wide, but mostly long to really, really wide and not long at all.

Or, to do the math based on your explanation, if you had a dataset with only 3 rows for Prop1 and 150 variables for each row then that's this:
PROP1 2010 with 150 variables
PROP1 2011 with 150 variables
PROP1 2012 with 150 variables

But, if you now collapse those 3 rows into just 1 row for PROP1, then you'll now have 1 row with 450 variables. That seems like a much wider dataset to me. What do you need to do with this data. Most SAS procedures would be much happier dealing with the data as you have it now, than in the structure you're proposing. What is the downstream process for this very, very wide table?

Cynthia
PaigeMiller
Diamond | Level 26

Converting this long data set to wide will give you a programming nightmare. There's no easy way to handle such a wide data set, other than hard coding all the variable names when you want to use them. You'd be much better off leaving the data as long, and then built in SAS methods, such as BY variables and PROC REPORT and many others, make handling such a long data set very easy. Why do you need this data set as a wide data set?

--
Paige Miller
Cynthia_sas
SAS Super FREQ

Hi:

  The formats are the least of your worries. When you do a transpose with VAR _ALL_; if you have a mix of numeric and character variables, all your numeric variables will be converted to character.

  As an example, consider this:

Cynthia_sas_1-1598055314541.png

 

 

I think the result after the second transpose is what you want, but the challenge is that I only had 2 properties and 3 years for each property and then 4 variables (VAR1-VAR4) for each row. With 150 variables on each row, this would quickly become unmanageable and all your numeric variables would now be character too. The program actually does 2 transposes. The first transpose gets you a variable for year and then _NAME_ holds the value of the original variable name. Next, with a DATA step, you make a variable called NEWNAME that can be used in a second transpose,

 

But still I think your original structure is more useful for most SAS procedures.

Cynthia

here's the program I used to make the fake data:

 data fakedata;
  infile datalines dlm=',';
  input propertyid $ year var1 var2 $ var3 var4 $;
return;
datalines;
Prop1,2010,101,ABC,102,DEF
Prop1,2011,201,GHI,202,JKL
Prop1,2012,301,MNO,302,PQR
Prop2,2010,401,KJI,402,NML
Prop2,2011,501,OPQ,502,TSR
Prop2,2012,601,WUV,602,ZYX
;
run;

 

 

Tom
Super User Tom
Super User

The code is fine.  But the example does not reflect the actual data.  The error messages makes if very clear what the issue is.

The original dataset has a variable with a name that is already waaaaaaay too long.

1234567890123456789012345678012
RmsServiceChargeDistribution

Since there is not room to add 5 more characters to the end of the existing name the year was reduced to 3 characters.  But that means that the new name is no longer unique.

Either use more reasonable lengths for the variable names to start with, or you will need to add a step to create unique names for the new variables.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1066 views
  • 0 likes
  • 4 in conversation