Hi,
I have a dataset with 169 variables and multiple observations per subject. The dataset is in the long format, but I need to transpose it to the wide format. Each subject has a different number of observations. I am familiar with arrays and do loops, and proc transpose, but I have never worked with a dataset this large. Any suggestions?
I am unable to post a snapshot of the data as it is medical data and would violate privacy rules.
Thanks so much!
Make a simple example with just one or two subjects and a few variables to reflect what you have.
Are the variables all numeric (or all character) or mixed?
If you are going from tall to wide do you have a variable that can be used to generate the name of the variable to be created in the wide dataset?
Sometimes people actually have a intermediate structure that first needs to be made even taller and then can be converted to wide. Basically you run two PROC TRANSPOSE steps with perhaps some steps in the middle to help generate variable names or set order of variable creation.
Someone else posted a very similar question recently, maybe you can leverage their sample data?
In general, you may need to do several transposes/merges but it depends really on your data. The posts below cover transposing fairly thoroughly, the fastest programming option is likely splitting them into multiple data sets and merging, while the fastest from a processing standpoint will likely be a data step transpose in a single step. If you can pre-calculate the size of the arrays needed this is definitely the way to go.
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd
data Have; input Month $ Policy_Number $ IRDA_Amt OpsRefNo $10.; datalines; Nov-2020 XYZ 1000 20201005-M Nov-2020 XYZ 500 20201015-A Nov-2020 XYZ 100 20201222-A ; run; data have; set have; by Policy_Number; if first.Policy_Number then n=0; n+1; run; proc sql noprint nowarn; select distinct catt('have(where=(n=',n,' ) rename=(IRDA_Amt=IRDA_Amt_',n,' OpsRefNo=OpsRefNo_',n,'))') into : merge separated by ' ' from have order by n; quit; data want; merge &merge ; by Policy_Number; drop Month n; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.