I need help with transposing some datasets. I have multiple datasets that I need transposed by some ID variables.
Below is a sample of one dataset and the transpose procedure I am attempting to use.
data DSIS2;
input STUDYID $12. +2 SITEID $3. +2 SUBJID $11. +2 STDYPRD 1. +2 PNINTSLP 1. +2 OPENDATE date9. +2 OPENTIME time5. +2 SAVEDATE date9. +2 SAVETIME time5.;
format OPENDATE SAVEDATE date9. OPENTIME SAVETIME time5.;
cards;
VX16-150-102 022 102-022-002 1 4 12DEC2017 10:19 12DEC2017 10:19
VX16-150-102 022 102-022-002 1 5 13DEC2017 10:01 13DEC2017 10:01
VX16-150-102 022 102-022-002 1 8 14DEC2017 10:01 14DEC2017 10:01
VX16-150-102 022 102-022-002 1 5 15DEC2017 10:01 15DEC2017 10:02
VX16-150-102 022 102-022-002 1 7 16DEC2017 10:00 16DEC2017 10:01
VX16-150-102 022 102-022-002 1 4 17DEC2017 10:00 17DEC2017 10:00
VX16-150-102 022 102-022-002 1 6 18DEC2017 10:01 18DEC2017 10:01
VX16-150-102 022 102-022-002 1 7 19DEC2017 05:56 19DEC2017 05:56
;
run;
proc transpose data=DSIS2 out=DSIS2_T;
by studyid siteid subjid;
var stdyprd pnintslp opendate opentime savedate savetime;
run;
My goal is to get something that looks like this (spacing between groups just for demonstration--I don't actually need a blank line between groups):
VX16-150-102 022 102-022-002 STDYPRD 1
VX16-150-102 022 102-022-002 PNINTSLP 4
VX16-150-102 022 102-022-002 OPENDATE 12DEC2017
VX16-150-102 022 102-022-002 OPENTIME 10:19
VX16-150-102 022 102-022-002 SAVEDATE 12DEC2017
VX16-150-102 022 102-022-002 SAVETIME 10:19
VX16-150-102 022 102-022-002 STDYPRD 1
VX16-150-102 022 102-022-002 PNINTSLP 5
VX16-150-102 022 102-022-002 OPENDATE 13DEC2017
VX16-150-102 022 102-022-002 OPENTIME 10:01
VX16-150-102 022 102-022-002 SAVEDATE 13DEC2017
VX16-150-102 022 102-022-002 SAVETIME 10:01
VX16-150-102 022 102-022-002 STDYPRD 1
VX16-150-102 022 102-022-002 PNINTSLP 6
VX16-150-102 022 102-022-002 OPENDATE 14DEC2017
VX16-150-102 022 102-022-002 OPENTIME 10:01
VX16-150-102 022 102-022-002 SAVEDATE 14DEC2017
VX16-150-102 022 102-022-002 SAVETIME 10:01
VX16-150-102 022 102-022-002 STDYPRD 1
VX16-150-102 022 102-022-002 PNINTSLP 5
VX16-150-102 022 102-022-002 OPENDATE 15DEC2017
VX16-150-102 022 102-022-002 OPENTIME 10:01
VX16-150-102 022 102-022-002 SAVEDATE 15DEC2017
VX16-150-102 022 102-022-002 SAVETIME 10:02
etc...
When you open DSIS2_T, you'll see that I actually get a new column for each row (which I understand is what transposing does). But I basically want a stacked transposition. Is there a shortcut way to do this rather than looping through and setting the dataset and only keeping the ID variables and the new variable of interest?
As an added caveat, notice that the date and time variables lose their formatting. I want to keep the formatting. Also, I have several datasets that I am trying to transpose and then stack on top of each other. In this case, all the variables are numeric, so they come out as numeric columns, but some datasets may contain character variables, in which case I want to make all the columns character. Is this possible in a shortened way?
My current process is:
There must be a simpler way than this!
@djbateman wrote:
Thank you. I think I can deal with converting all numeric variables to character before transposing. That will certainly clear up the second issue.
The data will be used to see where missing values are contained. We have a spec that describes the dataset design. We want to merge the design with the actual data to see if we are missing any values that expected in the database. I hope that makes sense.
If you add a character variable (dummy perhaps) to the VAR list in PROC TRANSPOSE SAS will do the conversion to numeric for you using any formats associated with the variables. I call it VVALUE en masse.
The desired result you show is not possible for all numeric values. Reason: only one format may be applied to a variable. Your appearance for your example data indicates a desire for multiple formats. At which point the process might be: determine the numeric variables, create a text version and then use a process that works for character variables.
Example:
data DSIS2; input STUDYID $12. +2 SITEID $3. +2 SUBJID $11. +2 STDYPRD $1. +2 PNINTSLP $1. +2 OPENDATE $9. +2 OPENTIME $5. +2 SAVEDATE $9. +2 SAVETIME $5.; cards; VX16-150-102 022 102-022-002 1 4 12DEC2017 10:19 12DEC2017 10:19 VX16-150-102 022 102-022-002 1 5 13DEC2017 10:01 13DEC2017 10:01 VX16-150-102 022 102-022-002 1 8 14DEC2017 10:01 14DEC2017 10:01 VX16-150-102 022 102-022-002 1 5 15DEC2017 10:01 15DEC2017 10:02 VX16-150-102 022 102-022-002 1 7 16DEC2017 10:00 16DEC2017 10:01 VX16-150-102 022 102-022-002 1 4 17DEC2017 10:00 17DEC2017 10:00 VX16-150-102 022 102-022-002 1 6 18DEC2017 10:01 18DEC2017 10:01 VX16-150-102 022 102-022-002 1 7 19DEC2017 05:56 19DEC2017 05:56 ; run; data DSIS2_t; set DSIS2; array vars stdyprd pnintslp opendate opentime savedate savetime; length varname $ 32 value $ 10; do i= 1 to dim(vars); varname = vname(vars[i]); value = vars[i]; output; end; keep studyid siteid subjid varname value; run;
Please describe how the resulting data is to be used. Almost any analysis with that data structure will be extremely difficult at best.
If the only reason you want that appearance is for a report then a report procedure may be of more use.
@djbateman wrote:
Thank you. I think I can deal with converting all numeric variables to character before transposing. That will certainly clear up the second issue.
The data will be used to see where missing values are contained. We have a spec that describes the dataset design. We want to merge the design with the actual data to see if we are missing any values that expected in the database. I hope that makes sense.
If you add a character variable (dummy perhaps) to the VAR list in PROC TRANSPOSE SAS will do the conversion to numeric for you using any formats associated with the variables. I call it VVALUE en masse.
@djbateman wrote:
Thanks. That's an excellent idea! Now I just need to see if I can transpose in a stacking manner. It seems my best bet is to simply stack via a loop rather than trying to transpose.
Here is the code I was thinking of. Since you need transpose by OBS but your BY variables are not unique I created a RECID to get the output you show as DSIS2_T.
data DSIS2;
input STUDYID $12. +2 SITEID $3. +2 SUBJID $11. +2 STDYPRD 1. +2 PNINTSLP 1. +2 OPENDATE date9. +2 OPENTIME time5. +2 SAVEDATE date9. +2 SAVETIME time5.;
format OPENDATE SAVEDATE date9. OPENTIME SAVETIME time5.;
retain dummy '1';
recid = _n_;
cards;
VX16-150-102 022 102-022-002 1 4 12DEC2017 10:19 12DEC2017 10:19
VX16-150-102 022 102-022-002 1 5 13DEC2017 10:01 13DEC2017 10:01
VX16-150-102 022 102-022-002 1 8 14DEC2017 10:01 14DEC2017 10:01
VX16-150-102 022 102-022-002 1 5 15DEC2017 10:01 15DEC2017 10:02
VX16-150-102 022 102-022-002 1 7 16DEC2017 10:00 16DEC2017 10:01
VX16-150-102 022 102-022-002 1 4 17DEC2017 10:00 17DEC2017 10:00
VX16-150-102 022 102-022-002 1 6 18DEC2017 10:01 18DEC2017 10:01
VX16-150-102 022 102-022-002 1 7 19DEC2017 05:56 19DEC2017 05:56
;
run;
proc transpose data=DSIS2 out=DSIS2_T(where=(upcase(_name_) ne 'DUMMY'));
by studyid siteid subjid recid;
var dummy stdyprd pnintslp opendate opentime savedate savetime;
run;
proc print;
run;
Rather than transpose the dataset I would use PROC FREQ to list the values in the data and use ODS to capture that into a single table. Then you can compare it against your master list, without having to transform the original data.
@djbateman wrote:
The data will be used to see where missing values are contained. We have a spec that describes the dataset design. We want to merge the design with the actual data to see if we are missing any values that expected in the database. I hope that makes sense.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.