I have follwoing dataset.
data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. reg_dates :MMDDYY10. test_dates :MMDDYY10.;
datalines;
abc,010,01/10/2017,01/20/2017
abc,010,01/10/2017,05/12/2017
abc,010,08/01/2019,08/13/2019
abc,010,10/02/2018,10/09/2018
def,023,02/11/2017,02/14/2017
def,023,07/03/2016,07/30/2016
def,023,11/20/2017,11/26/2017
def,023,11/20/2017,04/20/2018
def,023,06/02/2017,07/02/2017
pqr,125,05/10/2013,05/20/2013
xyz,200,09/12/2015,09/18/2015
xyz,200,01/12/2016,11/22/2017
xyz,200,01/12/2016,01/21/2016
;
run;
Please note that most of the subjects have multiple reg_dates and test_dates. How can I split the dates into multiple column. Like, reg_date1, reg_date2, reg_date3 . . . and also test_date1, test_date2 and so on.
So, the column of the output dataset will be like below.
name id reg_date1 reg_date2 reg_date3 reg_date4 reg_date5 test_date1 test_date2 test_date3 test_date4 test_date5
Ultimately one subject will have one row.
Thanks,
data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
retain reg_date: test_date:;
format reg_date: test_date: mmddyys10.;
if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
if last.name then do;
output;
/* Added these call missing steps */
call missing(of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
end;
counter+1;
drop reg_dates test_dates counter;
run;
data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
retain reg_date: test_date:;
format reg_date: test_date: mmddyys10.;
if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
if last.name then do;
output;
/* Added these call missing steps */
call missing(of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
end;
counter+1;
drop reg_dates test_dates counter;
run;
Thanks Ketpt42,
I was wondering why the missing dates of the subject pqr copied from the data just above. Same to xyz.
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
You're asking for a transposed format. You can either use PROC TRANSPOSE or a data step to accomplish this. @ketpt42 has provided the data step approach. The Transpose and Merge approach is less efficient - you need two transpose and one merge but is more dynamic so each method has its pros and cons.
The link above illustrates both methods.
May I ask a followup question, please.
I was wondering if I have a column (comment) with text data in the same dataset then how do I split them into different columns (comment1, comment2, comment3, comment4, comment5)? I tried for long time, but failed. See the modified dataset below-
data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. reg_dates :MMDDYY10. test_dates :MMDDYY10. comment :$100.;
datalines;
abc,010,01/10/2017,01/20/2017,transferred to another hospital
abc,010,01/10/2017,05/12/2017,case has been closed
abc,010,08/01/2019,08/13/2019,unable to followup
abc,010,10/02/2018,10/09/2018,admin closure
def,023,02/11/2017,02/14/2017,transferred to another hospital
def,023,07/03/2016,07/30/2016,unable to followup
def,023,11/20/2017,11/26/2017,case has been closed
def,023,11/20/2017,04/20/2018,unable to followup
def,023,06/02/2017,07/02/2017,unable to followup
pqr,125,05/10/2013,05/20/2013,transferred to another hospital
xyz,200,09/12/2015,09/18/2015,case has been closed
xyz,200,01/12/2016,11/22/2017,transferred to another hospital
xyz,200,01/12/2016,01/21/2016,unable to followup
;
run;
I tried the following program. But I have no idea, why the values for coment1-comment5 are missing in the want dataset.
data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
array cmnt{5} comment1-comment5;
retain reg_date: test_date: comment:;
format reg_date: test_date: mmddyys10.;
format comment $50.;
if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
cmnt{counter}=comment;
if last.name then do;
output;
/* Added these call missing steps */
call missing (of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
call missing (of comment1-comment5);
end;
counter+1;
drop reg_dates test_dates counter comment;
*putlog _all_;
run;
proc print data=want; run;
You need to declare your comment array as character.
data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
array cmnt{5} $100 comment1-comment5;
retain reg_date: test_date: comment:;
format reg_date: test_date: mmddyys10. comment: $100.;
if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
cmnt{counter}=comment;
if last.name then do;
output;
/* Added these call missing steps */
call missing(of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
call missing (of comment1-comment5);
end;
counter+1;
drop reg_dates test_dates counter comment;
run;
Or expand @Jagadishkatam's example into a triple transpose:
proc transpose data=have out=want1(drop=_name_) prefix=reg_date;
by name id;
var reg_dates ;
run;
proc transpose data=have out=want2(drop=_name_) prefix=test_date;
by name id;
var test_dates ;
run;
proc transpose data=have out=want3(drop=_name_) prefix=comment;
by name id;
var comment ;
run;
data want;
merge want1 want2 want3;
by name id;
run;
Alternatively
proc transpose data=have out=want1(drop=_name_) prefix=reg_dates ;
by name id;
var reg_dates ;
run;
proc transpose data=have out=want2(drop=_name_) prefix=test_dates;
by name id;
var test_dates ;
run;
data want;
merge want1 want2;
by name id;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.