I have follwoing dataset.
data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. reg_dates :MMDDYY10. test_dates :MMDDYY10.;
datalines;
abc,010,01/10/2017,01/20/2017
abc,010,01/10/2017,05/12/2017
abc,010,08/01/2019,08/13/2019
abc,010,10/02/2018,10/09/2018
def,023,02/11/2017,02/14/2017
def,023,07/03/2016,07/30/2016
def,023,11/20/2017,11/26/2017
def,023,11/20/2017,04/20/2018
def,023,06/02/2017,07/02/2017
pqr,125,05/10/2013,05/20/2013
xyz,200,09/12/2015,09/18/2015
xyz,200,01/12/2016,11/22/2017
xyz,200,01/12/2016,01/21/2016
;
run;
Please note that most of the subjects have multiple reg_dates and test_dates. How can I split the dates into multiple column. Like, reg_date1, reg_date2, reg_date3 . . . and also test_date1, test_date2 and so on.
So, the column of the output dataset will be like below.
name id reg_date1 reg_date2 reg_date3 reg_date4 reg_date5 test_date1 test_date2 test_date3 test_date4 test_date5
Ultimately one subject will have one row.
Thanks,
data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
retain reg_date: test_date:;
format reg_date: test_date: mmddyys10.;
if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
if last.name then do;
output;
/* Added these call missing steps */
call missing(of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
end;
counter+1;
drop reg_dates test_dates counter;
run;
data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
retain reg_date: test_date:;
format reg_date: test_date: mmddyys10.;
if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
if last.name then do;
output;
/* Added these call missing steps */
call missing(of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
end;
counter+1;
drop reg_dates test_dates counter;
run;
Thanks Ketpt42,
I was wondering why the missing dates of the subject pqr copied from the data just above. Same to xyz.
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
You're asking for a transposed format. You can either use PROC TRANSPOSE or a data step to accomplish this. @ketpt42 has provided the data step approach. The Transpose and Merge approach is less efficient - you need two transpose and one merge but is more dynamic so each method has its pros and cons.
The link above illustrates both methods.
May I ask a followup question, please.
I was wondering if I have a column (comment) with text data in the same dataset then how do I split them into different columns (comment1, comment2, comment3, comment4, comment5)? I tried for long time, but failed. See the modified dataset below-
data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. reg_dates :MMDDYY10. test_dates :MMDDYY10. comment :$100.;
datalines;
abc,010,01/10/2017,01/20/2017,transferred to another hospital
abc,010,01/10/2017,05/12/2017,case has been closed
abc,010,08/01/2019,08/13/2019,unable to followup
abc,010,10/02/2018,10/09/2018,admin closure
def,023,02/11/2017,02/14/2017,transferred to another hospital
def,023,07/03/2016,07/30/2016,unable to followup
def,023,11/20/2017,11/26/2017,case has been closed
def,023,11/20/2017,04/20/2018,unable to followup
def,023,06/02/2017,07/02/2017,unable to followup
pqr,125,05/10/2013,05/20/2013,transferred to another hospital
xyz,200,09/12/2015,09/18/2015,case has been closed
xyz,200,01/12/2016,11/22/2017,transferred to another hospital
xyz,200,01/12/2016,01/21/2016,unable to followup
;
run;
I tried the following program. But I have no idea, why the values for coment1-comment5 are missing in the want dataset.
data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
array cmnt{5} comment1-comment5;
retain reg_date: test_date: comment:;
format reg_date: test_date: mmddyys10.;
format comment $50.;
if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
cmnt{counter}=comment;
if last.name then do;
output;
/* Added these call missing steps */
call missing (of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
call missing (of comment1-comment5);
end;
counter+1;
drop reg_dates test_dates counter comment;
*putlog _all_;
run;
proc print data=want; run;
You need to declare your comment array as character.
data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
array cmnt{5} $100 comment1-comment5;
retain reg_date: test_date: comment:;
format reg_date: test_date: mmddyys10. comment: $100.;
if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
cmnt{counter}=comment;
if last.name then do;
output;
/* Added these call missing steps */
call missing(of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
call missing (of comment1-comment5);
end;
counter+1;
drop reg_dates test_dates counter comment;
run;
Or expand @Jagadishkatam's example into a triple transpose:
proc transpose data=have out=want1(drop=_name_) prefix=reg_date;
by name id;
var reg_dates ;
run;
proc transpose data=have out=want2(drop=_name_) prefix=test_date;
by name id;
var test_dates ;
run;
proc transpose data=have out=want3(drop=_name_) prefix=comment;
by name id;
var comment ;
run;
data want;
merge want1 want2 want3;
by name id;
run;
Alternatively
proc transpose data=have out=want1(drop=_name_) prefix=reg_dates ;
by name id;
var reg_dates ;
run;
proc transpose data=have out=want2(drop=_name_) prefix=test_dates;
by name id;
var test_dates ;
run;
data want;
merge want1 want2;
by name id;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.