BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Barkat
Pyrite | Level 9

I have follwoing dataset.

 

data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. reg_dates :MMDDYY10. test_dates :MMDDYY10.;
datalines;
abc,010,01/10/2017,01/20/2017
abc,010,01/10/2017,05/12/2017
abc,010,08/01/2019,08/13/2019
abc,010,10/02/2018,10/09/2018
def,023,02/11/2017,02/14/2017
def,023,07/03/2016,07/30/2016
def,023,11/20/2017,11/26/2017
def,023,11/20/2017,04/20/2018
def,023,06/02/2017,07/02/2017
pqr,125,05/10/2013,05/20/2013
xyz,200,09/12/2015,09/18/2015
xyz,200,01/12/2016,11/22/2017
xyz,200,01/12/2016,01/21/2016
;
run;

 

Please note that most of the subjects have multiple reg_dates and test_dates. How can I split the dates into multiple column. Like, reg_date1, reg_date2, reg_date3 . . . and also test_date1, test_date2 and so on.

So, the column of the output dataset will be like below.

name   id   reg_date1   reg_date2   reg_date3   reg_date4   reg_date5   test_date1   test_date2   test_date3   test_date4   test_date5

 

Ultimately one subject will have one row.

 

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
ketpt42
Quartz | Level 8
data want;
    set have;
    by name;
    array regdt{5} reg_date1-reg_date5;
    array testdt{5} test_date1-test_date5;
    
    retain reg_date: test_date:;
    format reg_date: test_date: mmddyys10.;
    
    if first.name then counter=1;
    regdt{counter}=reg_dates;
    testdt{counter}=test_dates;
    if last.name then do;
        output;
        /* Added these call missing steps */
        call missing(of reg_date1-reg_date5);
        call missing (of test_date1-test_date5);
        end;
    
    counter+1;
    drop reg_dates test_dates counter;
run;

ketpt42_0-1594172496285.png

 

View solution in original post

9 REPLIES 9
ketpt42
Quartz | Level 8
data want;
    set have;
    by name;
    array regdt{5} reg_date1-reg_date5;
    array testdt{5} test_date1-test_date5;
    
    retain reg_date: test_date:;
    format reg_date: test_date: mmddyys10.;
    
    if first.name then counter=1;
    regdt{counter}=reg_dates;
    testdt{counter}=test_dates;
    if last.name then do;
        output;
        /* Added these call missing steps */
        call missing(of reg_date1-reg_date5);
        call missing (of test_date1-test_date5);
        end;
    
    counter+1;
    drop reg_dates test_dates counter;
run;

ketpt42_0-1594172496285.png

 

Barkat
Pyrite | Level 9

Thanks Ketpt42,

I was wondering why the missing dates of the subject pqr copied from the data just above. Same to xyz.

ketpt42
Quartz | Level 8
Oh, I forgot something. Let me modify the original code...
Reeza
Super User

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

You're asking for a transposed format. You can either use PROC TRANSPOSE or a data step to accomplish this. @ketpt42 has provided the data step approach. The Transpose and Merge approach is less efficient - you need two transpose and one merge but is more dynamic so each method has its pros and cons. 

 

The link above illustrates both methods. 

Barkat
Pyrite | Level 9

May I ask a followup question, please. 

I was wondering if I have a column (comment) with text data in the same dataset then how do I split them into different columns (comment1, comment2, comment3, comment4, comment5)? I tried for long time, but failed. See the modified dataset below-

 

data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. reg_dates :MMDDYY10. test_dates :MMDDYY10. comment :$100.;
datalines;
abc,010,01/10/2017,01/20/2017,transferred to another hospital
abc,010,01/10/2017,05/12/2017,case has been closed
abc,010,08/01/2019,08/13/2019,unable to followup
abc,010,10/02/2018,10/09/2018,admin closure
def,023,02/11/2017,02/14/2017,transferred to another hospital
def,023,07/03/2016,07/30/2016,unable to followup
def,023,11/20/2017,11/26/2017,case has been closed
def,023,11/20/2017,04/20/2018,unable to followup
def,023,06/02/2017,07/02/2017,unable to followup
pqr,125,05/10/2013,05/20/2013,transferred to another hospital
xyz,200,09/12/2015,09/18/2015,case has been closed
xyz,200,01/12/2016,11/22/2017,transferred to another hospital
xyz,200,01/12/2016,01/21/2016,unable to followup
;
run;

Barkat
Pyrite | Level 9

I tried the following program. But I have no idea, why the values for coment1-comment5 are missing in the want dataset.

 

 

data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
array cmnt{5} comment1-comment5;

retain reg_date: test_date: comment:;
format reg_date: test_date: mmddyys10.;
format comment $50.;

if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
cmnt{counter}=comment;

if last.name then do;
output;
/* Added these call missing steps */
call missing (of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
call missing (of comment1-comment5);
end;
counter+1;
drop reg_dates test_dates counter comment;
*putlog _all_;
run;

proc print data=want; run;

ketpt42
Quartz | Level 8

You need to declare your comment array as character.

 

data want;
    set have;
    by name;
    array regdt{5} reg_date1-reg_date5;
    array testdt{5} test_date1-test_date5;
    array cmnt{5} $100 comment1-comment5;
    
    
    retain reg_date: test_date: comment:;
    format reg_date: test_date: mmddyys10. comment: $100.;
    
    if first.name then counter=1;
    regdt{counter}=reg_dates;
    testdt{counter}=test_dates;
    cmnt{counter}=comment;
    if last.name then do;
        output;
        /* Added these call missing steps */
        call missing(of reg_date1-reg_date5);
        call missing (of test_date1-test_date5);
        call missing (of comment1-comment5);
        end;
        
    counter+1;
    drop reg_dates test_dates counter comment;
run;
ketpt42
Quartz | Level 8

Or expand @Jagadishkatam's example into a triple transpose:

 

proc transpose data=have out=want1(drop=_name_) prefix=reg_date;
by name id;
var reg_dates ;
run;

proc transpose data=have out=want2(drop=_name_) prefix=test_date;
by name id;
var test_dates ;
run;

proc transpose data=have out=want3(drop=_name_) prefix=comment;
by name id;
var comment ;
run;

data want;
merge want1 want2 want3;
by name id;
run;
Jagadishkatam
Amethyst | Level 16

Alternatively

 

proc transpose data=have out=want1(drop=_name_) prefix=reg_dates ;
by name id;
var reg_dates ;
run;

proc transpose data=have out=want2(drop=_name_) prefix=test_dates;
by name id;
var test_dates ;
run;

data want;
merge want1 want2;
by name id;
run;
Thanks,
Jag

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1088 views
  • 7 likes
  • 4 in conversation