Solved: Re: Split single colum data into multiple column

Barkat · Posted 07-07-2020 08:36 PM

I have follwoing dataset.

data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. reg_dates :MMDDYY10. test_dates :MMDDYY10.;
datalines;
abc,010,01/10/2017,01/20/2017
abc,010,01/10/2017,05/12/2017
abc,010,08/01/2019,08/13/2019
abc,010,10/02/2018,10/09/2018
def,023,02/11/2017,02/14/2017
def,023,07/03/2016,07/30/2016
def,023,11/20/2017,11/26/2017
def,023,11/20/2017,04/20/2018
def,023,06/02/2017,07/02/2017
pqr,125,05/10/2013,05/20/2013
xyz,200,09/12/2015,09/18/2015
xyz,200,01/12/2016,11/22/2017
xyz,200,01/12/2016,01/21/2016
;
run;

Please note that most of the subjects have multiple reg_dates and test_dates. How can I split the dates into multiple column. Like, reg_date1, reg_date2, reg_date3 . . . and also test_date1, test_date2 and so on.

So, the column of the output dataset will be like below.

name id reg_date1 reg_date2 reg_date3 reg_date4 reg_date5 test_date1 test_date2 test_date3 test_date4 test_date5

Ultimately one subject will have one row.

Thanks,

ketpt42 · Posted 07-07-2020 08:52 PM

data want;
    set have;
    by name;
    array regdt{5} reg_date1-reg_date5;
    array testdt{5} test_date1-test_date5;
    
    retain reg_date: test_date:;
    format reg_date: test_date: mmddyys10.;
    
    if first.name then counter=1;
    regdt{counter}=reg_dates;
    testdt{counter}=test_dates;
    if last.name then do;
        output;
        /* Added these call missing steps */
        call missing(of reg_date1-reg_date5);
        call missing (of test_date1-test_date5);
        end;
    
    counter+1;
    drop reg_dates test_dates counter;
run;

View solution in original post

ketpt42 · Posted 07-07-2020 08:52 PM

data want;
    set have;
    by name;
    array regdt{5} reg_date1-reg_date5;
    array testdt{5} test_date1-test_date5;
    
    retain reg_date: test_date:;
    format reg_date: test_date: mmddyys10.;
    
    if first.name then counter=1;
    regdt{counter}=reg_dates;
    testdt{counter}=test_dates;
    if last.name then do;
        output;
        /* Added these call missing steps */
        call missing(of reg_date1-reg_date5);
        call missing (of test_date1-test_date5);
        end;
    
    counter+1;
    drop reg_dates test_dates counter;
run;

Barkat · Posted 07-07-2020 09:16 PM

Thanks Ketpt42,

I was wondering why the missing dates of the subject pqr copied from the data just above. Same to xyz.

ketpt42 · Posted 07-07-2020 09:37 PM

Oh, I forgot something. Let me modify the original code...

Reeza · Posted 07-07-2020 09:18 PM

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

You're asking for a transposed format. You can either use PROC TRANSPOSE or a data step to accomplish this. @ketpt42 has provided the data step approach. The Transpose and Merge approach is less efficient - you need two transpose and one merge but is more dynamic so each method has its pros and cons.

The link above illustrates both methods.

Barkat · Posted 07-10-2020 11:56 AM

May I ask a followup question, please.

I was wondering if I have a column (comment) with text data in the same dataset then how do I split them into different columns (comment1, comment2, comment3, comment4, comment5)? I tried for long time, but failed. See the modified dataset below-

data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. reg_dates :MMDDYY10. test_dates :MMDDYY10. comment :$100.;
datalines;
abc,010,01/10/2017,01/20/2017,transferred to another hospital
abc,010,01/10/2017,05/12/2017,case has been closed
abc,010,08/01/2019,08/13/2019,unable to followup
abc,010,10/02/2018,10/09/2018,admin closure
def,023,02/11/2017,02/14/2017,transferred to another hospital
def,023,07/03/2016,07/30/2016,unable to followup
def,023,11/20/2017,11/26/2017,case has been closed
def,023,11/20/2017,04/20/2018,unable to followup
def,023,06/02/2017,07/02/2017,unable to followup
pqr,125,05/10/2013,05/20/2013,transferred to another hospital
xyz,200,09/12/2015,09/18/2015,case has been closed
xyz,200,01/12/2016,11/22/2017,transferred to another hospital
xyz,200,01/12/2016,01/21/2016,unable to followup
;
run;

Barkat · Posted 07-10-2020 03:47 PM

I tried the following program. But I have no idea, why the values for coment1-comment5 are missing in the want dataset.

data want;
set have;
by name;
array regdt{5} reg_date1-reg_date5;
array testdt{5} test_date1-test_date5;
array cmnt{5} comment1-comment5;

retain reg_date: test_date: comment:;
format reg_date: test_date: mmddyys10.;
format comment $50.;

if first.name then counter=1;
regdt{counter}=reg_dates;
testdt{counter}=test_dates;
cmnt{counter}=comment;

if last.name then do;
output;
/* Added these call missing steps */
call missing (of reg_date1-reg_date5);
call missing (of test_date1-test_date5);
call missing (of comment1-comment5);
end;
counter+1;
drop reg_dates test_dates counter comment;
*putlog _all_;
run;

proc print data=want; run;

ketpt42 · Posted 07-10-2020 03:57 PM

You need to declare your comment array as character.

data want;
    set have;
    by name;
    array regdt{5} reg_date1-reg_date5;
    array testdt{5} test_date1-test_date5;
    array cmnt{5} $100 comment1-comment5;
    
    
    retain reg_date: test_date: comment:;
    format reg_date: test_date: mmddyys10. comment: $100.;
    
    if first.name then counter=1;
    regdt{counter}=reg_dates;
    testdt{counter}=test_dates;
    cmnt{counter}=comment;
    if last.name then do;
        output;
        /* Added these call missing steps */
        call missing(of reg_date1-reg_date5);
        call missing (of test_date1-test_date5);
        call missing (of comment1-comment5);
        end;
        
    counter+1;
    drop reg_dates test_dates counter comment;
run;

ketpt42 · Posted 07-10-2020 04:01 PM

Or expand @Jagadishkatam's example into a triple transpose:

proc transpose data=have out=want1(drop=_name_) prefix=reg_date;
by name id;
var reg_dates ;
run;

proc transpose data=have out=want2(drop=_name_) prefix=test_date;
by name id;
var test_dates ;
run;

proc transpose data=have out=want3(drop=_name_) prefix=comment;
by name id;
var comment ;
run;

data want;
merge want1 want2 want3;
by name id;
run;

Jagadishkatam · Posted 07-08-2020 04:15 AM

Alternatively

proc transpose data=have out=want1(drop=_name_) prefix=reg_dates ;
by name id;
var reg_dates ;
run;

proc transpose data=have out=want2(drop=_name_) prefix=test_dates;
by name id;
var test_dates ;
run;

data want;
merge want1 want2;
by name id;
run;

Thanks,
Jag

Ready to join fellow brilliant minds for the SAS Hackathon?