how to create the last 3 columns based on the first 5 columns information?
1. Medicine days = medicine date - LS_DT, set to missing once the treatment A begins.
2. Treatment A days = Medicine date - Treatment A start Date untill the 2nd treatment A starts. For example,
There are 2 Treatment A. The first one starts on August 24th, so Treatment A days is starting from 0, until 5. The 2nd treatment A
starts on August 30th, so the treatment A days resets and starts from 0 again until 2.
3. Treatment B days= Medicine date - Treatment B start Date untill the end of medicine date. For example,
the Treatment B starts on Sep 2nd, so Treatment B days is starting from 0, until 2.
Patient ID | Medicine date | LS_DT | Treatment A Date | Treatment B Date | Medine days | treatment A days | treatment B days |
1 | 18-Aug | 07/03/2015 | 46 | ||||
1 | 19-Aug | 07/03/2015 | 47 | ||||
1 | 20-Aug | 07/03/2015 | 48 | ||||
1 | 21-Aug | 07/03/2015 | 49 | ||||
1 | 22-Aug | 07/03/2015 | 50 | ||||
1 | 23-Aug | 07/03/2015 | 51 | ||||
1 | 24-Aug | 07/03/2015 | 24-Aug | 0 | |||
1 | 25-Aug | 07/03/2015 | 1 | ||||
1 | 26-Aug | 07/03/2015 | 2 | ||||
1 | 27-Aug | 07/03/2015 | 3 | ||||
1 | 28-Aug | 07/03/2015 | 4 | ||||
1 | 29-Aug | 07/03/2015 | 5 | ||||
1 | 30-Aug | 07/03/2015 | 30-Aug | 0 | |||
1 | 31-Aug | 07/03/2015 | 1 | ||||
1 | 1-Sep | 07/03/2015 | 2 | ||||
1 | 2-Sep | 07/03/2015 | 2-Sep | 0 | |||
1 | 3-Sep | 07/03/2015 | 1 | ||||
1 | 4-Sep | 07/03/2015 | 2 | ||||
2 | 31-Aug | 07/05/2015 | 31-Aug | 57 | 0 | ||
2 | 1-Sep | 07/05/2015 | 1 | ||||
2 | 2-Sep | 07/05/2015 | 2 | ||||
2 | 3-Sep | 07/05/2015 | 3-Sep | 0 | |||
2 | 4-Sep | 07/05/2015 | 1 |
data have;
infile cards expandtabs truncover;
input PatientID Medicinedate $ LS_DT : mmddyy10. TreatmentA $ treatmentB $;
date=input(cats(Medicinedate,'-2015'),date11.);
format date ls_dt mmddyy10.;
cards;
1 18-Aug 07/03/2015 . . 46
1 19-Aug 07/03/2015 . . 47
1 20-Aug 07/03/2015 . . 48
1 21-Aug 07/03/2015 . . 49
1 22-Aug 07/03/2015 . . 50
1 23-Aug 07/03/2015 . . 51
1 24-Aug 07/03/2015 24-Aug . 0
1 25-Aug 07/03/2015 . . 1
1 26-Aug 07/03/2015 . . 2
1 27-Aug 07/03/2015 . . 3
1 28-Aug 07/03/2015 . . 4
1 29-Aug 07/03/2015 . . 5
1 30-Aug 07/03/2015 30-Aug . 0
1 31-Aug 07/03/2015 . . 1
1 1-Sep 07/03/2015 . . 2
1 2-Sep 07/03/2015 . 2-Sep 0
1 3-Sep 07/03/2015 . . 1
1 4-Sep 07/03/2015 . . 2
2 31-Aug 07/05/2015 31-Aug . 57 0
2 1-Sep 07/05/2015 . . 1
2 2-Sep 07/05/2015 . . 2
2 3-Sep 07/05/2015 . 3-Sep 0
2 4-Sep 07/05/2015 . .
;
run;
data want;
set have;
by PatientID;
retain a b count;
if first.PatientID then do;
call missing(a,b,count);
end;
if not a and not b then Medinedays=date-ls_dt;
if not missing(TreatmentA) then do;a=1;count=-1;end;
if not missing(treatmentB) then do;b=1;count=-1;end;
count+1;
if a and not b then treatmentAdays=count;
if a and b then treatmentBdays=count;
drop a b count date;
run;
proc print noobs;run;
How did you get those first 5 columns? I suspect the data was originally not like that, but it's not conducive to answering your question easily. It would be much easier if the treatment A start date was at the top for example.
What format are your dates in? The Medicine date doesn't have a year? Is that correct? What happens when a treatment A date is one year and the LST date is the previous year? How can that be differentiated?
Do you really need three new columns? Why not create one new column with a treatment type and a second with duration? It'll likely be easier to work with further on in this format.
The following assumes that all three of your partial date fields are SAS dates. If they aren't, they would first have to be converted into SAS dates. I initially inputted them as _Medicine_date, _Treatment_A_Date and _Treatment_B_Date (all character fields) and changed them with the following code:
if not missing(_Medicine_date) then Medicine_date=input(catt(_Medicine_date,'-2015'),anydtdte11.); if not missing(_Treatment_A_Date) then Treatment_A_Date=input(catt(_Treatment_A_Date,'-2015'),anydtdte11.); if not missing(_Treatment_B_Date) then Treatment_B_Date=input(catt(_Treatment_B_Date,'-2015'),anydtdte11.);
Then I used the following code to create the new columns:
data want (drop=phase); set have; by patient_id; retain phase Medine_days; if first.patient_id then do; phase=1; call missing(Medine_days); call missing(Treatment_A_Days); call missing(Treatment_B_Days); end; if phase eq 1 then do; if not missing(Treatment_A_Date) then do; Treatment_A_Days=0; phase=2; call missing(Medine_days); end; else Medine_days=medicine_date-LS_DT; end; else if phase eq 2 then do; if not missing(Treatment_B_Date) then do; phase=3; treatment_B_days=0; call missing(Treatment_A_Days); end; else if not missing(Treatment_A_Date) then do; Treatment_A_Days=0; end; else Treatment_A_Days+1; end; else if phase eq 3 then do; if not missing(Treatment_B_Date) then do; treatment_B_days=0; end; else Treatment_B_Days+1; end; run;
Art, CEO, AnalystFinder.com
My answer: do not work with data like this. Dates without years are utterly useless and will bite you in your behind some day. Return to sender with a corresponding nastygram. You're being paid to analyze, not to make guesses.
data have;
infile cards expandtabs truncover;
input PatientID Medicinedate $ LS_DT : mmddyy10. TreatmentA $ treatmentB $;
date=input(cats(Medicinedate,'-2015'),date11.);
format date ls_dt mmddyy10.;
cards;
1 18-Aug 07/03/2015 . . 46
1 19-Aug 07/03/2015 . . 47
1 20-Aug 07/03/2015 . . 48
1 21-Aug 07/03/2015 . . 49
1 22-Aug 07/03/2015 . . 50
1 23-Aug 07/03/2015 . . 51
1 24-Aug 07/03/2015 24-Aug . 0
1 25-Aug 07/03/2015 . . 1
1 26-Aug 07/03/2015 . . 2
1 27-Aug 07/03/2015 . . 3
1 28-Aug 07/03/2015 . . 4
1 29-Aug 07/03/2015 . . 5
1 30-Aug 07/03/2015 30-Aug . 0
1 31-Aug 07/03/2015 . . 1
1 1-Sep 07/03/2015 . . 2
1 2-Sep 07/03/2015 . 2-Sep 0
1 3-Sep 07/03/2015 . . 1
1 4-Sep 07/03/2015 . . 2
2 31-Aug 07/05/2015 31-Aug . 57 0
2 1-Sep 07/05/2015 . . 1
2 2-Sep 07/05/2015 . . 2
2 3-Sep 07/05/2015 . 3-Sep 0
2 4-Sep 07/05/2015 . .
;
run;
data want;
set have;
by PatientID;
retain a b count;
if first.PatientID then do;
call missing(a,b,count);
end;
if not a and not b then Medinedays=date-ls_dt;
if not missing(TreatmentA) then do;a=1;count=-1;end;
if not missing(treatmentB) then do;b=1;count=-1;end;
count+1;
if a and not b then treatmentAdays=count;
if a and b then treatmentBdays=count;
drop a b count date;
run;
proc print noobs;run;
Thanks to all of you for responding my question. This is my first time to post a question. Sorry for some formating issue. The medicinedate field do have the year 2015, but it did not show it when I copied the date from Excel. Thanks for adding it.
Could you please help me to understand why count resets when the 2nd Treatment A starts on August 30th? if I only count the days for the 1st Treatment A starts, ignoring the 2nd one, how should I do? Thanks.
The code is doing what you showed/explained in your example. Are you now asking for subsequent starts to have missing values OR that the numbering continues from the first date?
Art, CEO, AnalystFinder.com
Whenever I see a "date" like this: 24-Aug
I say, go back to Excel and reset the column property to a proper date format then reexport or reread into SAS.
Note that I am assuming the data is entered in Excel but I have seen this many times. And the obnoxious thing you have to worry about with Excel is that it is known to change entered values to 'dates' that were not or to interpret the value as entered to be a different date than intended.
If any of your LS_DT are near the year boundary you may find that the Excel date for the Medicine or Treatment dates may be in the wrong year as those 24-Aug usually are assuming what ever was entered was for the year of data entry. So and actual Dec of 2015 might have changed to a Dec 2016 if entered in 2016.
Great suggestion! I went to my excel and modified the format for the dates.
Here is the modifed table.
Could you please help me to understand why count resets when the 2nd Treatment A starts on August 30th? if I only count the days for the 1st Treatment A starts, ignoring the 2nd one, as the treatment A days like this highlighted in Red. how should I do? Thanks.
patient ID | Medicine date | LS_DT | Treatment A Date | Treatment B Date | Medine days | treatment A days | treatment B days |
1 | 8/18/2015 | 07/03/2015 | 46 | ||||
1 | 8/19/2015 | 07/03/2015 | 47 | ||||
1 | 8/20/2015 | 07/03/2015 | 48 | ||||
1 | 8/21/2015 | 07/03/2015 | 49 | ||||
1 | 8/22/2015 | 07/03/2015 | 50 | ||||
1 | 8/23/2015 | 07/03/2015 | 51 | ||||
1 | 8/24/2015 | 07/03/2015 | 8/24/2015 | 0 | |||
1 | 8/25/2015 | 07/03/2015 | 1 | ||||
1 | 8/26/2015 | 07/03/2015 | 2 | ||||
1 | 8/27/2015 | 07/03/2015 | 3 | ||||
1 | 8/28/2015 | 07/03/2015 | 4 | ||||
1 | 8/29/2015 | 07/03/2015 | 5 | ||||
1 | 8/30/2015 | 07/03/2015 | 8/30/2015 | 6 | |||
1 | 8/31/2015 | 07/03/2015 | 7 | ||||
1 | 9/1/2015 | 07/03/2015 | 8 | ||||
1 | 9/2/2015 | 07/03/2015 | 9/2/2015 | 0 | |||
1 | 9/3/2015 | 07/03/2015 | 1 | ||||
1 | 9/4/2015 | 07/03/2015 | 2 | ||||
2 | 8/31/2015 | 07/05/2015 | 8/31/2015 | 0 | |||
2 | 9/1/2015 | 07/05/2015 | 1 | ||||
2 | 9/2/2015 | 07/05/2015 | 2 | ||||
2 | 9/3/2015 | 07/05/2015 | 9/3/2015 | 0 | |||
2 | 9/4/2015 | 07/05/2015 | 1 |
Because I reset it at here
if not missing(TreatmentA) then do;a=1;count=-1;end;
if not missing(treatmentB) then do;b=1;count=-1;end;
If you don't want reset it ,try this one.
data have;
infile cards expandtabs truncover;
input PatientID Medicinedate $ LS_DT : mmddyy10. TreatmentA $ treatmentB $;
date=input(cats(Medicinedate,'-2015'),date11.);
format date ls_dt mmddyy10.;
cards;
1 18-Aug 07/03/2015 . . 46
1 19-Aug 07/03/2015 . . 47
1 20-Aug 07/03/2015 . . 48
1 21-Aug 07/03/2015 . . 49
1 22-Aug 07/03/2015 . . 50
1 23-Aug 07/03/2015 . . 51
1 24-Aug 07/03/2015 24-Aug . 0
1 25-Aug 07/03/2015 . . 1
1 26-Aug 07/03/2015 . . 2
1 27-Aug 07/03/2015 . . 3
1 28-Aug 07/03/2015 . . 4
1 29-Aug 07/03/2015 . . 5
1 30-Aug 07/03/2015 30-Aug . 0
1 31-Aug 07/03/2015 . . 1
1 1-Sep 07/03/2015 . . 2
1 2-Sep 07/03/2015 . 2-Sep 0
1 3-Sep 07/03/2015 . . 1
1 4-Sep 07/03/2015 . . 2
2 31-Aug 07/05/2015 31-Aug . 57 0
2 1-Sep 07/05/2015 . . 1
2 2-Sep 07/05/2015 . . 2
2 3-Sep 07/05/2015 . 3-Sep 0
2 4-Sep 07/05/2015 . .
;
run;
data want;
set have;
by PatientID;
retain a b count;
if first.PatientID then do;
call missing(a,b,count);
end;
if not a and not b then Medinedays=date-ls_dt;
if not missing(TreatmentA) and not a then do;a=1;count=-1;end;
if not missing(treatmentB) and not b then do;b=1;count=-1;end;
count+1;
if a and not b then treatmentAdays=count;
if a and b then treatmentBdays=count;
drop a b count date;
run;
There are two changes in your latest example: (1) the 57 days removed in the line:
2 31-Aug07/05/2015 31-Aug08/31/2015 . 0 .
and (2) the counter isn't reset when an additional treatment date exists.
Interestingly, the code I had proposed, contrary to your specs, did precisely the first change. If the treatment and meds started on the same day, it didn't output the number of med days.
I don't have time to review @Ksharp's code but, for the code I proposed, here is the change that would do #2:
data want (drop=phase); set have; by patientid; retain phase Medine_days; if first.patientid then do; phase=1; call missing(Medine_days); call missing(Treatment_A_Days); call missing(Treatment_B_Days); end; if phase eq 1 then do; if not missing(TreatmentA) then do; Treatment_A_Days=0; phase=2; call missing(Medine_days); end; else Medine_days=date-LS_DT; end; else if phase eq 2 then do; if not missing(TreatmentB) then do; phase=3; treatment_B_days=0; call missing(Treatment_A_Days); end; /* else if not missing(TreatmentA) then do; */ /* Treatment_A_Days=0; */ /* end; */ else Treatment_A_Days+1; end; else if phase eq 3 then do; /* if not missing(TreatmentB) then do; */ /* treatment_B_days=0; */ /* end; */ /* else */ Treatment_B_Days+1; end; run;
Art, CEO, AnalystFinder.com
Re: data manipulation in SAS
Thank you so much for all of your help.
Both of your code (art297's code and Ksharp's code) work perfectly for the sample data I gave before, but my data is more complicated than that. For example, Treatment A and Treatment B can happen in the same day; Also the date is not necessary continuous, so it might miss some dates in between within the same patient. So I am thinking to use the lag function to repeat the treatment date, then use the following formula to calculate.
treatment A days = Medicine date - Treatment A date2
treatment B days = Medicine date - Treatment B date2
Would you please help me to get the Treatment A Date2 and Treatment B Date2? Thanks a lot!
patient ID | Medicine date | LS_DT | Treatment A Date | Treatment B Date | Treatment A Date2 | Treatment B Date2 | Medine days | treatment A days = Medicine date - Treatment A date2 | treatment B days = Medicine date - Treatment B date2 |
1 | 8/18/2015 | 07/03/2015 | 46 | ||||||
1 | 8/19/2015 | 07/03/2015 | 47 | ||||||
1 | 8/20/2015 | 07/03/2015 | 48 | ||||||
1 | 8/21/2015 | 07/03/2015 | 49 | ||||||
1 | 8/22/2015 | 07/03/2015 | 50 | ||||||
1 | 8/23/2015 | 07/03/2015 | 51 | ||||||
1 | 8/24/2015 | 07/03/2015 | 8/24/2015 | 8/24/2015 | 0 | ||||
1 | 8/25/2015 | 07/03/2015 | 8/24/2015 | 1 | |||||
1 | 8/26/2015 | 07/03/2015 | 8/24/2015 | 2 | |||||
1 | 8/27/2015 | 07/03/2015 | 8/24/2015 | 3 | |||||
1 | 8/28/2015 | 07/03/2015 | 8/24/2015 | 4 | |||||
1 | 8/29/2015 | 07/03/2015 | 8/24/2015 | 5 | |||||
1 | 8/30/2015 | 07/03/2015 | 8/30/2015 | 8/30/2015 | 8/30/2015 | 8/30/2015 | 0 | ||
1 | 8/31/2015 | 07/03/2015 | 8/30/2015 | 8/30/2015 | 1 | ||||
1 | 9/1/2015 | 07/03/2015 | 8/30/2015 | 8/30/2015 | 2 | ||||
1 | 9/2/2015 | 07/03/2015 | 8/30/2015 | 8/30/2015 | 3 | 0 | |||
1 | 9/3/2015 | 07/03/2015 | 8/30/2015 | 8/30/2015 | 4 | 1 | |||
1 | 9/4/2015 | 07/03/2015 | 8/30/2015 | 8/30/2015 | 5 | 2 | |||
2 | 8/31/2015 | 07/05/2015 | 8/31/2015 | 8/31/2015 | 0 | ||||
2 | 9/1/2015 | 07/05/2015 | 8/31/2015 | 1 | |||||
2 | 9/2/2015 | 07/05/2015 | 8/31/2015 | 2 | |||||
2 | 9/3/2015 | 07/05/2015 | 9/3/2015 | 8/31/2015 | 9/3/2015 | 0 | |||
2 | 9/4/2015 | 07/05/2015 | 8/31/2015 | 9/3/2015 | 1 |
data have;
infile cards expandtabs truncover;
input PatientID Medicinedate $ LS_DT : mmddyy10. TreatmentA : date9. treatmentB :date9.;
format date ls_dt TreatmentA treatmentB mmddyy10.;
cards;
1 18-Aug 07/03/2015 . . 46
1 19-Aug 07/03/2015 . . 47
1 20-Aug 07/03/2015 . . 48
1 21-Aug 07/03/2015 . . 49
1 22-Aug 07/03/2015 . . 50
1 23-Aug 07/03/2015 . . 51
1 24-Aug 07/03/2015 24Aug2015 . 0
1 25-Aug 07/03/2015 . . 1
1 26-Aug 07/03/2015 . . 2
1 27-Aug 07/03/2015 . . 3
1 28-Aug 07/03/2015 . . 4
1 29-Aug 07/03/2015 . . 5
1 30-Aug 07/03/2015 30Aug2015 . 0
1 31-Aug 07/03/2015 . . 1
1 1-Sep 07/03/2015 . . 2
1 2-Sep 07/03/2015 . 2Sep2015 0
1 3-Sep 07/03/2015 . . 1
1 4-Sep 07/03/2015 . . 2
2 31-Aug 07/05/2015 31Aug2015 . 57 0
2 1-Sep 07/05/2015 . . 1
2 2-Sep 07/05/2015 . . 2
2 3-Sep 07/05/2015 . 3Sep2015 0
2 4-Sep 07/05/2015 . .
;
run;
data want;
set have;
by PatientID;
retain TreatmentADate2 TreatmentBDate2;
if first.PatientID then call missing(TreatmentADate2,TreatmentBDate2);
if not missing(TreatmentA) then TreatmentADate2=TreatmentA ;
if not missing(treatmentB) then TreatmentBDate2=treatmentB ;
format TreatmentADate2 TreatmentBDate2 date9.;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.