## data manipulation in SAS

Solved
Occasional Contributor
Posts: 17

# data manipulation in SAS

how to create the last 3 columns based on the first 5 columns information?

1. Medicine days = medicine date - LS_DT, set to missing once the treatment A begins.

2. Treatment A days = Medicine date - Treatment A start Date untill the 2nd treatment A starts. For example,

There are 2 Treatment A. The first one starts on August 24th, so Treatment A days is starting from 0, until 5. The 2nd treatment  A
starts on August 30th, so the treatment A days resets and starts from 0 again until 2.

3. Treatment B days= Medicine date - Treatment B start Date untill the end of medicine date. For example,

the Treatment B starts on Sep 2nd, so Treatment B days is starting from 0, until 2.

 Patient ID Medicine date LS_DT Treatment A Date Treatment B Date Medine days treatment A days treatment B days 1 18-Aug 07/03/2015 46 1 19-Aug 07/03/2015 47 1 20-Aug 07/03/2015 48 1 21-Aug 07/03/2015 49 1 22-Aug 07/03/2015 50 1 23-Aug 07/03/2015 51 1 24-Aug 07/03/2015 24-Aug 0 1 25-Aug 07/03/2015 1 1 26-Aug 07/03/2015 2 1 27-Aug 07/03/2015 3 1 28-Aug 07/03/2015 4 1 29-Aug 07/03/2015 5 1 30-Aug 07/03/2015 30-Aug 0 1 31-Aug 07/03/2015 1 1 1-Sep 07/03/2015 2 1 2-Sep 07/03/2015 2-Sep 0 1 3-Sep 07/03/2015 1 1 4-Sep 07/03/2015 2 2 31-Aug 07/05/2015 31-Aug 57 0 2 1-Sep 07/05/2015 1 2 2-Sep 07/05/2015 2 2 3-Sep 07/05/2015 3-Sep 0 2 4-Sep 07/05/2015 1

Accepted Solutions
Solution
‎05-08-2017 10:19 AM
Super User
Posts: 10,686

## Re: data manipulation in SAS

``````data have;
infile cards expandtabs truncover;
input PatientID	Medicinedate \$	LS_DT : mmddyy10.	TreatmentA \$ treatmentB \$;
date=input(cats(Medicinedate,'-2015'),date11.);
format date ls_dt mmddyy10.;
cards;
1	18-Aug	07/03/2015	. . 	 	46
1	19-Aug	07/03/2015	. . 	 	47
1	20-Aug	07/03/2015	 . .	 	48
1	21-Aug	07/03/2015	 . .	 	49
1	22-Aug	07/03/2015	 . .	 	50
1	23-Aug	07/03/2015	 . .	 	51
1	24-Aug	07/03/2015	24-Aug	 .	 	0
1	25-Aug	07/03/2015	 . .	 	 	1
1	26-Aug	07/03/2015	 . .	 	 	2
1	27-Aug	07/03/2015	 . .	 	 	3
1	28-Aug	07/03/2015	 . .	 	 	4
1	29-Aug	07/03/2015	 . .	 	 	5
1	30-Aug	07/03/2015	30-Aug	. 	 	0
1	31-Aug	07/03/2015	 . .	 	 	1
1	1-Sep	07/03/2015	 . .	 	 	2
1	2-Sep	07/03/2015	 .	2-Sep	 	 	0
1	3-Sep	07/03/2015	 . .	 	 	 	1
1	4-Sep	07/03/2015	 . .	 	 	 	2
2	31-Aug	07/05/2015	31-Aug	. 	57	0
2	1-Sep	07/05/2015	 . .	 	 	1
2	2-Sep	07/05/2015	 . .	 	 	2
2	3-Sep	07/05/2015	 .	3-Sep	 	 	0
2	4-Sep	07/05/2015	 . .
;
run;
data want;
set have;
by PatientID;
retain a b count;
if first.PatientID then do;
call missing(a,b,count);
end;

if not a and not b then Medinedays=date-ls_dt;

if not missing(TreatmentA) then do;a=1;count=-1;end;
if not missing(treatmentB) then do;b=1;count=-1;end;
count+1;
if a and not b then treatmentAdays=count;
if a and b then treatmentBdays=count;
drop a b count date;
run;
proc print noobs;run;``````

All Replies
Super User
Posts: 23,253

## Re: data manipulation in SAS

How did you get those first 5 columns? I suspect the data was originally not like that, but it's not conducive to answering your question easily. It would be much easier if the treatment A start date was at the top for example.

What format are your dates in? The Medicine date doesn't have a year? Is that correct? What happens when a treatment A date is one year and the LST date is the previous year? How can that be differentiated?

Do you really need three new columns? Why not create one new column with a treatment type and a second with duration? It'll likely be easier to work with further on in this format.

PROC Star
Posts: 8,146

## Re: data manipulation in SAS

The following assumes that all three of your partial date fields are SAS dates. If they aren't, they would first have to be converted into SAS dates. I initially inputted them as _Medicine_date, _Treatment_A_Date and _Treatment_B_Date (all character fields) and changed them with the following code:

```  if not missing(_Medicine_date) then Medicine_date=input(catt(_Medicine_date,'-2015'),anydtdte11.);
if not missing(_Treatment_A_Date) then Treatment_A_Date=input(catt(_Treatment_A_Date,'-2015'),anydtdte11.);
if not missing(_Treatment_B_Date) then Treatment_B_Date=input(catt(_Treatment_B_Date,'-2015'),anydtdte11.);
```

Then I used the following code to create the new columns:

```data want (drop=phase);
set have;
by patient_id;
retain phase Medine_days;
if first.patient_id then do;
phase=1;
call missing(Medine_days);
call missing(Treatment_A_Days);
call missing(Treatment_B_Days);
end;
if phase eq 1 then do;
if not missing(Treatment_A_Date) then do;
Treatment_A_Days=0;
phase=2;
call missing(Medine_days);
end;
else Medine_days=medicine_date-LS_DT;
end;
else if phase eq 2 then do;
if not missing(Treatment_B_Date) then do;
phase=3;
treatment_B_days=0;
call missing(Treatment_A_Days);
end;
else if not missing(Treatment_A_Date) then do;
Treatment_A_Days=0;
end;
else Treatment_A_Days+1;
end;
else if phase eq 3 then do;
if not missing(Treatment_B_Date) then do;
treatment_B_days=0;
end;
else Treatment_B_Days+1;
end;
run;
```

Art, CEO, AnalystFinder.com

Super User
Posts: 9,880

## Re: data manipulation in SAS

My answer: do not work with data like this. Dates without years are utterly useless and will bite you in your behind some day. Return to sender with a corresponding nastygram. You're being paid to analyze, not to make guesses.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Solution
‎05-08-2017 10:19 AM
Super User
Posts: 10,686

## Re: data manipulation in SAS

``````data have;
infile cards expandtabs truncover;
input PatientID	Medicinedate \$	LS_DT : mmddyy10.	TreatmentA \$ treatmentB \$;
date=input(cats(Medicinedate,'-2015'),date11.);
format date ls_dt mmddyy10.;
cards;
1	18-Aug	07/03/2015	. . 	 	46
1	19-Aug	07/03/2015	. . 	 	47
1	20-Aug	07/03/2015	 . .	 	48
1	21-Aug	07/03/2015	 . .	 	49
1	22-Aug	07/03/2015	 . .	 	50
1	23-Aug	07/03/2015	 . .	 	51
1	24-Aug	07/03/2015	24-Aug	 .	 	0
1	25-Aug	07/03/2015	 . .	 	 	1
1	26-Aug	07/03/2015	 . .	 	 	2
1	27-Aug	07/03/2015	 . .	 	 	3
1	28-Aug	07/03/2015	 . .	 	 	4
1	29-Aug	07/03/2015	 . .	 	 	5
1	30-Aug	07/03/2015	30-Aug	. 	 	0
1	31-Aug	07/03/2015	 . .	 	 	1
1	1-Sep	07/03/2015	 . .	 	 	2
1	2-Sep	07/03/2015	 .	2-Sep	 	 	0
1	3-Sep	07/03/2015	 . .	 	 	 	1
1	4-Sep	07/03/2015	 . .	 	 	 	2
2	31-Aug	07/05/2015	31-Aug	. 	57	0
2	1-Sep	07/05/2015	 . .	 	 	1
2	2-Sep	07/05/2015	 . .	 	 	2
2	3-Sep	07/05/2015	 .	3-Sep	 	 	0
2	4-Sep	07/05/2015	 . .
;
run;
data want;
set have;
by PatientID;
retain a b count;
if first.PatientID then do;
call missing(a,b,count);
end;

if not a and not b then Medinedays=date-ls_dt;

if not missing(TreatmentA) then do;a=1;count=-1;end;
if not missing(treatmentB) then do;b=1;count=-1;end;
count+1;
if a and not b then treatmentAdays=count;
if a and b then treatmentBdays=count;
drop a b count date;
run;
proc print noobs;run;``````
Occasional Contributor
Posts: 17

## Re: data manipulation in SAS

Thanks to all of you for responding my question. This is my first time to post a question. Sorry for some formating issue. The medicinedate field do have the year 2015, but it did not show it when I copied the date from Excel. Thanks for adding it.

Occasional Contributor
Posts: 17

## Re: data manipulation in SAS

Could you please help me to understand why count resets when the 2nd Treatment A starts on August 30th? if I only count the days for the 1st Treatment A starts, ignoring the 2nd one, how should I do? Thanks.

PROC Star
Posts: 8,146

## Re: data manipulation in SAS

The code is doing what you showed/explained in your example. Are you now asking for subsequent starts to have missing values OR that the numbering continues from the first date?

Art, CEO, AnalystFinder.com

Super User
Posts: 13,301

## Re: data manipulation in SAS

Whenever I see a "date" like this: 24-Aug

I say, go back to Excel and reset the column property to a proper date format then reexport or reread into SAS.

Note that I am assuming the data is entered in Excel but I have seen this many times. And the obnoxious thing you have to worry about with Excel is that it is known to change entered values to 'dates' that were not or to interpret the value as entered to be a different date than intended.

If any of your LS_DT are near the year boundary you may find that the Excel date for the Medicine or Treatment dates may be in the wrong year as those 24-Aug usually are assuming what ever was entered was for the year of data entry. So and actual Dec of 2015 might have changed to a Dec 2016 if entered in 2016.

Occasional Contributor
Posts: 17

## Re: data manipulation in SAS

Great suggestion! I went to my excel and modified the format for the dates.

Here is the modifed table.

Could you please help me to understand why count resets when the 2nd Treatment A starts on August 30th? if I only count the days for the 1st Treatment A starts, ignoring the 2nd one, as the treatment A days like this highlighted in Red. how should I do? Thanks.

 patient ID Medicine date LS_DT Treatment A Date Treatment B Date Medine days treatment A days treatment B days 1 8/18/2015 07/03/2015 46 1 8/19/2015 07/03/2015 47 1 8/20/2015 07/03/2015 48 1 8/21/2015 07/03/2015 49 1 8/22/2015 07/03/2015 50 1 8/23/2015 07/03/2015 51 1 8/24/2015 07/03/2015 8/24/2015 0 1 8/25/2015 07/03/2015 1 1 8/26/2015 07/03/2015 2 1 8/27/2015 07/03/2015 3 1 8/28/2015 07/03/2015 4 1 8/29/2015 07/03/2015 5 1 8/30/2015 07/03/2015 8/30/2015 6 1 8/31/2015 07/03/2015 7 1 9/1/2015 07/03/2015 8 1 9/2/2015 07/03/2015 9/2/2015 0 1 9/3/2015 07/03/2015 1 1 9/4/2015 07/03/2015 2 2 8/31/2015 07/05/2015 8/31/2015 0 2 9/1/2015 07/05/2015 1 2 9/2/2015 07/05/2015 2 2 9/3/2015 07/05/2015 9/3/2015 0 2 9/4/2015 07/05/2015 1
Super User
Posts: 10,686

## Re: data manipulation in SAS

Because I reset it at here

`````` if not missing(TreatmentA) then do;a=1;count=-1;end;
if not missing(treatmentB) then do;b=1;count=-1;end;``````

If you don't want reset it ,try this one.

``````data have;
infile cards expandtabs truncover;
input PatientID	Medicinedate \$	LS_DT : mmddyy10.	TreatmentA \$ treatmentB \$;
date=input(cats(Medicinedate,'-2015'),date11.);
format date ls_dt mmddyy10.;
cards;
1	18-Aug	07/03/2015	. . 	 	46
1	19-Aug	07/03/2015	. . 	 	47
1	20-Aug	07/03/2015	 . .	 	48
1	21-Aug	07/03/2015	 . .	 	49
1	22-Aug	07/03/2015	 . .	 	50
1	23-Aug	07/03/2015	 . .	 	51
1	24-Aug	07/03/2015	24-Aug	 .	 	0
1	25-Aug	07/03/2015	 . .	 	 	1
1	26-Aug	07/03/2015	 . .	 	 	2
1	27-Aug	07/03/2015	 . .	 	 	3
1	28-Aug	07/03/2015	 . .	 	 	4
1	29-Aug	07/03/2015	 . .	 	 	5
1	30-Aug	07/03/2015	30-Aug	. 	 	0
1	31-Aug	07/03/2015	 . .	 	 	1
1	1-Sep	07/03/2015	 . .	 	 	2
1	2-Sep	07/03/2015	 .	2-Sep	 	 	0
1	3-Sep	07/03/2015	 . .	 	 	 	1
1	4-Sep	07/03/2015	 . .	 	 	 	2
2	31-Aug	07/05/2015	31-Aug	. 	57	0
2	1-Sep	07/05/2015	 . .	 	 	1
2	2-Sep	07/05/2015	 . .	 	 	2
2	3-Sep	07/05/2015	 .	3-Sep	 	 	0
2	4-Sep	07/05/2015	 . .
;
run;
data want;
set have;
by PatientID;
retain a b count;
if first.PatientID then do;
call missing(a,b,count);
end;

if not a and not b then Medinedays=date-ls_dt;

if not missing(TreatmentA) and not a then do;a=1;count=-1;end;
if not missing(treatmentB) and not b then do;b=1;count=-1;end;
count+1;
if a and not b then treatmentAdays=count;
if a and b then treatmentBdays=count;
drop a b count date;
run;
``````
PROC Star
Posts: 8,146

## Re: data manipulation in SAS

There are two changes in your latest example: (1) the 57 days removed in the line:

`2 31-Aug07/05/2015 31-Aug08/31/2015 . 0 .`

and (2) the counter isn't reset when an additional treatment date exists.

Interestingly, the code I had proposed, contrary to your specs, did precisely the first change. If the treatment and meds started on the same day, it didn't output the number of med days.

I don't have time to review @Ksharp's code but, for the code I proposed, here is the change that would do #2:

```data want (drop=phase);
set have;
by patientid;
retain phase Medine_days;
if first.patientid then do;
phase=1;
call missing(Medine_days);
call missing(Treatment_A_Days);
call missing(Treatment_B_Days);
end;
if phase eq 1 then do;
if not missing(TreatmentA) then do;
Treatment_A_Days=0;
phase=2;
call missing(Medine_days);
end;
else Medine_days=date-LS_DT;
end;
else if phase eq 2 then do;
if not missing(TreatmentB) then do;
phase=3;
treatment_B_days=0;
call missing(Treatment_A_Days);
end;
/*     else if not missing(TreatmentA) then do; */
/*       Treatment_A_Days=0; */
/*     end; */
else Treatment_A_Days+1;
end;
else if phase eq 3 then do;
/*     if not missing(TreatmentB) then do; */
/*       treatment_B_days=0; */
/*     end; */
/*     else */
Treatment_B_Days+1;
end;
run;
```

Art, CEO, AnalystFinder.com

Occasional Contributor
Posts: 17

## Re: data manipulation in SAS

Re: data manipulation in SAS

Thank you so much for all of your help.

Both of your code ('s code and Ksharp's code) work perfectly for the sample data I gave before, but my data is more complicated than that. For example, Treatment A and Treatment B can happen in the same day; Also the date is not necessary continuous, so it might miss some dates in between within the same patient. So I am thinking to use the lag function to repeat the treatment date, then use the following formula to calculate.

treatment A days = Medicine date - Treatment A date2

treatment B days = Medicine date - Treatment B date2

Would you please help me to get the Treatment A Date2 and Treatment B Date2? Thanks a lot!

 patient ID Medicine date LS_DT Treatment A Date Treatment B Date Treatment A Date2 Treatment B Date2 Medine days treatment A days = Medicine date - Treatment A date2 treatment B days = Medicine date - Treatment B date2 1 8/18/2015 07/03/2015 46 1 8/19/2015 07/03/2015 47 1 8/20/2015 07/03/2015 48 1 8/21/2015 07/03/2015 49 1 8/22/2015 07/03/2015 50 1 8/23/2015 07/03/2015 51 1 8/24/2015 07/03/2015 8/24/2015 8/24/2015 0 1 8/25/2015 07/03/2015 8/24/2015 1 1 8/26/2015 07/03/2015 8/24/2015 2 1 8/27/2015 07/03/2015 8/24/2015 3 1 8/28/2015 07/03/2015 8/24/2015 4 1 8/29/2015 07/03/2015 8/24/2015 5 1 8/30/2015 07/03/2015 8/30/2015 8/30/2015 8/30/2015 8/30/2015 0 1 8/31/2015 07/03/2015 8/30/2015 8/30/2015 1 1 9/1/2015 07/03/2015 8/30/2015 8/30/2015 2 1 9/2/2015 07/03/2015 8/30/2015 8/30/2015 3 0 1 9/3/2015 07/03/2015 8/30/2015 8/30/2015 4 1 1 9/4/2015 07/03/2015 8/30/2015 8/30/2015 5 2 2 8/31/2015 07/05/2015 8/31/2015 8/31/2015 0 2 9/1/2015 07/05/2015 8/31/2015 1 2 9/2/2015 07/05/2015 8/31/2015 2 2 9/3/2015 07/05/2015 9/3/2015 8/31/2015 9/3/2015 0 2 9/4/2015 07/05/2015 8/31/2015 9/3/2015 1
Super User
Posts: 10,686

## Re: data manipulation in SAS

``````data have;
infile cards expandtabs truncover;
input PatientID	Medicinedate \$	LS_DT : mmddyy10.	TreatmentA : date9. treatmentB :date9.;
format date ls_dt TreatmentA  treatmentB mmddyy10.;
cards;
1	18-Aug	07/03/2015	. . 	 	46
1	19-Aug	07/03/2015	. . 	 	47
1	20-Aug	07/03/2015	 . .	 	48
1	21-Aug	07/03/2015	 . .	 	49
1	22-Aug	07/03/2015	 . .	 	50
1	23-Aug	07/03/2015	 . .	 	51
1	24-Aug	07/03/2015	24Aug2015	 .	 	0
1	25-Aug	07/03/2015	 . .	 	 	1
1	26-Aug	07/03/2015	 . .	 	 	2
1	27-Aug	07/03/2015	 . .	 	 	3
1	28-Aug	07/03/2015	 . .	 	 	4
1	29-Aug	07/03/2015	 . .	 	 	5
1	30-Aug	07/03/2015	30Aug2015	. 	 	0
1	31-Aug	07/03/2015	 . .	 	 	1
1	1-Sep	07/03/2015	 . .	 	 	2
1	2-Sep	07/03/2015	 .	2Sep2015	 	 	0
1	3-Sep	07/03/2015	 . .	 	 	 	1
1	4-Sep	07/03/2015	 . .	 	 	 	2
2	31-Aug	07/05/2015	31Aug2015	. 	57	0
2	1-Sep	07/05/2015	 . .	 	 	1
2	2-Sep	07/05/2015	 . .	 	 	2
2	3-Sep	07/05/2015	 .	3Sep2015	 	 	0
2	4-Sep	07/05/2015	 . .
;
run;
data want;
set have;
by PatientID;
if not missing(TreatmentA) then TreatmentADate2=TreatmentA ;
if not missing(treatmentB) then TreatmentBDate2=treatmentB ;
run;``````
Occasional Contributor
Posts: 17

## Re: data manipulation in SAS

Thanks a lot! Retain function is very useful! :-) SAS community is so helpful. Thanks to all!
☑ This topic is solved.