SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

data manipulation in SAS

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

data manipulation in SAS

how to create the last 3 columns based on the first 5 columns information?

1. Medicine days = medicine date - LS_DT, set to missing once the treatment A begins.

2. Treatment A days = Medicine date - Treatment A start Date untill the 2nd treatment A starts. For example,

   There are 2 Treatment A. The first one starts on August 24th, so Treatment A days is starting from 0, until 5. The 2nd treatment  A
   starts on August 30th, so the treatment A days resets and starts from 0 again until 2.

3. Treatment B days= Medicine date - Treatment B start Date untill the end of medicine date. For example,

   the Treatment B starts on Sep 2nd, so Treatment B days is starting from 0, until 2. 

Patient IDMedicine dateLS_DTTreatment A DateTreatment B DateMedine days treatment A daystreatment B days
118-Aug07/03/2015  46  
119-Aug07/03/2015  47  
120-Aug07/03/2015  48  
121-Aug07/03/2015  49  
122-Aug07/03/2015  50  
123-Aug07/03/2015  51  
124-Aug07/03/201524-Aug  0 
125-Aug07/03/2015   1 
126-Aug07/03/2015   2 
127-Aug07/03/2015   3 
128-Aug07/03/2015   4 
129-Aug07/03/2015   5 
130-Aug07/03/201530-Aug  0 
131-Aug07/03/2015   1 
11-Sep07/03/2015   2 
12-Sep07/03/2015 2-Sep  0
13-Sep07/03/2015    1
14-Sep07/03/2015    2
231-Aug07/05/201531-Aug 570 
21-Sep07/05/2015   1 
22-Sep07/05/2015   2 
23-Sep07/05/2015 3-Sep  0
24-Sep07/05/2015    1

Accepted Solutions
Solution
‎05-08-2017 10:19 AM
Super User
Posts: 9,681

Re: data manipulation in SAS

data have;
infile cards expandtabs truncover;
input PatientID	Medicinedate $	LS_DT : mmddyy10.	TreatmentA $ treatmentB $;
date=input(cats(Medicinedate,'-2015'),date11.);
format date ls_dt mmddyy10.;
cards;
1	18-Aug	07/03/2015	. . 	 	46	 	 
1	19-Aug	07/03/2015	. . 	 	47	 	 
1	20-Aug	07/03/2015	 . .	 	48	 	 
1	21-Aug	07/03/2015	 . .	 	49	 	 
1	22-Aug	07/03/2015	 . .	 	50	 	 
1	23-Aug	07/03/2015	 . .	 	51	 	 
1	24-Aug	07/03/2015	24-Aug	 .	 	0	 
1	25-Aug	07/03/2015	 . .	 	 	1	 
1	26-Aug	07/03/2015	 . .	 	 	2	 
1	27-Aug	07/03/2015	 . .	 	 	3	 
1	28-Aug	07/03/2015	 . .	 	 	4	 
1	29-Aug	07/03/2015	 . .	 	 	5	 
1	30-Aug	07/03/2015	30-Aug	. 	 	0	 
1	31-Aug	07/03/2015	 . .	 	 	1	 
1	1-Sep	07/03/2015	 . .	 	 	2	 
1	2-Sep	07/03/2015	 .	2-Sep	 	 	0
1	3-Sep	07/03/2015	 . .	 	 	 	1
1	4-Sep	07/03/2015	 . .	 	 	 	2
2	31-Aug	07/05/2015	31-Aug	. 	57	0	 
2	1-Sep	07/05/2015	 . .	 	 	1	 
2	2-Sep	07/05/2015	 . .	 	 	2	 
2	3-Sep	07/05/2015	 .	3-Sep	 	 	0
2	4-Sep	07/05/2015	 . .	 
;
run;
data want;
 set have;
 by PatientID;
 retain a b count;
 if first.PatientID then do;
  call missing(a,b,count);
 end;

 if not a and not b then Medinedays=date-ls_dt;
 
 if not missing(TreatmentA) then do;a=1;count=-1;end;
 if not missing(treatmentB) then do;b=1;count=-1;end;
 count+1;
 if a and not b then treatmentAdays=count;
 if a and b then treatmentBdays=count;
 drop a b count date;
run;
proc print noobs;run;

View solution in original post


All Replies
Super User
Posts: 17,832

Re: data manipulation in SAS

How did you get those first 5 columns? I suspect the data was originally not like that, but it's not conducive to answering your question easily. It would be much easier if the treatment A start date was at the top for example.

 

What format are your dates in? The Medicine date doesn't have a year? Is that correct? What happens when a treatment A date is one year and the LST date is the previous year? How can that be differentiated?

 

Do you really need three new columns? Why not create one new column with a treatment type and a second with duration? It'll likely be easier to work with further on in this format.

PROC Star
Posts: 7,363

Re: data manipulation in SAS

The following assumes that all three of your partial date fields are SAS dates. If they aren't, they would first have to be converted into SAS dates. I initially inputted them as _Medicine_date, _Treatment_A_Date and _Treatment_B_Date (all character fields) and changed them with the following code:

  if not missing(_Medicine_date) then Medicine_date=input(catt(_Medicine_date,'-2015'),anydtdte11.);
  if not missing(_Treatment_A_Date) then Treatment_A_Date=input(catt(_Treatment_A_Date,'-2015'),anydtdte11.);
  if not missing(_Treatment_B_Date) then Treatment_B_Date=input(catt(_Treatment_B_Date,'-2015'),anydtdte11.);

 Then I used the following code to create the new columns:

data want (drop=phase);
  set have;
  by patient_id;
  retain phase Medine_days;
  if first.patient_id then do;
    phase=1;
    call missing(Medine_days);
    call missing(Treatment_A_Days);
    call missing(Treatment_B_Days);
  end;
  if phase eq 1 then do;
    if not missing(Treatment_A_Date) then do;
      Treatment_A_Days=0;
      phase=2;
      call missing(Medine_days);
    end;
    else Medine_days=medicine_date-LS_DT;
  end;
  else if phase eq 2 then do;
    if not missing(Treatment_B_Date) then do;
      phase=3;
      treatment_B_days=0;
      call missing(Treatment_A_Days);
    end;
    else if not missing(Treatment_A_Date) then do;
      Treatment_A_Days=0;
    end;
    else Treatment_A_Days+1;
  end;
  else if phase eq 3 then do;
    if not missing(Treatment_B_Date) then do;
      treatment_B_days=0;
    end;
    else Treatment_B_Days+1;
  end;
run;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 6,938

Re: data manipulation in SAS

My answer: do not work with data like this. Dates without years are utterly useless and will bite you in your behind some day. Return to sender with a corresponding nastygram. You're being paid to analyze, not to make guesses.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎05-08-2017 10:19 AM
Super User
Posts: 9,681

Re: data manipulation in SAS

data have;
infile cards expandtabs truncover;
input PatientID	Medicinedate $	LS_DT : mmddyy10.	TreatmentA $ treatmentB $;
date=input(cats(Medicinedate,'-2015'),date11.);
format date ls_dt mmddyy10.;
cards;
1	18-Aug	07/03/2015	. . 	 	46	 	 
1	19-Aug	07/03/2015	. . 	 	47	 	 
1	20-Aug	07/03/2015	 . .	 	48	 	 
1	21-Aug	07/03/2015	 . .	 	49	 	 
1	22-Aug	07/03/2015	 . .	 	50	 	 
1	23-Aug	07/03/2015	 . .	 	51	 	 
1	24-Aug	07/03/2015	24-Aug	 .	 	0	 
1	25-Aug	07/03/2015	 . .	 	 	1	 
1	26-Aug	07/03/2015	 . .	 	 	2	 
1	27-Aug	07/03/2015	 . .	 	 	3	 
1	28-Aug	07/03/2015	 . .	 	 	4	 
1	29-Aug	07/03/2015	 . .	 	 	5	 
1	30-Aug	07/03/2015	30-Aug	. 	 	0	 
1	31-Aug	07/03/2015	 . .	 	 	1	 
1	1-Sep	07/03/2015	 . .	 	 	2	 
1	2-Sep	07/03/2015	 .	2-Sep	 	 	0
1	3-Sep	07/03/2015	 . .	 	 	 	1
1	4-Sep	07/03/2015	 . .	 	 	 	2
2	31-Aug	07/05/2015	31-Aug	. 	57	0	 
2	1-Sep	07/05/2015	 . .	 	 	1	 
2	2-Sep	07/05/2015	 . .	 	 	2	 
2	3-Sep	07/05/2015	 .	3-Sep	 	 	0
2	4-Sep	07/05/2015	 . .	 
;
run;
data want;
 set have;
 by PatientID;
 retain a b count;
 if first.PatientID then do;
  call missing(a,b,count);
 end;

 if not a and not b then Medinedays=date-ls_dt;
 
 if not missing(TreatmentA) then do;a=1;count=-1;end;
 if not missing(treatmentB) then do;b=1;count=-1;end;
 count+1;
 if a and not b then treatmentAdays=count;
 if a and b then treatmentBdays=count;
 drop a b count date;
run;
proc print noobs;run;
Occasional Contributor
Posts: 17

Re: data manipulation in SAS

Thanks to all of you for responding my question. This is my first time to post a question. Sorry for some formating issue. The medicinedate field do have the year 2015, but it did not show it when I copied the date from Excel. Thanks for adding it. 

Occasional Contributor
Posts: 17

Re: data manipulation in SAS

Could you please help me to understand why count resets when the 2nd Treatment A starts on August 30th? if I only count the days for the 1st Treatment A starts, ignoring the 2nd one, how should I do? Thanks. 

PROC Star
Posts: 7,363

Re: data manipulation in SAS

The code is doing what you showed/explained in your example. Are you now asking for subsequent starts to have missing values OR that the numbering continues from the first date?

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 10,500

Re: data manipulation in SAS

Whenever I see a "date" like this: 24-Aug

I say, go back to Excel and reset the column property to a proper date format then reexport or reread into SAS.

Note that I am assuming the data is entered in Excel but I have seen this many times. And the obnoxious thing you have to worry about with Excel is that it is known to change entered values to 'dates' that were not or to interpret the value as entered to be a different date than intended.

 

If any of your LS_DT are near the year boundary you may find that the Excel date for the Medicine or Treatment dates may be in the wrong year as those 24-Aug usually are assuming what ever was entered was for the year of data entry. So and actual Dec of 2015 might have changed to a Dec 2016 if entered in 2016.

Occasional Contributor
Posts: 17

Re: data manipulation in SAS

Great suggestion! I went to my excel and modified the format for the dates. 

Here is the modifed table. 

Could you please help me to understand why count resets when the 2nd Treatment A starts on August 30th? if I only count the days for the 1st Treatment A starts, ignoring the 2nd one, as the treatment A days like this highlighted in Red. how should I do? Thanks. 

patient IDMedicine dateLS_DTTreatment A DateTreatment B DateMedine days treatment A daystreatment B days
18/18/201507/03/2015  46  
18/19/201507/03/2015  47  
18/20/201507/03/2015  48  
18/21/201507/03/2015  49  
18/22/201507/03/2015  50  
18/23/201507/03/2015  51  
18/24/201507/03/20158/24/2015  0 
18/25/201507/03/2015   1 
18/26/201507/03/2015   2 
18/27/201507/03/2015   3 
18/28/201507/03/2015   4 
18/29/201507/03/2015   5 
18/30/201507/03/20158/30/2015  6 
18/31/201507/03/2015   7 
19/1/201507/03/2015   8 
19/2/201507/03/2015 9/2/2015  0
19/3/201507/03/2015    1
19/4/201507/03/2015    2
28/31/201507/05/20158/31/2015  0 
29/1/201507/05/2015   1 
29/2/201507/05/2015   2 
29/3/201507/05/2015 9/3/2015  0
29/4/201507/05/2015    1
Super User
Posts: 9,681

Re: data manipulation in SAS

Because I reset it at here

 if not missing(TreatmentA) then do;a=1;count=-1;end;
 if not missing(treatmentB) then do;b=1;count=-1;end;

If you don't want reset it ,try this one.

 

data have;
infile cards expandtabs truncover;
input PatientID	Medicinedate $	LS_DT : mmddyy10.	TreatmentA $ treatmentB $;
date=input(cats(Medicinedate,'-2015'),date11.);
format date ls_dt mmddyy10.;
cards;
1	18-Aug	07/03/2015	. . 	 	46	 	 
1	19-Aug	07/03/2015	. . 	 	47	 	 
1	20-Aug	07/03/2015	 . .	 	48	 	 
1	21-Aug	07/03/2015	 . .	 	49	 	 
1	22-Aug	07/03/2015	 . .	 	50	 	 
1	23-Aug	07/03/2015	 . .	 	51	 	 
1	24-Aug	07/03/2015	24-Aug	 .	 	0	 
1	25-Aug	07/03/2015	 . .	 	 	1	 
1	26-Aug	07/03/2015	 . .	 	 	2	 
1	27-Aug	07/03/2015	 . .	 	 	3	 
1	28-Aug	07/03/2015	 . .	 	 	4	 
1	29-Aug	07/03/2015	 . .	 	 	5	 
1	30-Aug	07/03/2015	30-Aug	. 	 	0	 
1	31-Aug	07/03/2015	 . .	 	 	1	 
1	1-Sep	07/03/2015	 . .	 	 	2	 
1	2-Sep	07/03/2015	 .	2-Sep	 	 	0
1	3-Sep	07/03/2015	 . .	 	 	 	1
1	4-Sep	07/03/2015	 . .	 	 	 	2
2	31-Aug	07/05/2015	31-Aug	. 	57	0	 
2	1-Sep	07/05/2015	 . .	 	 	1	 
2	2-Sep	07/05/2015	 . .	 	 	2	 
2	3-Sep	07/05/2015	 .	3-Sep	 	 	0
2	4-Sep	07/05/2015	 . .	 
;
run;
data want;
 set have;
 by PatientID;
 retain a b count;
 if first.PatientID then do;
  call missing(a,b,count);
 end;

 if not a and not b then Medinedays=date-ls_dt;
 
 if not missing(TreatmentA) and not a then do;a=1;count=-1;end;
 if not missing(treatmentB) and not b then do;b=1;count=-1;end;
 count+1;
 if a and not b then treatmentAdays=count;
 if a and b then treatmentBdays=count;
 drop a b count date;
run;
PROC Star
Posts: 7,363

Re: data manipulation in SAS

There are two changes in your latest example: (1) the 57 days removed in the line:

2 31-Aug07/05/2015 31-Aug08/31/2015 . 0 .

 

and (2) the counter isn't reset when an additional treatment date exists.

 

Interestingly, the code I had proposed, contrary to your specs, did precisely the first change. If the treatment and meds started on the same day, it didn't output the number of med days.

 

I don't have time to review @Ksharp's code but, for the code I proposed, here is the change that would do #2:

data want (drop=phase);
  set have;
  by patientid;
  retain phase Medine_days;
  if first.patientid then do;
    phase=1;
    call missing(Medine_days);
    call missing(Treatment_A_Days);
    call missing(Treatment_B_Days);
  end;
  if phase eq 1 then do;
    if not missing(TreatmentA) then do;
      Treatment_A_Days=0;
      phase=2;
      call missing(Medine_days);
    end;
    else Medine_days=date-LS_DT;
  end;
  else if phase eq 2 then do;
    if not missing(TreatmentB) then do;
      phase=3;
      treatment_B_days=0;
      call missing(Treatment_A_Days);
    end;
/*     else if not missing(TreatmentA) then do; */
/*       Treatment_A_Days=0; */
/*     end; */
    else Treatment_A_Days+1;
  end;
  else if phase eq 3 then do;
/*     if not missing(TreatmentB) then do; */
/*       treatment_B_days=0; */
/*     end; */
/*     else */
    Treatment_B_Days+1;
  end;
run;

 

Art, CEO, AnalystFinder.com

Occasional Contributor
Posts: 17

Re: data manipulation in SAS

Re: data manipulation in SAS

 

Thank you so much for all of your help. 

Both of your code ('s code and Ksharp's code) work perfectly for the sample data I gave before, but my data is more complicated than that. For example, Treatment A and Treatment B can happen in the same day; Also the date is not necessary continuous, so it might miss some dates in between within the same patient. So I am thinking to use the lag function to repeat the treatment date, then use the following formula to calculate. 

treatment A days = Medicine date - Treatment A date2

treatment B days = Medicine date - Treatment B date2

 

Would you please help me to get the Treatment A Date2 and Treatment B Date2? Thanks a lot!

 

patient IDMedicine dateLS_DTTreatment A DateTreatment B DateTreatment A Date2Treatment B Date2Medine days treatment A days = Medicine date - Treatment A date2treatment B days = Medicine date - Treatment B date2
18/18/201507/03/2015    46  
18/19/201507/03/2015    47  
18/20/201507/03/2015    48  
18/21/201507/03/2015    49  
18/22/201507/03/2015    50  
18/23/201507/03/2015    51  
18/24/201507/03/20158/24/2015 8/24/2015  0 
18/25/201507/03/2015  8/24/2015  1 
18/26/201507/03/2015  8/24/2015  2 
18/27/201507/03/2015  8/24/2015  3 
18/28/201507/03/2015  8/24/2015  4 
18/29/201507/03/2015  8/24/2015  5 
18/30/201507/03/20158/30/20158/30/20158/30/20158/30/2015 0 
18/31/201507/03/2015  8/30/20158/30/2015 1 
19/1/201507/03/2015  8/30/20158/30/2015 2 
19/2/201507/03/2015  8/30/20158/30/2015 30
19/3/201507/03/2015  8/30/20158/30/2015 41
19/4/201507/03/2015  8/30/20158/30/2015 52
28/31/201507/05/20158/31/2015 8/31/2015  0 
29/1/201507/05/2015  8/31/2015  1 
29/2/201507/05/2015  8/31/2015  2 
29/3/201507/05/2015 9/3/20158/31/20159/3/2015  0
29/4/201507/05/2015  8/31/20159/3/2015  1
Super User
Posts: 9,681

Re: data manipulation in SAS

data have;
infile cards expandtabs truncover;
input PatientID	Medicinedate $	LS_DT : mmddyy10.	TreatmentA : date9. treatmentB :date9.;
format date ls_dt TreatmentA  treatmentB mmddyy10.;
cards;
1	18-Aug	07/03/2015	. . 	 	46	 	 
1	19-Aug	07/03/2015	. . 	 	47	 	 
1	20-Aug	07/03/2015	 . .	 	48	 	 
1	21-Aug	07/03/2015	 . .	 	49	 	 
1	22-Aug	07/03/2015	 . .	 	50	 	 
1	23-Aug	07/03/2015	 . .	 	51	 	 
1	24-Aug	07/03/2015	24Aug2015	 .	 	0	 
1	25-Aug	07/03/2015	 . .	 	 	1	 
1	26-Aug	07/03/2015	 . .	 	 	2	 
1	27-Aug	07/03/2015	 . .	 	 	3	 
1	28-Aug	07/03/2015	 . .	 	 	4	 
1	29-Aug	07/03/2015	 . .	 	 	5	 
1	30-Aug	07/03/2015	30Aug2015	. 	 	0	 
1	31-Aug	07/03/2015	 . .	 	 	1	 
1	1-Sep	07/03/2015	 . .	 	 	2	 
1	2-Sep	07/03/2015	 .	2Sep2015	 	 	0
1	3-Sep	07/03/2015	 . .	 	 	 	1
1	4-Sep	07/03/2015	 . .	 	 	 	2
2	31-Aug	07/05/2015	31Aug2015	. 	57	0	 
2	1-Sep	07/05/2015	 . .	 	 	1	 
2	2-Sep	07/05/2015	 . .	 	 	2	 
2	3-Sep	07/05/2015	 .	3Sep2015	 	 	0
2	4-Sep	07/05/2015	 . .	 
;
run;
data want;
 set have;
 by PatientID;
 retain TreatmentADate2 TreatmentBDate2;
 if first.PatientID then call missing(TreatmentADate2,TreatmentBDate2);
 if not missing(TreatmentA) then TreatmentADate2=TreatmentA ;
 if not missing(treatmentB) then TreatmentBDate2=treatmentB ;
 format TreatmentADate2 TreatmentBDate2 date9.;
run;
Occasional Contributor
Posts: 17

Re: data manipulation in SAS

Thanks a lot! Retain function is very useful! :-) SAS community is so helpful. Thanks to all!
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 262 views
  • 6 likes
  • 6 in conversation