BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cuan
Obsidian | Level 7
Dear Sas Community,

have the next problem, it is not trivial, please try to give me the dataset I am asking for below. Thank you very much in advance.

data have;
input id treat date: date9.;
format date date9.;
cards;
1 1 16oct2017
1 1 30oct2017
1 2 14nov2017
1 2 26nov2017
1 2 27nov2017
1 1 11dec2017
1 2 12dec2017
1 2 13dec2017
1 2 24dec2017
1 2 25dec2017
1 1 27dec2017
1 2 09jan2018
2 1 16oct2017
2 1 30oct2017
2 2 14nov2017
2 2 26nov2017
2 2 27nov2017
2 1 11dec2017
2 2 12dec2017
2 2 24dec2017
2 2 25dec2017
2 1 27dec2017
2 2 09jan2018
2 2 10jan2018
2 2 11jan2018
2 2 22jan2018
2 1 05feb2018



I need a dataset want:

subjid treat start stop
1 1 16oct2017 13nov2017
1 1 28nov2017 11dec2017
1 1 26dec2017 08jan2017
2 1 16oct2017 13nov2017
2 1 28nov2017 11dec2017
2 1 26dec2017 08jan2017
2 1 23jan2018 05feb2018

Thank you very much.

Cuan.
1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

I adapted my code to your last explanation and answer to @Reeza  to get the wanted results:

data want(drop=ID treat date rename=(svid=ID svtrt=treat));
 set have end=eof;
  by ID notsorted treat;
     retain svid svtrt start;
     format start stop date9.;
     if _N_ = 1 or first.ID then do;
        svid  = ID;
        svtrt = treat;
        start = date;
     end; 
     else do;
	     if treat=2 then do;
	        if svtrt=1 then do;
		       stop = date-1;
		       output;
		    end;
	        svtrt=2;
	     end; else svtrt=1;
	     if treat=2 and last.treat then start=date+1;
	     if first.treat then svtreat=treat;
	     if last.ID then do;
	        stop=date;
	        if svtrt = 1 then output;
	     end;
	 end;
run;
proc sort data=want; by ID treat; run;     

View solution in original post

11 REPLIES 11
Reeza
Super User

Please provide the logical rules to create the second data set.

 


@cuan wrote:
Dear Sas Community,

have the next problem, it is not trivial, please try to give me the dataset I am asking for below. Thank you very much in advance.

data have;
input id treat date: date9.;
format date date9.;
cards;
1 1 16oct2017
1 1 30oct2017
1 2 14nov2017
1 2 26nov2017
1 2 27nov2017
1 1 11dec2017
1 2 12dec2017
1 2 13dec2017
1 2 24dec2017
1 2 25dec2017
1 1 27dec2017
1 2 09jan2018
2 1 16oct2017
2 1 30oct2017
2 2 14nov2017
2 2 26nov2017
2 2 27nov2017
2 1 11dec2017
2 2 12dec2017
2 2 24dec2017
2 2 25dec2017
2 1 27dec2017
2 2 09jan2018
2 2 10jan2018
2 2 11jan2018
2 2 22jan2018
2 1 05feb2018



I need a dataset want:

subjid treat start stop
1 1 16oct2017 13nov2017
1 1 28nov2017 11dec2017
1 1 26dec2017 08jan2017
2 1 16oct2017 13nov2017
2 1 28nov2017 11dec2017
2 1 26dec2017 08jan2017
2 1 23jan2018 05feb2018

Thank you very much.

Cuan.

 

cuan
Obsidian | Level 7
If you can see, I need to know when these patients is taking only treatment
1 (is a drug called A) , treatment 2 is a combinación drug, called (A+B).
I only interested in the period the patient is taking treatment 1 (mono
therapy drug A).

I hope it is clear.


Thanks.

Reeza
Super User

What does the date variable mean?

Does it mean that a person started taking treatment1/2 at this time?

 

I don't understand this portion:

Original Data:

1 2 27nov2017
1 1 11dec2017

 

Desired data:

1 1 28nov2017 11dec2017

 

Why is the start Nov 28, one day after Treatment2 starts? It seems like Treatment A, didn't start again until 11 December. 

 

 

 

 


cuan
Obsidian | Level 7
Hi Reza,
The treat =2, is that patient is taking a combination of drugs (A+B) , I mean this patient is taking daily Drug A, but for some specific periods he takes an extra Drug B.
Then, that portion means:
1 2 27nov2017
1 1 11dec2017

The patient is taking the combination of drugs that day 27 (A+B) , and stop. Then next day (day 28) he is taking the daily drug A, until 11 dec2017, because not more treatment 2 in that period.
The logic is, when the patient is not taking treatment 2 (combination treatment A+B) automatically we know he is taking only treatment 1(drug A) . I hope it is more clear now. It is not easy task , that’s why I ask you to solve. I hope you can find a solution. Thanks.
Shmuel
Garnet | Level 18

I adapted my code to your last explanation and answer to @Reeza  to get the wanted results:

data want(drop=ID treat date rename=(svid=ID svtrt=treat));
 set have end=eof;
  by ID notsorted treat;
     retain svid svtrt start;
     format start stop date9.;
     if _N_ = 1 or first.ID then do;
        svid  = ID;
        svtrt = treat;
        start = date;
     end; 
     else do;
	     if treat=2 then do;
	        if svtrt=1 then do;
		       stop = date-1;
		       output;
		    end;
	        svtrt=2;
	     end; else svtrt=1;
	     if treat=2 and last.treat then start=date+1;
	     if first.treat then svtreat=treat;
	     if last.ID then do;
	        stop=date;
	        if svtrt = 1 then output;
	     end;
	 end;
run;
proc sort data=want; by ID treat; run;     
cuan
Obsidian | Level 7

Thanks a lot.

It was a challenging one.

Cuan.

Reeza
Super User

The reason that doesn't make sense to me is that you use the exact opposite logic in the first two records where the A is assumed to carry forward, but here you're saying it doesn't carry forward. 

 

1 1 16oct2017 13nov2017

 

From what I can see this is based on the date that is prior the drug regiment changing. So it seems like the logic isn't consistent to me. 

 

cuan
Obsidian | Level 7
Dear Reza, I am Sorry, maybe you didn't understand my explanation,
There are a logic in my dataset WANT, and @Shmuel code solved that problem.
Shmuel
Garnet | Level 18

Please check next code, though the result is not as expected:

data want(drop=ID treat date rename=(svid=ID svtrt=treat));
 set have end=eof;
  by ID notsorted treat;
     retain svid svtrt start;
     format start stop date9.;
     if _N_ = 1 then do;
        svid  = ID;
        svtrt = treat;
        start = date;
     end;
     if first.treat and _N_>1 then do;
        stop = date-1;
        if svtrt = 1 then output;
        svid  = ID;
        svtrt = treat;
        start = date;
     end;
     if last.ID then do;
        stop=date;
        if svtrt = 1 then output;
     end;
run;
proc sort data=want; by ID treat; run;     
FreelanceReinh
Jade | Level 19

Hello @cuan,

 

Here's another, independently developed solution, so you can validate one against the other.

 

/* Create larger test dataset */

data have(drop=phase);
call streaminit(27182818);
do id=1 to 50000;
  date='01JAN2000'd+rand('integer',10);
  do phase=1 to rand('integer',50);
    treat=rand('table',0.5);
    output;
    date+rand('integer',1,50); /* "('integer',0,50)" would cause ambiguities */
  end;
end;
format date date9.;
run;

/* Create dataset with periods of treatment 1 */

data want(drop=date _: rename=(id=subjid));
array _t[%sysevalf('01JAN2000'd):%eval(%sysfunc(today())+100)] _temporary_;
do until(last.id);
  set have;
  by id date;
  if first.id then _first_trt1=date;
  if treat~=1 then do;
    if _trt2_b=. then _trt2_b=date;
    _trt2_e=date;
  end;
  if _trt2_e>. & (treat=1 | last.id) then do;
    do _d=_trt2_b to _trt2_e;
      _t[_d]=2;
    end;
    call missing(of _trt2:);
  end;
end;
do _d=_first_trt1 to date;
  if _t[_d]=. then _t[_d]=1;
end;
do _d=lbound(_t)+1 to hbound(_t)-1;
  if _t[_d]=1 then do;
    if _t[_d-1]~=1 then start=_d;
    if _t[_d+1]~=1 then stop=_d;
  end;
  if n(start, stop)=2 then do;
    treat=1;
    output;
    call missing(start, stop);
  end;
end;
call missing(of _t[*]);
format start stop date9.;
run;

It's less elegant than @Shmuel's and also much slower on the large HAVE dataset created above (with more than one million observations). If your real dataset is much smaller (say, 1,000 patients, not 50,000), we're talking about fractions of a second, though. You can also improve performance by choosing more realistic lower and upper bounds (based on your data) for the temporary array _t, which just needs to cover all of your treatment dates +/- 1 day (which I assumed to fall between January 1, 2000 and 100 days from today).

 

Note that the resulting WANT datasets do not match for the large HAVE input dataset. Maybe they do with your real data because certain rare situations (like a patient treated only one day, with treat=1) don't occur in your data. You may want to check a few of the discrepancies (e.g. the single date difference for ID=6 of my simulated HAVE dataset) to get an idea of whether they are relevant for your application. The "ambiguities" mentioned in the comment above would arise if duplicate dates were allowed within an ID in dataset HAVE (e.g., two observations with the same ID and date, one with treat=1 and one with treat=2; it's not obvious what this would imply for the treatment on the preceding days, after an observation with treat=2). This would also cause more discrepancies between the WANT datasets of the two solutions.

cuan
Obsidian | Level 7

Hello dear,

 

thank you very much for your code.

 

I compared it in my real data (10 patients), and I have patient took mono only 1 day, and  @Shmuel  code is working  perfectly for this case also very well, that's  why I gave him the valid solution, because it is  working great

in my real data.

 

Regards,

 

Cuan.

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3270 views
  • 4 likes
  • 4 in conversation