BookmarkSubscribeRSS Feed
Mahip
Obsidian | Level 7

I have a dataset that looks like all_dates dataset below:

 

data all_dates;
input id: $1. 
      start_dt: mmddyy10. 
      drugA_dt: mmddyy10. 
      drugB_dt: mmddyy10. 
      drugC_dt: mmddyy10. 
      drugD_dt: mmddyy10. 
      drugE_dt: mmddyy10. 
      stop_dt: mmddyy10.;
format start_dt drugA_dt drugB_dt drugC_dt  drugD_dt drugE_dt stop_dt mmddyy10.;
datalines;
A 01/01/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 12/31/2017
;
run;

I want to create multiple rows for the individuals based on the minimum of all dates except for start_dt. In the given example, the start_dt of second row should be the lowest of all other dates and stop_dt should be the second lowest. My final dataset should look something like this:

data want;
input id: $1. 
      start_dt: mmddyy10. 
      drugA_dt: mmddyy10. 
      drugB_dt: mmddyy10. 
      drugC_dt: mmddyy10. 
      drugD_dt: mmddyy10. 
      drugE_dt: mmddyy10. 
      stop_dt: mmddyy10.;
format start_dt drugA_dt drugB_dt drugC_dt  drugD_dt drugE_dt stop_dt mmddyy10.;
datalines;
A 01/01/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 01/25/2017
A 01/25/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 02/18/2017
A 02/18/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 03/28/2017
A 03/28/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 05/10/2017
A 05/10/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 07/05/2017
A 07/05/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 12/31/2017
;
run;

Notice how the start_dt and stop_dt move. I have been struggling with this and cannot figure out the solution. Thank you!

11 REPLIES 11
HB
Barite | Level 11 HB
Barite | Level 11

This has a whiff of an X-Y problem about it.

 

Could you explain the data more and tell us what are trying to do? 

 

 

http://xyproblem.info/

 

   

Mahip
Obsidian | Level 7

I am trying to perform time-varying analysis. For this, I need data in long format, with each row corresponding to a continuous time sequence where none of the variables (here, drugs) change values. Those drug dates indicate initiation of those drugs, which would warrant a new row. The start and stop dates indicate study duration.

ballardw
Super User

This works for the example data.

 

data want;
   set all_dates (rename=(stop_dt= ostop));
   array d start_dt drugA_dt drugB_dt drugC_dt  drugD_dt drugE_dt;
   format stop_dt mmddyy10.;
   do i= 1 to dim(d);
      if i < dim(d) then stop_dt=d[i+1];
      else stop_dt=ostop;
      if i > 1 then start_dt=d[i];
      output;
   end;
   drop i ostop;
run;
      
Mahip
Obsidian | Level 7

This works, but only when the dates are arranged in the order I have posted. Order could very well be different: if drug A and drug B could swap positions, this doesn't work. I am trying to modify it by using something like min ( of d[i+1] - dim(d)), but it is not working. Also, when there are many individuals, I cannot arrange data in that order (because some of them use drugC first followed by drugA, unlike that of individual A posted above). 

HB
Barite | Level 11 HB
Barite | Level 11

So, as I suggested, maybe you could explain a little and tell us what you really want to do?

Mahip
Obsidian | Level 7
I have explained above, in reply to your previous response. I am copying the same response here:

"I am trying to perform time-varying analysis. For this, I need data in long format, with each row corresponding to a continuous time sequence where none of the variables (here, drugs) change values. Those drug dates indicate initiation of those drugs, which would warrant a new row. The start and stop dates indicate study duration."
HB
Barite | Level 11 HB
Barite | Level 11

That explanation really isn't helpful in understanding, at least to me. 

 

I don't even know what "I need data in long format" means. 

ballardw
Super User

@Mahip wrote:

This works, but only when the dates are arranged in the order I have posted. Order could very well be different: if drug A and drug B could swap positions, this doesn't work. I am trying to modify it by using something like min ( of d[i+1] - dim(d)), but it is not working. Also, when there are many individuals, I cannot arrange data in that order (because some of them use drugC first followed by drugA, unlike that of individual A posted above). 


NONE of these dependencies are mentioned in your original post.

So go back and provide examples of each of the generic types of use cases that you have and what the output order should actually be.

Or spend some time writing out the rules and exceptions.

 

There was  reason that I specifically said : This works for the example. I suspected there was something else going on that you hadn't stated.

 

It may be that your start data needs to be reformatted to indicate something like

ID   Date Condition

where condition takes on values of Start (what ever that meant) the drug name or code, and End (whatever that actually means). That would at least maintain the date order information. Especially if there are other bits of information associated with the time such as a test result or other factors involved that were related to why a drug was chosen/ changed or added.

 

I could not see an obvious way of using that "want" data.

 

You might also provide some idea of the following analysis steps, such a Procs or what not.

HB
Barite | Level 11 HB
Barite | Level 11

 

It may be that your start data needs to be reformatted to indicate something like

ID   Date Condition

 

Yes.

 

A data structure like 

 

Studies

Study name- A, B, C

Study start date- date

Study end date - date

Study purpose

whatever

 

Drugs Administered

Study name- A, B, C

Drug name/id- Azol, Bzol, Cnol, Derin

Drug start date-  date

Drug stop date - date

 

Patients

Patient ID- id

Study name

Patient diagnosis

etc

 

 

starts to look like something with which one could do some analytic work. 

 

Mahip
Obsidian | Level 7

Thank you all for your comments. In short, the link below is similar to what I am trying to do, except for I have multiple time-varying variables which is throwing me off.

 

 

http://support.sas.com/resources/papers/proceedings12/168-2012.pdf

 

HB
Barite | Level 11 HB
Barite | Level 11

You don't seem to be able to explain your data nor present what it should look like.  What multiple time-varying variables?

 

Good luck.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 2115 views
  • 2 likes
  • 3 in conversation