I have a dataset that looks like all_dates dataset below:
data all_dates;
input id: $1.
start_dt: mmddyy10.
drugA_dt: mmddyy10.
drugB_dt: mmddyy10.
drugC_dt: mmddyy10.
drugD_dt: mmddyy10.
drugE_dt: mmddyy10.
stop_dt: mmddyy10.;
format start_dt drugA_dt drugB_dt drugC_dt drugD_dt drugE_dt stop_dt mmddyy10.;
datalines;
A 01/01/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 12/31/2017
;
run;
I want to create multiple rows for the individuals based on the minimum of all dates except for start_dt. In the given example, the start_dt of second row should be the lowest of all other dates and stop_dt should be the second lowest. My final dataset should look something like this:
data want;
input id: $1.
start_dt: mmddyy10.
drugA_dt: mmddyy10.
drugB_dt: mmddyy10.
drugC_dt: mmddyy10.
drugD_dt: mmddyy10.
drugE_dt: mmddyy10.
stop_dt: mmddyy10.;
format start_dt drugA_dt drugB_dt drugC_dt drugD_dt drugE_dt stop_dt mmddyy10.;
datalines;
A 01/01/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 01/25/2017
A 01/25/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 02/18/2017
A 02/18/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 03/28/2017
A 03/28/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 05/10/2017
A 05/10/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 07/05/2017
A 07/05/2017 01/25/2017 02/18/2017 03/28/2017 05/10/2017 07/05/2017 12/31/2017
;
run;
Notice how the start_dt and stop_dt move. I have been struggling with this and cannot figure out the solution. Thank you!
This has a whiff of an X-Y problem about it.
Could you explain the data more and tell us what are trying to do?
I am trying to perform time-varying analysis. For this, I need data in long format, with each row corresponding to a continuous time sequence where none of the variables (here, drugs) change values. Those drug dates indicate initiation of those drugs, which would warrant a new row. The start and stop dates indicate study duration.
This works for the example data.
data want; set all_dates (rename=(stop_dt= ostop)); array d start_dt drugA_dt drugB_dt drugC_dt drugD_dt drugE_dt; format stop_dt mmddyy10.; do i= 1 to dim(d); if i < dim(d) then stop_dt=d[i+1]; else stop_dt=ostop; if i > 1 then start_dt=d[i]; output; end; drop i ostop; run;
This works, but only when the dates are arranged in the order I have posted. Order could very well be different: if drug A and drug B could swap positions, this doesn't work. I am trying to modify it by using something like min ( of d[i+1] - dim(d)), but it is not working. Also, when there are many individuals, I cannot arrange data in that order (because some of them use drugC first followed by drugA, unlike that of individual A posted above).
So, as I suggested, maybe you could explain a little and tell us what you really want to do?
That explanation really isn't helpful in understanding, at least to me.
I don't even know what "I need data in long format" means.
@Mahip wrote:
This works, but only when the dates are arranged in the order I have posted. Order could very well be different: if drug A and drug B could swap positions, this doesn't work. I am trying to modify it by using something like min ( of d[i+1] - dim(d)), but it is not working. Also, when there are many individuals, I cannot arrange data in that order (because some of them use drugC first followed by drugA, unlike that of individual A posted above).
NONE of these dependencies are mentioned in your original post.
So go back and provide examples of each of the generic types of use cases that you have and what the output order should actually be.
Or spend some time writing out the rules and exceptions.
There was reason that I specifically said : This works for the example. I suspected there was something else going on that you hadn't stated.
It may be that your start data needs to be reformatted to indicate something like
ID Date Condition
where condition takes on values of Start (what ever that meant) the drug name or code, and End (whatever that actually means). That would at least maintain the date order information. Especially if there are other bits of information associated with the time such as a test result or other factors involved that were related to why a drug was chosen/ changed or added.
I could not see an obvious way of using that "want" data.
You might also provide some idea of the following analysis steps, such a Procs or what not.
It may be that your start data needs to be reformatted to indicate something like
ID Date Condition
Yes.
A data structure like
Studies
Study name- A, B, C
Study start date- date
Study end date - date
Study purpose
whatever
Drugs Administered
Study name- A, B, C
Drug name/id- Azol, Bzol, Cnol, Derin
Drug start date- date
Drug stop date - date
Patients
Patient ID- id
Study name
Patient diagnosis
etc
starts to look like something with which one could do some analytic work.
Thank you all for your comments. In short, the link below is similar to what I am trying to do, except for I have multiple time-varying variables which is throwing me off.
http://support.sas.com/resources/papers/proceedings12/168-2012.pdf
You don't seem to be able to explain your data nor present what it should look like. What multiple time-varying variables?
Good luck.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.