Hello Community,
I am working on a data analysis problem for which I would greatly appreciate some programming help. Basically, I need to know whether or not subjects in my dataset reported drug use (Drug A) during a research study at specific time points. Specifically, I would like to know whether subjects reported use of Drug A during each of the following time periods during the study:
Please see an example of the desired output data set (Want) below. The VISITNO corresponds to the number of each of the 3 study visits where Visit 1 would correspond to the Screening date.
Want:
PID | VISITNO | Date | Drug A Use |
01 | 1 | 03/30/2018 | 1 |
01 | 2 | 05/02/2018 | 0 |
01 | 3 | 05/30/2018 | 0 |
02 | 1 | 04/09/2018 | 1 |
02 | 2 | 05/09/2018 | 1 |
02 | 3 | . | . |
….. |
|
|
|
I have also provided an example of the two data sets I have to produce this output. The first is a dataset listing the visit numbers and visit dates for each subject (Have 1). Note that the Screening Visit is a separate variable; however, I would like to have a single variable in my output dataset that designates the Screening Visit as Visit 1 (See Want above). Please note that some visits may not have attended a study visit, hence, the missing data for a visit date (subject 02 in the example below).
Have 1:
Subject ID | Screen_Date | VisitNo | Visit_date |
01 | 03/30/2018 | 2 | 05/02/2018 |
01 | 03/30/2018 | 3 | 05/30/2018 |
02 | 04/09/2018 | 2 | 05/09/2018 |
02 | 04/09/2018 | 3 | . |
The second dataset that I have indicates the dates that drug use occurred on (Have 2). That is, Drug A use is indicated by a "1" in the "D1....D7 drug A use" columns. Each row of data for each participant represents an entire week. The "D1 drug use" column indicates drug use for the "D1 date" column, the "D2 drug use" column indicates drug use for the "D2 date" column, and so on.
Please also note that some participants may not have completed a study visit. Therefore, the drug use for that visit would be missing (e.g., Subject ID 02 in the example below).
Have 2:
Subject ID | D1 date | D2 date | D3 date | D4 date | D5 date | D6 date | D7 date | D1 drug A use | D2 drug A use | D3 drug A use | D4 drug A use | D5 drug A use | D6 drug A use | D7 drug A use |
01 | . | . | . | 02/28/18 | 03/01/18 | 03/02/18 | 03/03/18 | . | . | . | 0 | 0 | 1 | 0 |
01 | 03/04/18 | 03/05/18 | 03/06/18 | 03/07/18 | 03/08/18 | 03/09/18 | 03/10/18 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
01 | 03/11/18 | 03/12/18 | 03/13/18 | 03/14/18 | 03/15/18 | 03/16/18 | 03/17/18 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
01 | 03/18/18 | 03/19/18 | 03/20/18 | 03/21/18 | 03/22/18 | 03/23/18 | 03/24/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 03/25/18 | 03/26/18 | 03/27/18 | 03/28/18 | 03/29/18 | 03/30/18 | 03/31/18 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
01 | 04/01/18 | 04/02/18 | 04/03/18 | 04/04/18 | 04/05/18 | 04/06/18 | 04/07/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 04/08/18 | 04/09/18 | 04/10/18 | 04/11/18 | 04/12/18 | 04/13/18 | 04/14/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 04/15/18 | 04/16/18 | 04/17/18 | 04/18/18 | 04/19/18 | 04/20/18 | 04/21/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 04/22/18 | 04/23/18 | 04/24/18 | 04/25/18 | 04/26/18 | 04/27/18 | 04/28/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 04/29/18 | 04/30/18 | 05/01/18 | 05/02/18 | 05/03/18 | 05/04/18 | 05/05/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 05/06/18 | 05/07/18 | 05/08/18 | 05/09/18 | 05/10/18 | 05/11/18 | 05/12/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 05/13/18 | 05/14/18 | 05/15/18 | 05/16/18 | 05/17/18 | 05/18/18 | 05/19/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 05/20/18 | 05/21/18 | 05/22/18 | 05/23/18 | 05/24/18 | 05/25/18 | 05/26/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 05/27/18 | 05/28/18 | 05/29/18 | . | . | . | . | 0 | 0 | 0 | . | . | . | . |
02 | . | . | . | . | . | . | 03/10/18 | . | . | . | . | . | . | 0 |
02 | 03/11/18 | 03/12/18 | 03/13/18 | 03/14/18 | 03/15/18 | 03/16/18 | 03/17/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
02 | 03/18/18 | 03/19/18 | 03/20/18 | 03/21/18 | 03/22/18 | 03/23/18 | 03/24/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
02 | 03/25/18 | 03/26/18 | 03/27/18 | 03/28/18 | 03/29/18 | 03/30/18 | 03/31/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
02 | 04/01/18 | 04/02/18 | 04/03/18 | 04/04/18 | 04/05/18 | 04/06/18 | 04/07/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
02 | 04/08/18 | 04/09/18 | 04/10/18 | 04/11/18 | 04/12/18 | 04/13/18 | 04/14/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
02 | 04/15/18 | 04/16/18 | 04/17/18 | 04/18/18 | 04/19/18 | 04/20/18 | 04/21/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
02 | 04/22/18 | 04/23/18 | 04/24/18 | 04/25/18 | 04/26/18 | 04/27/18 | 04/28/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
02 | 04/29/18 | 04/30/18 | 05/01/18 | 05/02/18 | 05/03/18 | 05/04/18 | 05/05/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
02 | 05/06/18 | 05/07/18 | 05/08/18 | 05/09/18 | . | . | . | 1 | 1 | 1 | 1 | . | . | . |
….. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note: The data includes 30 days prior to each subject’s Screening Date; hence, the missing data for some cells.
Is screen date in HAVE1 the same date that is attributed to "visit 1" in dataset want?
Your HAVE2 data sample is a good example of why you will frequently see experienced users in sas community refer to the operational advantages of skinny datasets to wide datasets. HAVE2 has 7 date variables (and 7 drug use dummies), and you need to compare EACH of those dates to the visit dates. If, instead of 7 variables you have 7 rows, with 3 variables: ID, DATE, DRUGA_USE dummy, your programming task would be far easier.
Similarly with HAVE1: If you had a separate record for the screen_date, programming would be easier.
So here is a suggestion that prepares HAVE1 by creating a separate record with DATE=SCREEN_DATE, and also modifies HAVE2 to have one record per date in which druga use occurred. Then the programming (see the 3rd step below) is almost trivial.
If you are able to provide your sample HAVE1 and HAVE2 in the form of a working DATA steps, I would be happy to test the program:
/* Make a dataset with one row for each date with actual drug use */
data druguse_dates (keep=pid date druga_use) / view=druguse_dates ;
set have2;
array dt {7} d1date d2date d3date d4date d5date d6date d7date;
array du {7} d1usea d2usea d3usea d4usea d5usea d6usea d7usea;
druga_use=1;
format date date9. ; /* Revised to include a format */
do i=1 to 7;
if du{i}=1 then do;
date=dt{i};
output;
end;
end;
run;
/* Make a dataset with one row per visit plus one row for screen_date (i.e. visitno=1) */
data visits (keep=pid date visitno);
set have1;
by pid;
if first.pid do;
vno=visitno; /* Save the visit number */
format date date9. ; /* Revised to include a format */
date=screen_date;
visitno=1; /*Presuming visit 1 is the same as the screen visit */
output;
visitno=vno; /* Restore the visit number */
end;
date=visit_date;
if date=. then date='31dec2099'd+visit_no; /* As a signal for later */
output;
run;
/* Now output 1 record per visit */
data want;
set visits (in=invisit)
druguse_dates (drop=druga_use in=use_date) ;
by pid date;
if invisit=1 then do;
if date>='31dec2099'd then call missing(date,druga_use);
output;
druga_use=0;
end;
if use_date=1 then set druguse_dates (keep=druga_use);
if last.pid then druga_use=0;
run;
This program assumes both HAVE1 and HAVE2 are sorted by PID/DATE.
In the case of missing VISIT3 date (or any missing visit date), I set the DATE='31dec2099'd + visit_no. This is so that the "by pid date" statement in the DATA WANT step does not find dates out of order, and stop the step. Of course, that mean you have to test for those future dates and reset to missing prior to output.
The "trick" in the data WANT step is to note that the program only reads in the DRUGA_USE dummy (from dataset druguse_dates) when the "IF use_date=1 then set" condition is met. So reading in the dummy variable is exactly synchronized with reading the rest of the variables from druguse_dates. But unlike those other variables read in the first unconditional SET statement, the DRUGA_USE variable will be retained until reset to another value or re-read in the conditional set statement.
** Editted in response to @cj3 response: I've inserted two format date date9.; statements. Remember, it doesn't change the underlying value (the number of days after 01jan1960) - just the way the value is displayed. For instance April 14, 2019 has the value 21653. And that's how the date variables would be displayed, until I told SAS to use the format date9., forcing it to show as 14APR2019.
Hi @mkeintz,
Thank you so much for your time, this worked perfectly! I am still very much a SAS newbie, so I was wondering if you could also provide some feedback on how to slightly modify the above code to further suit my aims.
1. First, could you please show me how the above code could be modified to efficiently examine use of additional drugs, such as adding "drugb_use", "drugc_use" variables in addition to "druga_use"?
2. Also, it appears that the code you provided put the dates in BEST12. format, although my input datasets have the dates in MMDDYY8. format. Any suggestions on how to get the date format back to MMDDYY8. would be appreciated as this format just seems to be a little more readable.
Thank you!!
@mkeintz also to follow up regarding adding additional drug use variables, my data set has it where a non-missing value (1/0) for the additional drug use variable (e.g., drugb_use) is only present for a date if there is a "1" for druga_use for that same date. In other words, "drugb_use" would not be assessed if there was no "druga_use" for a particular date and therefore would show as missing. However, my desired output would indicate no "drugb_use" if there was also no "druga_use", as opposed to it showing as missing. Of course, if the visit date is missing, then druga_use and drubb_use would also be missing. I have provided my example dataset again below with this added as well as the desired dataset. Any suggestions on incorporating this to the earlier code would be greatly appreciated.
Want:
PID | VISITNO | Date | Drug A Use | Drug B Use |
01 | 1 | 03/30/2018 | 1 | 0 |
01 | 2 | 05/02/2018 | 0 | 0 |
01 | 3 | 05/30/2018 | 0 | 0 |
02 | 1 | 04/09/2018 | 1 | 1 |
02 | 2 | 05/09/2018 | 1 | 1 |
02 | 3 | . | . | . |
….. |
|
|
|
|
Have:
Subject ID | D1 date | D2 date | D3 date | D4 date | D5 date | D6 date | D7 date | D1 drug A use | D2 drug A use | D3 drug A use | D4 drug A use | D5 drug A use | D6 drug A use | D7 drug A use | D1 drug B use | D2 drug B use | D3 drug B use | D4 drug B use | D5 drug B use | D6 drug B use | D7 drug B use |
01 | . | . | . | 02/28/18 | 03/01/18 | 03/02/18 | 03/03/18 | . | . | . | 0 | 0 | 1 | 0 | . | . | . | . | . | 0 | . |
01 | 03/04/18 | 03/05/18 | 03/06/18 | 03/07/18 | 03/08/18 | 03/09/18 | 03/10/18 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | . | . | . | . | . | 0 | . |
01 | 03/11/18 | 03/12/18 | 03/13/18 | 03/14/18 | 03/15/18 | 03/16/18 | 03/17/18 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | . | 0 | . | . | . | . | . |
01 | 03/18/18 | 03/19/18 | 03/20/18 | 03/21/18 | 03/22/18 | 03/23/18 | 03/24/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | . | . | . | . | . | . |
01 | 03/25/18 | 03/26/18 | 03/27/18 | 03/28/18 | 03/29/18 | 03/30/18 | 03/31/18 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | . | . | 0 | . | . | . | . |
01 | 04/01/18 | 04/02/18 | 04/03/18 | 04/04/18 | 04/05/18 | 04/06/18 | 04/07/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | . | . | . | . | . | . |
01 | 04/08/18 | 04/09/18 | 04/10/18 | 04/11/18 | 04/12/18 | 04/13/18 | 04/14/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | . | . | . | . | . | . |
01 | 04/15/18 | 04/16/18 | 04/17/18 | 04/18/18 | 04/19/18 | 04/20/18 | 04/21/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | . | . | . | . | . | . |
01 | 04/22/18 | 04/23/18 | 04/24/18 | 04/25/18 | 04/26/18 | 04/27/18 | 04/28/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | . | . | . | . | . | . |
01 | 04/29/18 | 04/30/18 | 05/01/18 | 05/02/18 | 05/03/18 | 05/04/18 | 05/05/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | . | . | . | . | . | . |
01 | 05/06/18 | 05/07/18 | 05/08/18 | 05/09/18 | 05/10/18 | 05/11/18 | 05/12/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | . | . | . | . | . | . |
01 | 05/13/18 | 05/14/18 | 05/15/18 | 05/16/18 | 05/17/18 | 05/18/18 | 05/19/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | . | . | . | . | . | . |
01 | 05/20/18 | 05/21/18 | 05/22/18 | 05/23/18 | 05/24/18 | 05/25/18 | 05/26/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | . | . | . | . | . | . |
01 | 05/27/18 | 05/28/18 | 05/29/18 | . | . | . | . | 0 | 0 | 0 | . | . | . | . | . | . | . | . | . | . | . |
02 | . | . | . | . | . | . | 03/10/18 | . | . | . | . | . | . | 0 | . | . | . | . | . | . | . |
02 | 03/11/18 | 03/12/18 | 03/13/18 | 03/14/18 | 03/15/18 | 03/16/18 | 03/17/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | . | 1 | 1 | . |
02 | 03/18/18 | 03/19/18 | 03/20/18 | 03/21/18 | 03/22/18 | 03/23/18 | 03/24/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | . | 1 | 1 | . |
02 | 03/25/18 | 03/26/18 | 03/27/18 | 03/28/18 | 03/29/18 | 03/30/18 | 03/31/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | . | 1 | 1 | . |
02 | 04/01/18 | 04/02/18 | 04/03/18 | 04/04/18 | 04/05/18 | 04/06/18 | 04/07/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | . | 1 | 1 | . |
02 | 04/08/18 | 04/09/18 | 04/10/18 | 04/11/18 | 04/12/18 | 04/13/18 | 04/14/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | . | 1 | 1 | . |
02 | 04/15/18 | 04/16/18 | 04/17/18 | 04/18/18 | 04/19/18 | 04/20/18 | 04/21/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | . | 1 | 1 | . |
02 | 04/22/18 | 04/23/18 | 04/24/18 | 04/25/18 | 04/26/18 | 04/27/18 | 04/28/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | . | 1 | 1 | . |
02 | 04/29/18 | 04/30/18 | 05/01/18 | 05/02/18 | 05/03/18 | 05/04/18 | 05/05/18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | . | 1 | 1 | . |
02 | 05/06/18 | 05/07/18 | 05/08/18 | 05/09/18 | . | . | . | 1 | 1 | 1 | 1 | . | . | . | 1 | 1 | 1 | 1 | . | . | . |
….. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If you have several drugs whose usage between visits you want to record, you could choose to make (from HAVE2) one dataset per drug with observations only for dates on which the corresponding drug was taken. Then you can used each of these datasets (dates_drugusea dates_druguseb …) in the DATA WANT step:
Again, this program is untested, in the absence of a working DATA step with sample data:
/* Make a dataset, for each drug, with one row for each date of actual drug use */
data dates_drugeusea (keep=pid date druga_use)
dates_druguseb (keep=pid date drugb_use) ;
set have2;
array dt {7} d1date d2date d3date d4date d5date d6date d7date;
array da {7} d1usea d2usea d3usea d4usea d5usea d6usea d7usea;
array db {7} d1useb d2useb d3useb d4useb d5useb d6useb d7useb;
format date date9. ; /* Revised to include a format */
do i=1 to 7;
druga_use=da{i};
drugb_use=db{i};
date=dt{i};
if druga_use=1 then output dates_drugusea ;
if drugb_use=1 then output dates_druguseb ;
end;
run;
/* Make a dataset with one row per visit plus one row for screen_date (i.e. visitno=1) */
data visits (keep=pid date visitno);
set have1;
by pid;
if first.pid do;
vno=visitno; /* Save the visit number */
format date date9. ; /* Revised to include a format */
date=screen_date;
visitno=1; /*Presuming visit 1 is the same as the screen visit */
output;
visitno=vno; /* Restore the visit number */
end;
date=visit_date;
if date=. then date='31dec2099'd+visit_no; /* As a signal for later */
output;
run;
/* Now output 1 final record per visit, with preceding drug use (on or after prior visit) */
data want (drop=_:);
set visits (in=invisit)
dates_drugusea (drop=druga_use in=use_date_a)
dates_druguseb (drop=druga_use in=use_date_b) ;
by pid date;
if invisit=1 then do;
if date>='31dec2099'd then call missing(date,druga_use,drugb_use);
output;
druga_use=0;
drugb_use=0;
end;
if use_date_a=1 then set dates_drugusea (keep=druga_use);
if use_date_b=1 then set dates_druguseb (keep=drugb_use);
if last.pid then druga_use=0;
run;
@mkeintz Great, thank you! I tested the code and it works except that a missing value is showing up for drugb_use at Visit 1 for subject 1. See below:
Getting:
PID | VISITNO | Date | Druga_use | Drugb_use |
01 | 1 | 03/30/2018 | 1 | . |
01 | 2 | 05/02/2018 | 0 | 0 |
01 | 3 | 05/30/2018 | 0 | 0 |
02 | 1 | 04/09/2018 | 1 | 1 |
02 | 2 | 05/09/2018 | 1 | 1 |
02 | 3 | . | . | . |
Instead, drugb_use at Visit 1 for subject 1 should be 0:
Need:
PID | VISITNO | Date | Druga_use | Drugb_use |
01 | 1 | 03/30/2018 | 1 | 0 |
01 | 2 | 05/02/2018 | 0 | 0 |
01 | 3 | 05/30/2018 | 0 | 0 |
02 | 1 | 04/09/2018 | 1 | 1 |
02 | 2 | 05/09/2018 | 1 | 1 |
02 | 3 | . | . | . |
Can you please let me know how the code should be modified to fix this? I have provided the datasets below for your reference.
data have1;
input
pid d1date d2date d3date d4date d5date d6date d7date
d1usea d2usea d3usea d4usea d5usea d6usea d7usea
d1useb d2useb d3useb d4useb d5useb d6useb d7useb;
informat d1date d2date d3date d4date d5date d6date d7date MMDDYY8.;
format d1date d2date d3date d4date d5date d6date d7date MMDDYY8.;
informat d1usea d2usea d3usea d4usea d5usea d6usea d7usea $1.;
format d1usea d2usea d3usea d4usea d5usea d6usea d7usea $X35979X.;
informat d1useb d2useb d3useb d4useb d5useb d6useb d7useb 3.;
format d1useb d2useb d3useb d4useb d5useb d6useb d7useb 3.;
datalines;
01 . . . 02/28/18 03/01/18 03/02/18 03/03/18 . . . 0 0 1 0 . . . . . 0 .
01 03/04/18 03/05/18 03/06/18 03/07/18 03/08/18 03/09/18 03/10/18 0 0 0 0 0 1 0 . . . . . 0 .
01 03/11/18 03/12/18 03/13/18 03/14/18 03/15/18 03/16/18 03/17/18 0 1 0 0 0 0 0 . 0 . . . . .
01 03/18/18 03/19/18 03/20/18 03/21/18 03/22/18 03/23/18 03/24/18 0 0 0 0 0 0 0 . . . . . . .
01 03/25/18 03/26/18 03/27/18 03/28/18 03/29/18 03/30/18 03/31/18 0 0 1 0 0 0 0 . . 0 . . . .
01 04/01/18 04/02/18 04/03/18 04/04/18 04/05/18 04/06/18 04/07/18 0 0 0 0 0 0 0 . . . . . . .
01 04/08/18 04/09/18 04/10/18 04/11/18 04/12/18 04/13/18 04/14/18 0 0 0 0 0 0 0 . . . . . . .
01 04/15/18 04/16/18 04/17/18 04/18/18 04/19/18 04/20/18 04/21/18 0 0 0 0 0 0 0 . . . . . . .
01 04/22/18 04/23/18 04/24/18 04/25/18 04/26/18 04/27/18 04/28/18 0 0 0 0 0 0 0 . . . . . . .
01 04/29/18 04/30/18 05/01/18 05/02/18 05/03/18 05/04/18 05/05/18 0 0 0 0 0 0 0 . . . . . . .
01 05/06/18 05/07/18 05/08/18 05/09/18 05/10/18 05/11/18 05/12/18 0 0 0 0 0 0 0 . . . . . . .
01 05/13/18 05/14/18 05/15/18 05/16/18 05/17/18 05/18/18 05/19/18 0 0 0 0 0 0 0 . . . . . . .
01 05/20/18 05/21/18 05/22/18 05/23/18 05/24/18 05/25/18 05/26/18 0 0 0 0 0 0 0 . . . . . . .
01 05/27/18 05/28/18 05/29/18 . . . . 0 0 0 . . . . . . . . . . .
02 . . . . . . 03/10/18 . . . . . . 0 . . . . . . .
02 03/11/18 03/12/18 03/13/18 03/14/18 03/15/18 03/16/18 03/17/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 03/18/18 03/19/18 03/20/18 03/21/18 03/22/18 03/23/18 03/24/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 03/25/18 03/26/18 03/27/18 03/28/18 03/29/18 03/30/18 03/31/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/01/18 04/02/18 04/03/18 04/04/18 04/05/18 04/06/18 04/07/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/08/18 04/09/18 04/10/18 04/11/18 04/12/18 04/13/18 04/14/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/15/18 04/16/18 04/17/18 04/18/18 04/19/18 04/20/18 04/21/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/22/18 04/23/18 04/24/18 04/25/18 04/26/18 04/27/18 04/28/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/29/18 04/30/18 05/01/18 05/02/18 05/03/18 05/04/18 05/05/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 05/06/18 05/07/18 05/08/18 05/09/18 . . . 1 1 1 1 . . . 1 1 1 1 . . .
;
data have2;
input
pid Screen_Date: mmddyy10. VisitNo Visit_date: mmddyy10.;
format screen_date visit_date mmddyy10.;
datalines;
01 03/30/2018 2 05/02/2018
01 03/30/2018 3 05/30/2018
02 04/09/2018 2 05/09/2018
02 04/09/2018 3 .
;
After you expand and reshape the two data sets into a 'vector' shape you will have a structure that is very amenable to a SQL query.
The example code has been expanded to deal with a usage table that collects more than one drug across the row.
HAVE1 - Visits
data visits(label="Visits with subject screening date");
input
Subject Screen_Date: mmddyy10. VisitNo Visit_date: mmddyy10.;
format screen_date visit_date mmddyy10.;
datalines;
01 03/30/2018 2 05/02/2018
01 03/30/2018 3 05/30/2018
02 04/09/2018 2 05/09/2018
02 04/09/2018 3 .
;
* expand by adding one row per subject, pivoting screen_date to 'date';
data visits_expanded(label="Visits with screening date as visit #1");
set visits;
by subject;
if first.subject then do;
date = screen_date;
_n_ = visitno; %* repurpose _n_ to hold first visitno;
visitno = 1; %* force visit #1;
OUTPUT; %* output visit #1;
visitno = _n_; %* restore original visitno;
end;
date = visit_date;
OUTPUT;
format date mmddyy10. visitno 2.;
keep subject date visitno;
run;
HAVE2 - Weekly drug use log
data usage_logs(label="Participant weekly usage log, Drugs A and B");
input
Subject DATE1-DATE7 DRUG_A1-DRUG_A7 DRUG_B1-DRUG_B7;
informat DATE1-DATE7 mmddyy10.;
format DATE1-DATE7 mmddyy10.;
format DRUG_A1-DRUG_A7 DRUG_B1-DRUG_B7 2.;
datalines;
01 . . . 02/28/18 03/01/18 03/02/18 03/03/18 . . . 0 0 1 0 . . . . . 0 .
01 03/04/18 03/05/18 03/06/18 03/07/18 03/08/18 03/09/18 03/10/18 0 0 0 0 0 1 0 . . . . . 0 .
01 03/11/18 03/12/18 03/13/18 03/14/18 03/15/18 03/16/18 03/17/18 0 1 0 0 0 0 0 . 0 . . . . .
01 03/18/18 03/19/18 03/20/18 03/21/18 03/22/18 03/23/18 03/24/18 0 0 0 0 0 0 0 . . . . . . .
01 03/25/18 03/26/18 03/27/18 03/28/18 03/29/18 03/30/18 03/31/18 0 0 1 0 0 0 0 . . 0 . . . .
01 04/01/18 04/02/18 04/03/18 04/04/18 04/05/18 04/06/18 04/07/18 0 0 0 0 0 0 0 . . . . . . .
01 04/08/18 04/09/18 04/10/18 04/11/18 04/12/18 04/13/18 04/14/18 0 0 0 0 0 0 0 . . . . . . .
01 04/15/18 04/16/18 04/17/18 04/18/18 04/19/18 04/20/18 04/21/18 0 0 0 0 0 0 0 . . . . . . .
01 04/22/18 04/23/18 04/24/18 04/25/18 04/26/18 04/27/18 04/28/18 0 0 0 0 0 0 0 . . . . . . .
01 04/29/18 04/30/18 05/01/18 05/02/18 05/03/18 05/04/18 05/05/18 0 0 0 0 0 0 0 . . . . . . .
01 05/06/18 05/07/18 05/08/18 05/09/18 05/10/18 05/11/18 05/12/18 0 0 0 0 0 0 0 . . . . . . .
01 05/13/18 05/14/18 05/15/18 05/16/18 05/17/18 05/18/18 05/19/18 0 0 0 0 0 0 0 . . . . . . .
01 05/20/18 05/21/18 05/22/18 05/23/18 05/24/18 05/25/18 05/26/18 0 0 0 0 0 0 0 . . . . . . .
01 05/27/18 05/28/18 05/29/18 . . . . 0 0 0 . . . . . . . . . . .
02 . . . . . . 03/10/18 . . . . . . 0 . . . . . . .
02 03/11/18 03/12/18 03/13/18 03/14/18 03/15/18 03/16/18 03/17/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 03/18/18 03/19/18 03/20/18 03/21/18 03/22/18 03/23/18 03/24/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 03/25/18 03/26/18 03/27/18 03/28/18 03/29/18 03/30/18 03/31/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/01/18 04/02/18 04/03/18 04/04/18 04/05/18 04/06/18 04/07/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/08/18 04/09/18 04/10/18 04/11/18 04/12/18 04/13/18 04/14/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/15/18 04/16/18 04/17/18 04/18/18 04/19/18 04/20/18 04/21/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/22/18 04/23/18 04/24/18 04/25/18 04/26/18 04/27/18 04/28/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 04/29/18 04/30/18 05/01/18 05/02/18 05/03/18 05/04/18 05/05/18 1 1 1 0 1 1 0 1 1 1 . 1 1 .
02 05/06/18 05/07/18 05/08/18 05/09/18 . . . 1 1 1 1 . . . 1 1 1 1 . . .
;
* transpose multiple column sets at a time using arrays;
* output as interleaved stacking;
* no checks for logging gaps;
data usage_dates (keep=subject date drug use_flag);
set usage_logs;
array DATES DATE1-DATE7;
array AUSES DRUG_A1-DRUG_A7;
array BUSES DRUG_B1-DRUG_B7;
do index = 1 to dim(DATES);
date = DATES(index);
if missing(date) then continue;
drug = 'A'; use_flag = AUSES(index); output; * this is the interleaving;
drug = 'B'; use_flag = BUSES(index); output; * right here;
end;
attrib
date format=mmddyy10. label='Date'
drug label='Drug'
use_flag label='Flag: Was drug used on date'
;
run;
When both data sets are tall and skinny SQL can be very effective and the processing easily understood.
This sample code show that there is one LEFT JOIN per DRUG for computing the associated use over date span
proc sql;
create table subject_visits_of_interest as
select
VIS.subject
, VIS.visitno
, VIS.date
, MAX(DRUG_A.use_flag) as Drug_A_Use_Flag label='Flag: Was drug A used since prior visit'
, MAX(DRUG_B.use_flag) as Drug_B_Use_Flag label='Flag: Was drug B used since prior visit'
from
visits_expanded as VIS
left join
visits_expanded as VIS_PRIOR
on
VIS.subject = VIS_PRIOR.subject and
VIS.visitno = VIS_PRIOR.visitno+1
left join
(select * from usage_dates where drug='A') as DRUG_A %* sub-select for DRUG A use;
on
VIS.subject = DRUG_A.subject and
VIS_PRIOR.date <= DRUG_A.date and DRUG_A.date < VIS.date
left join
(select * from usage_dates where drug='B') as DRUG_B %* sub-select for DRUG B use;
on
VIS.subject = DRUG_B.subject and
VIS_PRIOR.date <= DRUG_B.date and DRUG_B.date < VIS.date
group by
VIS.subject, VIS.visitno, VIS.date
order by
VIS.subject, VIS.visitno, VIS.date
;
quit;
For the case of more than two drugs in the use log, you can go more generic.
This approach requires a consistent metadata, i.e. variable naming conventions such that drug names and day numbers can be extracted from the variable names. There are more 'steps' than might be coded using a hash based approach, but perhaps the steps are more understandable to non-experts.
* PROC and DATA steps only, no HASH;
* view to provide unique log_rowid;
data usage_logs_v / view=usage_logs_v;
log_rowid + 1;
set usage_logs;
run;
* transpose 1 - dates across become dates down a tall thin shape;
* one row becomes 7 rows;
proc transpose data=usage_logs_v name=DateVarname out=use_dates_T(rename=(col1=date));
by log_rowid subject;
var date:; %* 7 dates;
run;
* transpose 2 - drugs flagging across becomes flagging down a taller thinner shape;
* one row becomes (7 * # of drugs) rows;
proc transpose data=usage_logs_v name=FlagVarname out=use_flags_T(rename=(col1=flag));
by log_rowid subject;
var drug:; %* 7 flags * # of drugs;
run;
* obtain day from last character of date variable name;
data use_dates(drop=DateVarname _label_);
set use_dates_T;
where not missing (date);
day = substr(DateVarname, length(DateVarname));
run;
* obtain drug name and day from flag variable name;
* presume the 'data' that was in the metadata (the original variable name)
* can be extracted due to consistent naming convention
* <flagname> is <drug-nameproxy><day#(1-7)>;
data use_flags(drop=FlagVarname _label_);
set use_flags_T;
where not missing(flag);
drug = substr(FlagVarname, 1, length(FlagVarname)-1);
day = substr(FlagVarname, length(FlagVarname));
run;
proc sql;
create table visit_each_drug_use as
select
VIS.subject
, VIS.visitno
, VIS.date
/* , VIS_PRIOR.date as span_bottom*/
/* , VIS.date as span_top*/
, USE_FLAGS.drug
/* , USE_FLAGS.flag*/
/* , USE_DATES.date as use_date*/
/* , USE_DATES.day*/
, MAX(USE_FLAGS.flag) as span_use_flag
from
visits_expanded as VIS
left join
visits_expanded as VIS_PRIOR
on
VIS.subject = VIS_PRIOR.subject and %* establish edges of date span between visits;
VIS.visitno = VIS_PRIOR.visitno+1
left join
use_dates
on
VIS.subject = USE_DATES.subject and
VIS_PRIOR.date <= USE_DATES.date and
USE_DATES.date < VIS.date %* restrict use flag data to those between visits;
left join
use_flags
on
USE_DATES.subject = USE_FLAGS.subject and
USE_DATES.log_rowid = USE_FLAGS.log_rowid and
USE_DATES.day = USE_FLAGS.day
group by
VIS.subject, VIS.visitno, VIS.date, USE_FLAGS.drug
/* order by*/
/* VIS.subject, VIS.date, USE_DATES.date, USE_FLAGS.drug*/
;
quit;
proc transpose data=visit_each_drug_use out=visit_drug_use(drop=_name_);
by subject visitno date;
id drug;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.