BookmarkSubscribeRSS Feed
cj3
Fluorite | Level 6 cj3
Fluorite | Level 6

Hello Community,

 

I am working on a data analysis problem for which I would greatly appreciate some programming help. Basically, I need to know whether or not subjects in my dataset reported drug use (Drug A) during a research study at specific time points. Specifically, I would like to know whether subjects reported use of Drug A during each of the following time periods during the study:

  • Prior to the study Screening date (dates<Screen_Date)
  • Between the Screening date and Visit 2 (Screen_Date<=dates<Visit 2)
  • Between Visit 2 and Visit 3 (Visit 2<=dates<Visit 3)

Please see an example of the desired output data set (Want) below. The VISITNO corresponds to the number of each of the 3 study visits where Visit 1 would correspond to the Screening date.

 

Want:

PID

VISITNO

Date

Drug A Use

01

1

03/30/2018

1

01

2

05/02/2018

0

01

3

05/30/2018

0

02

1

04/09/2018

1

02

2

05/09/2018

1

02

3

.

.

…..

 

 

 

 

I have also provided an example of the two data sets I have to produce this output. The first is a dataset listing the visit numbers and visit dates for each subject (Have 1). Note that the Screening Visit is a separate variable; however, I would like to have a single variable in my output dataset that designates the Screening Visit as Visit 1 (See Want above). Please note that some visits may not have attended a study visit, hence, the missing data for a visit date (subject 02 in the example below).  

 

Have 1:

Subject ID

Screen_Date

VisitNo

Visit_date

01

03/30/2018

2

05/02/2018

01

03/30/2018

3

05/30/2018

02

04/09/2018

2

05/09/2018

02

04/09/2018

3

.

 

The second dataset that I have indicates the dates that drug use occurred on (Have 2). That is, Drug A use is indicated by a "1" in the "D1....D7 drug A use" columns. Each row of data for each participant represents an entire week. The "D1 drug use" column indicates drug use for the "D1 date" column, the "D2 drug use" column indicates drug use for the "D2 date" column, and so on.

 

Please also note that some participants may not have completed a study visit. Therefore, the drug use for that visit would be missing (e.g., Subject ID 02 in the example below).

 

Have 2:

Subject ID

D1 date

D2 date

D3 date

D4 date

D5 date

D6 date

D7 date

D1 drug A use

D2 drug A use

D3 drug A use

D4 drug A use

D5 drug A use

D6 drug A use

D7 drug A use

01

.

.

.

02/28/18

03/01/18

03/02/18

03/03/18

.

.

.

0

0

1

0

01

03/04/18

03/05/18

03/06/18

03/07/18

03/08/18

03/09/18

03/10/18

0

0

0

0

0

1

0

01

03/11/18

03/12/18

03/13/18

03/14/18

03/15/18

03/16/18

03/17/18

0

1

0

0

0

0

0

01

03/18/18

03/19/18

03/20/18

03/21/18

03/22/18

03/23/18

03/24/18

0

0

0

0

0

0

0

01

03/25/18

03/26/18

03/27/18

03/28/18

03/29/18

03/30/18

03/31/18

0

0

1

0

0

0

0

01

04/01/18

04/02/18

04/03/18

04/04/18

04/05/18

04/06/18

04/07/18

0

0

0

0

0

0

0

01

04/08/18

04/09/18

04/10/18

04/11/18

04/12/18

04/13/18

04/14/18

0

0

0

0

0

0

0

01

04/15/18

04/16/18

04/17/18

04/18/18

04/19/18

04/20/18

04/21/18

0

0

0

0

0

0

0

01

04/22/18

04/23/18

04/24/18

04/25/18

04/26/18

04/27/18

04/28/18

0

0

0

0

0

0

0

01

04/29/18

04/30/18

05/01/18

05/02/18

05/03/18

05/04/18

05/05/18

0

0

0

0

0

0

0

01

05/06/18

05/07/18

05/08/18

05/09/18

05/10/18

05/11/18

05/12/18

0

0

0

0

0

0

0

01

05/13/18

05/14/18

05/15/18

05/16/18

05/17/18

05/18/18

05/19/18

0

0

0

0

0

0

0

01

05/20/18

05/21/18

05/22/18

05/23/18

05/24/18

05/25/18

05/26/18

0

0

0

0

0

0

0

01

05/27/18

05/28/18

05/29/18

.

.

.

.

0

0

0

.

.

.

.

02

.

.

.

.

.

.

03/10/18

.

.

.

.

.

.

0

02

03/11/18

03/12/18

03/13/18

03/14/18

03/15/18

03/16/18

03/17/18

1

1

1

0

1

1

0

02

03/18/18

03/19/18

03/20/18

03/21/18

03/22/18

03/23/18

03/24/18

1

1

1

0

1

1

0

02

03/25/18

03/26/18

03/27/18

03/28/18

03/29/18

03/30/18

03/31/18

1

1

1

0

1

1

0

02

04/01/18

04/02/18

04/03/18

04/04/18

04/05/18

04/06/18

04/07/18

1

1

1

0

1

1

0

02

04/08/18

04/09/18

04/10/18

04/11/18

04/12/18

04/13/18

04/14/18

1

1

1

0

1

1

0

02

04/15/18

04/16/18

04/17/18

04/18/18

04/19/18

04/20/18

04/21/18

1

1

1

0

1

1

0

02

04/22/18

04/23/18

04/24/18

04/25/18

04/26/18

04/27/18

04/28/18

1

1

1

0

1

1

0

02

04/29/18

04/30/18

05/01/18

05/02/18

05/03/18

05/04/18

05/05/18

1

1

1

0

1

1

0

02

05/06/18

05/07/18

05/08/18

05/09/18

.

.

.

1

1

1

1

.

.

.

…..

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Note: The data includes 30 days prior to each subject’s Screening Date; hence, the missing data for some cells.

 

7 REPLIES 7
mkeintz
PROC Star

Is screen date in HAVE1 the same date that is attributed to "visit 1" in dataset want?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Your HAVE2 data sample is a good example of why you will frequently see experienced users in sas community refer to the operational advantages of skinny datasets to wide datasets.  HAVE2 has 7 date variables (and 7 drug use dummies), and you need to compare EACH of those dates to the visit dates.  If, instead of 7 variables you have 7 rows, with 3 variables:     ID, DATE, DRUGA_USE dummy, your programming task would be far easier.

 

Similarly with HAVE1:  If you had a separate record for the screen_date, programming would be easier.

 

So here is a suggestion that prepares HAVE1 by creating a separate record with DATE=SCREEN_DATE, and also modifies HAVE2 to have one record per date in which druga use occurred.  Then the programming (see the 3rd step below) is almost trivial.

 

If you are able to provide your sample HAVE1 and HAVE2 in the form of a working DATA steps, I would be happy to test the program:

 

/* Make a dataset with one row for each date with actual drug use */
data druguse_dates (keep=pid date druga_use) / view=druguse_dates ;
  set have2;
  array dt {7} d1date d2date d3date d4date d5date d6date d7date;
  array du {7} d1usea d2usea d3usea d4usea d5usea d6usea d7usea;

  druga_use=1;
  format date date9. ;   /* Revised to include a format */
  do i=1 to 7;
    if du{i}=1 then do;
      date=dt{i};
      output;
    end;
  end;
run;


/* Make a dataset with one row per visit plus one row for screen_date (i.e. visitno=1) */
data visits (keep=pid date visitno);
  set have1;
  by pid;

  if first.pid do;  
    vno=visitno; /* Save the visit number */
    format date date9. ;   /* Revised to include a format */
    date=screen_date;
    visitno=1;   /*Presuming visit 1 is the same as the screen visit */
    output;
    visitno=vno; /* Restore the visit number */
  end;
  date=visit_date;
  if date=. then date='31dec2099'd+visit_no;  /* As a signal for later */
  output;
run;


/* Now output 1 record per visit */
data want;
   set visits (in=invisit)
       druguse_dates (drop=druga_use  in=use_date) ;
   by pid date;

   if invisit=1 then do;
     if date>='31dec2099'd then call missing(date,druga_use);
     output;
     druga_use=0; 
   end;
   if use_date=1 then set druguse_dates (keep=druga_use);
   if last.pid then druga_use=0;
run;

This program assumes both HAVE1 and HAVE2 are sorted by PID/DATE.

 

In the case of missing VISIT3 date (or any missing visit date), I set the DATE='31dec2099'd + visit_no.  This is so that the "by pid date" statement in the DATA WANT step does not find dates out of order, and stop the step.   Of course, that mean you have to test for those future dates and reset to missing prior to output.

 

The "trick" in the data WANT step is to note that the program only reads in the DRUGA_USE dummy (from dataset druguse_dates) when the "IF use_date=1 then set" condition is met.  So reading in the dummy variable is exactly synchronized with reading the rest of the variables from druguse_dates.  But unlike those other variables read in the first unconditional SET statement, the DRUGA_USE variable will be retained until reset to another value or re-read in the conditional set statement.

 

** Editted in response to @cj3 response:  I've inserted two format date date9.; statements.  Remember, it doesn't change the underlying value (the number of days after 01jan1960) - just the way the value is displayed.  For instance April 14, 2019 has the value 21653.  And that's how the date variables would be displayed, until I told SAS to use the format date9., forcing it to show as 14APR2019.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
cj3
Fluorite | Level 6 cj3
Fluorite | Level 6

Hi @mkeintz

 

Thank you so much for your time, this worked perfectly! I am still very much a SAS newbie, so I was wondering if you could also provide some feedback on how to slightly modify the above code to further suit my aims. 

 

1. First, could you please show me how the above code could be modified to efficiently examine use of additional drugs, such as adding "drugb_use", "drugc_use" variables in addition to "druga_use"?

2. Also, it appears that the code you provided put the dates in BEST12. format, although my input datasets have the dates in MMDDYY8. format. Any suggestions on how to get the date format back to MMDDYY8. would be appreciated as this format just seems to be a little more readable. 

 

Thank you!! 

cj3
Fluorite | Level 6 cj3
Fluorite | Level 6

@mkeintz also to follow up regarding adding additional drug use variables, my data set has it where a non-missing value (1/0) for the additional drug use variable (e.g., drugb_use) is only present for a date if there is a "1" for druga_use for that same date.  In other words, "drugb_use" would not be assessed if there was no "druga_use" for a particular date and therefore would show as missing. However, my desired output would indicate no "drugb_use" if there was also no "druga_use", as opposed to it showing as missing. Of course, if the visit date is missing, then druga_use and drubb_use would also be missing. I have provided my example dataset again below with this added as well as the desired dataset. Any suggestions on incorporating this to the earlier code would be greatly appreciated. 

 

Want:

PID

VISITNO

Date

Drug A Use

Drug B Use

01

1

03/30/2018

1

0

01

2

05/02/2018

0

0

01

3

05/30/2018

0

0

02

1

04/09/2018

1

1

02

2

05/09/2018

1

1

02

3

.

.

.

…..

 

 

 

 

 

Have:

Subject ID

D1 date

D2 date

D3 date

D4 date

D5 date

D6 date

D7 date

D1 drug A use

D2 drug A use

D3 drug A use

D4 drug A use

D5 drug A use

D6 drug A use

D7 drug A use

D1 drug B use

D2 drug B use

D3 drug B use

D4 drug B use

D5 drug B use

D6 drug B use

D7 drug B use

01

.

.

.

02/28/18

03/01/18

03/02/18

03/03/18

.

.

.

0

0

1

0

.

.

.

.

.

0

.

01

03/04/18

03/05/18

03/06/18

03/07/18

03/08/18

03/09/18

03/10/18

0

0

0

0

0

1

0

.

.

.

.

.

0

.

01

03/11/18

03/12/18

03/13/18

03/14/18

03/15/18

03/16/18

03/17/18

0

1

0

0

0

0

0

.

0

.

.

.

.

.

01

03/18/18

03/19/18

03/20/18

03/21/18

03/22/18

03/23/18

03/24/18

0

0

0

0

0

0

0

.

.

.

.

.

.

.

01

03/25/18

03/26/18

03/27/18

03/28/18

03/29/18

03/30/18

03/31/18

0

0

1

0

0

0

0

.

.

0

.

.

.

.

01

04/01/18

04/02/18

04/03/18

04/04/18

04/05/18

04/06/18

04/07/18

0

0

0

0

0

0

0

.

.

.

.

.

.

.

01

04/08/18

04/09/18

04/10/18

04/11/18

04/12/18

04/13/18

04/14/18

0

0

0

0

0

0

0

.

.

.

.

.

.

.

01

04/15/18

04/16/18

04/17/18

04/18/18

04/19/18

04/20/18

04/21/18

0

0

0

0

0

0

0

.

.

.

.

.

.

.

01

04/22/18

04/23/18

04/24/18

04/25/18

04/26/18

04/27/18

04/28/18

0

0

0

0

0

0

0

.

.

.

.

.

.

.

01

04/29/18

04/30/18

05/01/18

05/02/18

05/03/18

05/04/18

05/05/18

0

0

0

0

0

0

0

.

.

.

.

.

.

.

01

05/06/18

05/07/18

05/08/18

05/09/18

05/10/18

05/11/18

05/12/18

0

0

0

0

0

0

0

.

.

.

.

.

.

.

01

05/13/18

05/14/18

05/15/18

05/16/18

05/17/18

05/18/18

05/19/18

0

0

0

0

0

0

0

.

.

.

.

.

.

.

01

05/20/18

05/21/18

05/22/18

05/23/18

05/24/18

05/25/18

05/26/18

0

0

0

0

0

0

0

.

.

.

.

.

.

.

01

05/27/18

05/28/18

05/29/18

.

.

.

.

0

0

0

.

.

.

.

.

.

.

.

.

.

.

02

.

.

.

.

.

.

03/10/18

.

.

.

.

.

.

0

.

.

.

.

.

.

.

02

03/11/18

03/12/18

03/13/18

03/14/18

03/15/18

03/16/18

03/17/18

1

1

1

0

1

1

0

1

1

1

.

1

1

.

02

03/18/18

03/19/18

03/20/18

03/21/18

03/22/18

03/23/18

03/24/18

1

1

1

0

1

1

0

1

1

1

.

1

1

.

02

03/25/18

03/26/18

03/27/18

03/28/18

03/29/18

03/30/18

03/31/18

1

1

1

0

1

1

0

1

1

1

.

1

1

.

02

04/01/18

04/02/18

04/03/18

04/04/18

04/05/18

04/06/18

04/07/18

1

1

1

0

1

1

0

1

1

1

.

1

1

.

02

04/08/18

04/09/18

04/10/18

04/11/18

04/12/18

04/13/18

04/14/18

1

1

1

0

1

1

0

1

1

1

.

1

1

.

02

04/15/18

04/16/18

04/17/18

04/18/18

04/19/18

04/20/18

04/21/18

1

1

1

0

1

1

0

1

1

1

.

1

1

.

02

04/22/18

04/23/18

04/24/18

04/25/18

04/26/18

04/27/18

04/28/18

1

1

1

0

1

1

0

1

1

1

.

1

1

.

02

04/29/18

04/30/18

05/01/18

05/02/18

05/03/18

05/04/18

05/05/18

1

1

1

0

1

1

0

1

1

1

.

1

1

.

02

05/06/18

05/07/18

05/08/18

05/09/18

.

.

.

1

1

1

1

.

.

.

1

1

1

1

.

.

.

…..

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

mkeintz
PROC Star

If you have several drugs whose usage between visits you want to record, you could choose to make (from HAVE2) one dataset per drug with observations only for dates on which the corresponding drug was taken.  Then you can used each of these datasets (dates_drugusea dates_druguseb …) in the DATA WANT step:

 

Again, this program is untested, in the absence of a working DATA step with sample data:

 

/* Make a dataset, for each drug, with one row for each date of actual drug use */
data dates_drugeusea (keep=pid date druga_use)
     dates_druguseb (keep=pid date drugb_use) ;
  set have2;
  array dt {7} d1date d2date d3date d4date d5date d6date d7date;
  array da {7} d1usea d2usea d3usea d4usea d5usea d6usea d7usea;
  array db {7} d1useb d2useb d3useb d4useb d5useb d6useb d7useb;

  format date date9. ;   /* Revised to include a format */
  do i=1 to 7;
    druga_use=da{i};
    drugb_use=db{i};
    date=dt{i};
    if druga_use=1 then output dates_drugusea ;
    if drugb_use=1 then output dates_druguseb ;
  end;
run;

/* Make a dataset with one row per visit plus one row for screen_date (i.e. visitno=1) */
data visits (keep=pid date visitno);
  set have1;
  by pid;

  if first.pid do;
    vno=visitno; /* Save the visit number */
    format date date9. ;   /* Revised to include a format */
    date=screen_date;
    visitno=1;   /*Presuming visit 1 is the same as the screen visit */
    output;
    visitno=vno; /* Restore the visit number */
  end;
  date=visit_date;
  if date=. then date='31dec2099'd+visit_no;  /* As a signal for later */
  output;
run;

/* Now output 1 final record per visit, with preceding drug use (on or after prior visit) */
data want (drop=_:);
   set visits (in=invisit)
       dates_drugusea (drop=druga_use in=use_date_a)
       dates_druguseb (drop=druga_use in=use_date_b) ;
   by pid date;

   if invisit=1 then do;
     if date>='31dec2099'd then call missing(date,druga_use,drugb_use);
     output;
     druga_use=0; 
     drugb_use=0;
   end;
   if use_date_a=1 then set dates_drugusea (keep=druga_use);
   if use_date_b=1 then set dates_druguseb (keep=drugb_use);
   if last.pid then druga_use=0;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
cj3
Fluorite | Level 6 cj3
Fluorite | Level 6

@mkeintz Great, thank you! I tested the code and it works except that a missing value is showing up for drugb_use at Visit 1 for subject 1. See below:

Getting:

PID

VISITNO

Date

Druga_use

Drugb_use

01

1

03/30/2018

1

.

01

2

05/02/2018

0

0

01

3

05/30/2018

0

0

02

1

04/09/2018

1

1

02

2

05/09/2018

1

1

02

3

.

.

.

Instead, drugb_use at Visit 1 for subject 1 should be 0:

Need:

PID

VISITNO

Date

Druga_use

Drugb_use

01

1

03/30/2018

1

0

01

2

05/02/2018

0

0

01

3

05/30/2018

0

0

02

1

04/09/2018

1

1

02

2

05/09/2018

1

1

02

3

.

.

.

Can you please let me know how the code should be modified to fix this? I have provided the datasets below for your reference. 

 

data have1;
input 
pid	d1date d2date d3date d4date d5date d6date d7date
d1usea d2usea d3usea d4usea d5usea d6usea d7usea
d1useb d2useb d3useb d4useb d5useb d6useb d7useb;
informat d1date d2date d3date d4date d5date d6date d7date MMDDYY8.;
  format d1date d2date d3date d4date d5date d6date d7date MMDDYY8.;
informat d1usea d2usea d3usea d4usea d5usea d6usea d7usea $1.;
  format d1usea d2usea d3usea d4usea d5usea d6usea d7usea $X35979X.;
informat d1useb d2useb d3useb d4useb d5useb d6useb d7useb 3.;
  format d1useb d2useb d3useb d4useb d5useb d6useb d7useb 3.;
datalines;
01	.			.			.			02/28/18	03/01/18	03/02/18	03/03/18	.	.	.	0	0	1	0	.	.	.	.	.	0	.
01	03/04/18	03/05/18	03/06/18	03/07/18	03/08/18	03/09/18	03/10/18	0	0	0	0	0	1	0	.	.	.	.	.	0	.
01	03/11/18	03/12/18	03/13/18	03/14/18	03/15/18	03/16/18	03/17/18	0	1	0	0	0	0	0	.	0	.	.	.	.	.
01	03/18/18	03/19/18	03/20/18	03/21/18	03/22/18	03/23/18	03/24/18	0	0	0	0	0	0	0	.	.	.	.	.	.	.
01	03/25/18	03/26/18	03/27/18	03/28/18	03/29/18	03/30/18	03/31/18	0	0	1	0	0	0	0	.	.	0	.	.	.	.
01	04/01/18	04/02/18	04/03/18	04/04/18	04/05/18	04/06/18	04/07/18	0	0	0	0	0	0	0	.	.	.	.	.	.	.
01	04/08/18	04/09/18	04/10/18	04/11/18	04/12/18	04/13/18	04/14/18	0	0	0	0	0	0	0	.	.	.	.	.	.	.
01	04/15/18	04/16/18	04/17/18	04/18/18	04/19/18	04/20/18	04/21/18	0	0	0	0	0	0	0	.	.	.	.	.	.	.
01	04/22/18	04/23/18	04/24/18	04/25/18	04/26/18	04/27/18	04/28/18	0	0	0	0	0	0	0	.	.	.	.	.	.	.
01	04/29/18	04/30/18	05/01/18	05/02/18	05/03/18	05/04/18	05/05/18	0	0	0	0	0	0	0	.	.	.	.	.	.	.
01	05/06/18	05/07/18	05/08/18	05/09/18	05/10/18	05/11/18	05/12/18	0	0	0	0	0	0	0	.	.	.	.	.	.	.
01	05/13/18	05/14/18	05/15/18	05/16/18	05/17/18	05/18/18	05/19/18	0	0	0	0	0	0	0	.	.	.	.	.	.	.
01	05/20/18	05/21/18	05/22/18	05/23/18	05/24/18	05/25/18	05/26/18	0	0	0	0	0	0	0	.	.	.	.	.	.	.
01	05/27/18	05/28/18	05/29/18	.			.			.			.			0	0	0	.	.	.	.	.	.	.	.	.	.	.
02	.			.			.			.			.			.			03/10/18	.	.	.	.	.	.	0	.	.	.	.	.	.	.
02	03/11/18	03/12/18	03/13/18	03/14/18	03/15/18	03/16/18	03/17/18	1	1	1	0	1	1	0	1	1	1	.	1	1	.
02	03/18/18	03/19/18	03/20/18	03/21/18	03/22/18	03/23/18	03/24/18	1	1	1	0	1	1	0	1	1	1	.	1	1	.
02	03/25/18	03/26/18	03/27/18	03/28/18	03/29/18	03/30/18	03/31/18	1	1	1	0	1	1	0	1	1	1	.	1	1	.
02	04/01/18	04/02/18	04/03/18	04/04/18	04/05/18	04/06/18	04/07/18	1	1	1	0	1	1	0	1	1	1	.	1	1	.
02	04/08/18	04/09/18	04/10/18	04/11/18	04/12/18	04/13/18	04/14/18	1	1	1	0	1	1	0	1	1	1	.	1	1	.
02	04/15/18	04/16/18	04/17/18	04/18/18	04/19/18	04/20/18	04/21/18	1	1	1	0	1	1	0	1	1	1	.	1	1	.
02	04/22/18	04/23/18	04/24/18	04/25/18	04/26/18	04/27/18	04/28/18	1	1	1	0	1	1	0	1	1	1	.	1	1	.
02	04/29/18	04/30/18	05/01/18	05/02/18	05/03/18	05/04/18	05/05/18	1	1	1	0	1	1	0	1	1	1	.	1	1	.
02	05/06/18	05/07/18	05/08/18	05/09/18	.			.			.			1	1	1	1	.	.	.	1	1	1	1	.	.	.
;

data have2;
input
pid Screen_Date: mmddyy10.	VisitNo	Visit_date: mmddyy10.;
format screen_date visit_date mmddyy10.;
datalines;
01	03/30/2018	2	05/02/2018
01	03/30/2018	3	05/30/2018
02	04/09/2018	2	05/09/2018
02	04/09/2018	3	.
;

 

RichardDeVen
Barite | Level 11

After you expand and reshape the two data sets into a 'vector' shape you will have a structure that is very amenable to a SQL query.

 

The example code has been expanded to deal with a usage table that collects more than one drug across the row.

 

HAVE1 - Visits

data visits(label="Visits with subject screening date");
input
Subject Screen_Date: mmddyy10.	VisitNo	Visit_date: mmddyy10.;
format screen_date visit_date mmddyy10.;
datalines;
01	03/30/2018	2	05/02/2018
01	03/30/2018	3	05/30/2018
02	04/09/2018	2	05/09/2018
02	04/09/2018	3	.
;

* expand by adding one row per subject, pivoting screen_date to 'date'; data visits_expanded(label="Visits with screening date as visit #1"); set visits; by subject; if first.subject then do; date = screen_date; _n_ = visitno; %* repurpose _n_ to hold first visitno; visitno = 1; %* force visit #1; OUTPUT; %* output visit #1; visitno = _n_; %* restore original visitno; end; date = visit_date; OUTPUT; format date mmddyy10. visitno 2.; keep subject date visitno; run;

 HAVE2 - Weekly drug use log

data usage_logs(label="Participant weekly usage log, Drugs A and B");
input
Subject DATE1-DATE7 DRUG_A1-DRUG_A7 DRUG_B1-DRUG_B7;
informat DATE1-DATE7 mmddyy10.;
format DATE1-DATE7 mmddyy10.;
format DRUG_A1-DRUG_A7 DRUG_B1-DRUG_B7 2.;
datalines;
01 . . . 02/28/18 03/01/18 03/02/18 03/03/18 . . . 0 0 1 0 . . . . . 0 . 01 03/04/18 03/05/18 03/06/18 03/07/18 03/08/18 03/09/18 03/10/18 0 0 0 0 0 1 0 . . . . . 0 . 01 03/11/18 03/12/18 03/13/18 03/14/18 03/15/18 03/16/18 03/17/18 0 1 0 0 0 0 0 . 0 . . . . . 01 03/18/18 03/19/18 03/20/18 03/21/18 03/22/18 03/23/18 03/24/18 0 0 0 0 0 0 0 . . . . . . . 01 03/25/18 03/26/18 03/27/18 03/28/18 03/29/18 03/30/18 03/31/18 0 0 1 0 0 0 0 . . 0 . . . . 01 04/01/18 04/02/18 04/03/18 04/04/18 04/05/18 04/06/18 04/07/18 0 0 0 0 0 0 0 . . . . . . . 01 04/08/18 04/09/18 04/10/18 04/11/18 04/12/18 04/13/18 04/14/18 0 0 0 0 0 0 0 . . . . . . . 01 04/15/18 04/16/18 04/17/18 04/18/18 04/19/18 04/20/18 04/21/18 0 0 0 0 0 0 0 . . . . . . . 01 04/22/18 04/23/18 04/24/18 04/25/18 04/26/18 04/27/18 04/28/18 0 0 0 0 0 0 0 . . . . . . . 01 04/29/18 04/30/18 05/01/18 05/02/18 05/03/18 05/04/18 05/05/18 0 0 0 0 0 0 0 . . . . . . . 01 05/06/18 05/07/18 05/08/18 05/09/18 05/10/18 05/11/18 05/12/18 0 0 0 0 0 0 0 . . . . . . . 01 05/13/18 05/14/18 05/15/18 05/16/18 05/17/18 05/18/18 05/19/18 0 0 0 0 0 0 0 . . . . . . . 01 05/20/18 05/21/18 05/22/18 05/23/18 05/24/18 05/25/18 05/26/18 0 0 0 0 0 0 0 . . . . . . . 01 05/27/18 05/28/18 05/29/18 . . . . 0 0 0 . . . . . . . . . . . 02 . . . . . . 03/10/18 . . . . . . 0 . . . . . . . 02 03/11/18 03/12/18 03/13/18 03/14/18 03/15/18 03/16/18 03/17/18 1 1 1 0 1 1 0 1 1 1 . 1 1 . 02 03/18/18 03/19/18 03/20/18 03/21/18 03/22/18 03/23/18 03/24/18 1 1 1 0 1 1 0 1 1 1 . 1 1 . 02 03/25/18 03/26/18 03/27/18 03/28/18 03/29/18 03/30/18 03/31/18 1 1 1 0 1 1 0 1 1 1 . 1 1 . 02 04/01/18 04/02/18 04/03/18 04/04/18 04/05/18 04/06/18 04/07/18 1 1 1 0 1 1 0 1 1 1 . 1 1 . 02 04/08/18 04/09/18 04/10/18 04/11/18 04/12/18 04/13/18 04/14/18 1 1 1 0 1 1 0 1 1 1 . 1 1 . 02 04/15/18 04/16/18 04/17/18 04/18/18 04/19/18 04/20/18 04/21/18 1 1 1 0 1 1 0 1 1 1 . 1 1 . 02 04/22/18 04/23/18 04/24/18 04/25/18 04/26/18 04/27/18 04/28/18 1 1 1 0 1 1 0 1 1 1 . 1 1 . 02 04/29/18 04/30/18 05/01/18 05/02/18 05/03/18 05/04/18 05/05/18 1 1 1 0 1 1 0 1 1 1 . 1 1 . 02 05/06/18 05/07/18 05/08/18 05/09/18 . . . 1 1 1 1 . . . 1 1 1 1 . . . ; * transpose multiple column sets at a time using arrays; * output as interleaved stacking; * no checks for logging gaps; data usage_dates (keep=subject date drug use_flag); set usage_logs; array DATES DATE1-DATE7; array AUSES DRUG_A1-DRUG_A7; array BUSES DRUG_B1-DRUG_B7; do index = 1 to dim(DATES); date = DATES(index); if missing(date) then continue; drug = 'A'; use_flag = AUSES(index); output; * this is the interleaving; drug = 'B'; use_flag = BUSES(index); output; * right here; end; attrib date format=mmddyy10. label='Date' drug label='Drug' use_flag label='Flag: Was drug used on date' ; run;

When both data sets are tall and skinny SQL can be very effective and the processing easily understood.

This sample code show that there is one LEFT JOIN per DRUG for computing the associated use over date span

proc sql;
  create table subject_visits_of_interest as
  select 
    VIS.subject
    , VIS.visitno
    , VIS.date
    , MAX(DRUG_A.use_flag) as Drug_A_Use_Flag label='Flag: Was drug A used since prior visit'
    , MAX(DRUG_B.use_flag) as Drug_B_Use_Flag label='Flag: Was drug B used since prior visit'
  from 
    visits_expanded as VIS
  left join
    visits_expanded as VIS_PRIOR
  on
    VIS.subject = VIS_PRIOR.subject and
    VIS.visitno = VIS_PRIOR.visitno+1 
  left join
    (select * from usage_dates where drug='A') as DRUG_A  %* sub-select for DRUG A use;
  on
    VIS.subject = DRUG_A.subject and
    VIS_PRIOR.date <= DRUG_A.date and DRUG_A.date < VIS.date    
  left join
    (select * from usage_dates where drug='B') as DRUG_B  %* sub-select for DRUG B use;
  on
    VIS.subject = DRUG_B.subject and
    VIS_PRIOR.date <= DRUG_B.date and DRUG_B.date < VIS.date    
  group by
    VIS.subject, VIS.visitno, VIS.date
  order by
    VIS.subject, VIS.visitno, VIS.date
  ;
quit;

For the case of more than two drugs in the use log, you can go more generic.

  • Transpose use log dates by row, and
  • Transpose use log flags by row
  • Compute the drug name and day from the original log data variable names or labels
  • Join the tall skinny tables computing the 'any use over time span' by drug into a result set in which drug is a categorical variable
  • Transpose the result set into the final desired shape (or reporting shape)

This approach requires a consistent metadata, i.e. variable naming conventions such that drug names and day numbers can be extracted from the variable names.  There are more 'steps' than might be coded using a hash based approach, but perhaps the steps are more understandable to non-experts.

* PROC and DATA steps only, no HASH;

* view to provide unique log_rowid;
data usage_logs_v / view=usage_logs_v;
  log_rowid + 1;
  set usage_logs;
run;

* transpose 1 - dates across become dates down a tall thin shape;
* one row becomes 7 rows;
proc transpose data=usage_logs_v name=DateVarname out=use_dates_T(rename=(col1=date));
  by log_rowid subject; 
  var date:; %* 7 dates;
run;

* transpose 2 - drugs flagging across becomes flagging down a taller thinner shape;
* one row becomes (7 * # of drugs) rows;
proc transpose data=usage_logs_v name=FlagVarname out=use_flags_T(rename=(col1=flag));
  by log_rowid subject;
  var drug:;  %* 7 flags * # of drugs;
run;

* obtain day from last character of date variable name;
data use_dates(drop=DateVarname _label_);
  set use_dates_T;
  where not missing (date);
  day = substr(DateVarname, length(DateVarname));
run;

* obtain drug name and day from flag variable name;
* presume the 'data' that was in the metadata (the original variable name)
* can be extracted due to consistent naming convention
* <flagname> is <drug-nameproxy><day#(1-7)>;
data use_flags(drop=FlagVarname _label_);
  set use_flags_T;
  where not missing(flag);
  drug = substr(FlagVarname, 1, length(FlagVarname)-1);
  day  = substr(FlagVarname, length(FlagVarname));
run;


proc sql;
  create table visit_each_drug_use as
  select 
    VIS.subject
    , VIS.visitno
    , VIS.date
/*    , VIS_PRIOR.date as span_bottom*/
/*    , VIS.date as span_top*/
    , USE_FLAGS.drug
/*    , USE_FLAGS.flag*/
/*    , USE_DATES.date as use_date*/
/*    , USE_DATES.day*/
    , MAX(USE_FLAGS.flag) as span_use_flag
  from 
    visits_expanded as VIS
  left join
    visits_expanded as VIS_PRIOR
  on
    VIS.subject = VIS_PRIOR.subject and  %* establish edges of date span between visits;
    VIS.visitno = VIS_PRIOR.visitno+1 
  left join 
    use_dates 
  on
    VIS.subject = USE_DATES.subject and
    VIS_PRIOR.date <= USE_DATES.date and 
                      USE_DATES.date < VIS.date  %* restrict use flag data to those between visits;
  left join
    use_flags
  on 
    USE_DATES.subject = USE_FLAGS.subject and
    USE_DATES.log_rowid = USE_FLAGS.log_rowid and
    USE_DATES.day = USE_FLAGS.day
  group by
    VIS.subject, VIS.visitno, VIS.date, USE_FLAGS.drug
/*  order by*/
/*    VIS.subject, VIS.date, USE_DATES.date, USE_FLAGS.drug*/
  ;
quit;

proc transpose data=visit_each_drug_use out=visit_drug_use(drop=_name_);
  by subject visitno date;
  id drug;
run;

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 925 views
  • 2 likes
  • 3 in conversation