Hello Community,
I am working on a data analysis problem for which I would greatly appreciate some programming help. Basically, I need to create a variable for each subject in my dataset that indicates whether (yes/no) they reported drug use prior to the study randomization date (i.e., baseline drug use). Below is what my dataset looks like as well as the desired output. Any feedback would be much appreciated! Please let me know if any additional information or clarification would be helpful.
Have:
ID | Randomization date | Week start date | D1 date | D2 date | D3 date | D4 date | D5 date | D6 date | D7 date | D1 drug use | D2 drug use | D3 drug use | D4 drug use | D5 drug use | D6 drug use | D7 drug use |
01 | 04/02/18 | 02/25/18 | . | . | . | 02/28/18 | 03/01/18 | 03/02/18 | 03/03/18 | . | . | . | 0 | 0 | 1 | 0 |
01 | 04/02/18 | 03/04/18 | 03/04/18 | 03/05/18 | 03/06/18 | 03/07/18 | 03/08/18 | 03/09/18 | 03/10/18 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
01 | 04/02/18 | 03/11/18 | 03/11/18 | 03/12/18 | 03/13/18 | 03/14/18 | 03/15/18 | 03/16/18 | 03/17/18 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
01 | 04/02/18 | 03/18/18 | 03/18/18 | 03/19/18 | 03/20/18 | 03/21/18 | 03/22/18 | 03/23/18 | 03/24/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 04/02/18 | 03/25/18 | 03/25/18 | 03/26/18 | 03/27/18 | 03/28/18 | 03/29/18 | 03/30/18 | 03/31/18 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
01 | 04/02/18 | 04/01/18 | 04/01/18 | 04/02/18 | 04/03/18 | 04/04/18 | 04/05/18 | 04/06/18 | 04/07/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 04/02/18 | 04/08/18 | 04/08/18 | 04/09/18 | 04/10/18 | 04/11/18 | 04/12/18 | 04/13/18 | 04/14/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
… |
Note: The data include the 30 days before each subject's randomization date, hence the missing values in some cells. My dataset also includes 6 months of data after the randomization date, but those data are irrelevant to the baseline drug use question.
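Because the reply below asks for sample data as a SAS DATA step, here is a minimal, untested sketch of one. It is built only from the seven rows shown above for subject 01. The SAS variable names (id, randomization_date, week_start_date, d1_date to d7_date, d1_drug_use to d7_drug_use) are assumptions, since the table headers contain spaces, so rename them to match the real dataset. The dates are read with the MMDDYY8. informat so they become numeric SAS dates that can be compared with `<`, and a lone period is read as a missing value.

```sas
/* Sample data: the seven rows shown above for subject 01.           */
/* Variable names are ASSUMED; the original headers contain spaces.  */
data have;
  input id :$2.
        randomization_date :mmddyy8.
        week_start_date :mmddyy8.
        (d1_date d2_date d3_date d4_date d5_date d6_date d7_date) (:mmddyy8.)
        d1_drug_use d2_drug_use d3_drug_use d4_drug_use
        d5_drug_use d6_drug_use d7_drug_use;
  /* Stored as numeric SAS dates, displayed as mm/dd/yy */
  format randomization_date week_start_date
         d1_date d2_date d3_date d4_date d5_date d6_date d7_date mmddyy8.;
  datalines;
01 04/02/18 02/25/18 . . . 02/28/18 03/01/18 03/02/18 03/03/18 . . . 0 0 1 0
01 04/02/18 03/04/18 03/04/18 03/05/18 03/06/18 03/07/18 03/08/18 03/09/18 03/10/18 0 0 0 0 0 1 0
01 04/02/18 03/11/18 03/11/18 03/12/18 03/13/18 03/14/18 03/15/18 03/16/18 03/17/18 0 1 0 0 0 0 0
01 04/02/18 03/18/18 03/18/18 03/19/18 03/20/18 03/21/18 03/22/18 03/23/18 03/24/18 0 0 0 0 0 0 0
01 04/02/18 03/25/18 03/25/18 03/26/18 03/27/18 03/28/18 03/29/18 03/30/18 03/31/18 0 0 1 0 0 1 0
01 04/02/18 04/01/18 04/01/18 04/02/18 04/03/18 04/04/18 04/05/18 04/06/18 04/07/18 0 0 0 0 0 0 0
01 04/02/18 04/08/18 04/08/18 04/09/18 04/10/18 04/11/18 04/12/18 04/13/18 04/14/18 0 0 0 0 0 0 0
;
run;
```

With this sample, subject 01 reports use on 03/02/18 (first row, D6), well before the 04/02/18 randomization date, so the expected result for ID 01 is a baseline drug use value of 1, matching the "Want" table.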
Want:
ID | Baseline drug use |
01 | 1 |
02 | 0 |
03 | 1 |
… | … |
This is untested code. If you want code that I have tested, you need to provide sample data as a SAS data step.
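data have2;
  set have;
  /* Variable names are assumed; replace them with the names in your dataset */
  array drug_use d1_drug_use d2_drug_use d3_drug_use d4_drug_use d5_drug_use d6_drug_use d7_drug_use;
  array dates d1_date d2_date d3_date d4_date d5_date d6_date d7_date;
  /* Dates must be numeric SAS date values for the comparison below to work */
  baseline_use_this_week=0;
  do i=1 to dim(drug_use);
    /* Flag the week if any day before randomization has reported use */
    if drug_use(i)=1 and . < dates(i) < randomization_date then
      baseline_use_this_week=1;
  end;
  drop i;
run;
proc summary data=have2 nway;
  class id;
  var baseline_use_this_week;
  /* MAX of the weekly 0/1 flags is 1 if any week had baseline use */
  output out=want(drop=_type_ _freq_) max=baseline_drug_use;
run;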
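As an alternative to the two-step approach above (a DATA step followed by PROC SUMMARY), the same flag can be built in a single DATA step with BY-group processing. This is an untested sketch that assumes HAVE is sorted (or at least grouped) by id and uses the same hypothetical variable names as the code above.

```sas
/* Single-pass alternative: one output row per subject.            */
/* ASSUMES HAVE is sorted/grouped by ID; variable names are assumed. */
data want;
  set have;
  by id;
  array drug_use{7} d1_drug_use d2_drug_use d3_drug_use d4_drug_use
                    d5_drug_use d6_drug_use d7_drug_use;
  array dates{7} d1_date d2_date d3_date d4_date d5_date d6_date d7_date;
  retain baseline_drug_use;
  if first.id then baseline_drug_use = 0;   /* reset for each subject */
  do i = 1 to 7;
    /* Missing dates (partial first week) fail the . < test */
    if drug_use{i} = 1 and . < dates{i} < randomization_date then
      baseline_drug_use = 1;
  end;
  if last.id then output;                   /* one row per subject */
  keep id baseline_drug_use;
run;
```

The trade-off: this version makes one pass over the data but needs the BY order, whereas PROC SUMMARY with a CLASS statement does not require sorted input.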
Basically, I need to create a variable for each subject in my dataset that indicates whether (yes/no) they reported drug use prior to the study randomization date (i.e., baseline drug use).
Is subject the variable named ID? How do we know from the table you show if (yes/no) they reported drug use? How do we know it was prior to the study randomization date? What is "baseline drug use"?
Hi @PaigeMiller, yes, the variable "ID" identifies the subject.
Baseline drug use is defined as any (yes vs. no) drug use before the randomization date.
Drug use is indicated by a "1" in the "D1 ... D7 drug use" columns. Each row of data for a participant represents one full week. The "D1 drug use" column indicates drug use on the "D1 date", the "D2 drug use" column indicates drug use on the "D2 date", and so on.
To know whether drug use occurred before the randomization date, you would check every date earlier than the date in the "Randomization date" variable. In this case the randomization date is 04/02/18, so I need to determine drug use on 04/01/18 and earlier. In this example, drug use occurred on 03/02/18, 03/09/18, 03/12/18, 03/27/18, and 03/30/18, so the baseline drug use variable would be "1".
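To cross-check a hand count like the one above, the weekly rows can be reshaped into one row per day of reported use before randomization. This is an untested sketch that assumes a dataset named HAVE with the same hypothetical variable names used in the code above; USE_DAYS and USE_DATE are illustrative names.

```sas
/* One output row per day with reported drug use before randomization. */
/* ASSUMES dataset HAVE with the variable names used in the code above. */
data use_days;
  set have;
  array drug_use{7} d1_drug_use d2_drug_use d3_drug_use d4_drug_use
                    d5_drug_use d6_drug_use d7_drug_use;
  array dates{7} d1_date d2_date d3_date d4_date d5_date d6_date d7_date;
  do i = 1 to 7;
    /* Missing dates/flags (the partial first week) fail both tests */
    if drug_use{i} = 1 and . < dates{i} < randomization_date then do;
      use_date = dates{i};
      output;
    end;
  end;
  format use_date mmddyy8.;
  keep id randomization_date use_date;
run;

proc print data=use_days noobs;
run;
```

For the subject 01 rows shown in the "Have" table, this should list five dates: 03/02/18, 03/09/18, 03/12/18, 03/27/18, and 03/30/18.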
Please let me know if you have any other questions. Thank you for your help!