Hello Community,
I am working on a data analysis problem for which I would greatly appreciate some programming help. Basically, I need to create a variable for each subject in my dataset that indicates whether (yes/no) they reported drug use prior to the study randomization date (i.e., baseline drug use). Below is what my dataset looks like as well as the desired output. Any feedback would be much appreciated! Please let me know if any additional information or clarification would be helpful.
Have:
ID | Randomization date | Week start date | D1 date | D2 date | D3 date | D4 date | D5 date | D6 date | D7 date | D1 drug use | D2 drug use | D3 drug use | D4 drug use | D5 drug use | D6 drug use | D7 drug use |
01 | 04/02/18 | 02/25/18 | . | . | . | 02/28/18 | 03/01/18 | 03/02/18 | 03/03/18 | . | . | . | 0 | 0 | 1 | 0 |
01 | 04/02/18 | 03/04/18 | 03/04/18 | 03/05/18 | 03/06/18 | 03/07/18 | 03/08/18 | 03/09/18 | 03/10/18 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
01 | 04/02/18 | 03/11/18 | 03/11/18 | 03/12/18 | 03/13/18 | 03/14/18 | 03/15/18 | 03/16/18 | 03/17/18 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
01 | 04/02/18 | 03/18/18 | 03/18/18 | 03/19/18 | 03/20/18 | 03/21/18 | 03/22/18 | 03/23/18 | 03/24/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 04/02/18 | 03/25/18 | 03/25/18 | 03/26/18 | 03/27/18 | 03/28/18 | 03/29/18 | 03/30/18 | 03/31/18 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
01 | 04/02/18 | 04/01/18 | 04/01/18 | 04/02/18 | 04/03/18 | 04/04/18 | 04/05/18 | 04/06/18 | 04/07/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
01 | 04/02/18 | 04/08/18 | 04/08/18 | 04/09/18 | 04/10/18 | 04/11/18 | 04/12/18 | 04/13/18 | 04/14/18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
… |
Note: The data includes 30 days prior to each subject’s randomization date; hence, the missing data for some cells. My dataset also includes 6 months of data after the randomization date; however, these data are irrelevant to my analysis question RE: baseline drug use.
Want:
ID | Baseline drug use |
01 | 1 |
02 | 0 |
03 | 1 |
… | … |
Accepted Solutions
This is untested code. If you want code that I have tested, you need to provide sample data as a SAS data step.
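data have2;
    set have;
    /* Type out the full list of variable names in place of each "..." */
    array drug_use{*} d1_drug_use d2_drug_use ... ;
    array dates{*} d1_date d2_date ... ;
    /* Flag is 1 if any day in this weekly row shows drug use before randomization */
    baseline_use_this_week = 0;
    do i = 1 to dim(drug_use);
        /* A missing date compares as less than any date in SAS, so exclude it explicitly */
        if drug_use{i} = 1 and not missing(dates{i}) and dates{i} < randomization_date then
            baseline_use_this_week = 1;
    end;
    drop i;
run;

/* Collapse to one row per subject: the maximum over weeks is 1 if any week was flagged */
proc summary data=have2 nway;
    class id;
    var baseline_use_this_week;
    output out=want(drop=_type_ _freq_) max=baseline_drug_use;
run;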
Paige Miller
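The sample data requested above could be supplied as a data step along these lines. This is only a sketch: the rows are the seven ID 01 rows from the "Have" table, but the variable names (randomization_date, week_start_date, d1_date through d7_date, d1_drug_use through d7_drug_use) are assumptions chosen to match the names used in the code above, and the real dataset may name them differently.

```sas
data have;
    length id $2;
    /* Assumed variable names; dates are read as SAS date values */
    informat randomization_date week_start_date
             d1_date d2_date d3_date d4_date d5_date d6_date d7_date mmddyy8.;
    format   randomization_date week_start_date
             d1_date d2_date d3_date d4_date d5_date d6_date d7_date mmddyy8.;
    /* List input: values are separated by blanks, and a lone period is a missing value */
    input id $ randomization_date week_start_date
          d1_date d2_date d3_date d4_date d5_date d6_date d7_date
          d1_drug_use d2_drug_use d3_drug_use d4_drug_use
          d5_drug_use d6_drug_use d7_drug_use;
    datalines;
01 04/02/18 02/25/18 . . . 02/28/18 03/01/18 03/02/18 03/03/18 . . . 0 0 1 0
01 04/02/18 03/04/18 03/04/18 03/05/18 03/06/18 03/07/18 03/08/18 03/09/18 03/10/18 0 0 0 0 0 1 0
01 04/02/18 03/11/18 03/11/18 03/12/18 03/13/18 03/14/18 03/15/18 03/16/18 03/17/18 0 1 0 0 0 0 0
01 04/02/18 03/18/18 03/18/18 03/19/18 03/20/18 03/21/18 03/22/18 03/23/18 03/24/18 0 0 0 0 0 0 0
01 04/02/18 03/25/18 03/25/18 03/26/18 03/27/18 03/28/18 03/29/18 03/30/18 03/31/18 0 0 1 0 0 1 0
01 04/02/18 04/01/18 04/01/18 04/02/18 04/03/18 04/04/18 04/05/18 04/06/18 04/07/18 0 0 0 0 0 0 0
01 04/02/18 04/08/18 04/08/18 04/09/18 04/10/18 04/11/18 04/12/18 04/13/18 04/14/18 0 0 0 0 0 0 0
;
run;
```

With both arrays in the step above written out in full under these assumed names, these rows should give baseline_drug_use = 1 for ID 01, which matches the "Want" table.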
Basically, I need to create a variable for each subject in my dataset that indicates whether (yes/no) they reported drug use prior to the study randomization date (i.e., baseline drug use).
Is subject the variable named ID? How do we know from the table you show if (yes/no) they reported drug use? How do we know it was prior to the study randomization date? What is "baseline drug use"?
Paige Miller
Hi @PaigeMiller, yes, the variable named "ID" identifies the subject.
Baseline drug use is defined as any drug use (yes vs. no) before the randomization date.
Drug use is indicated by a "1" in the "D1 ... D7 drug use" columns. Each row of data for a participant represents an entire week. The "D1 drug use" column indicates drug use on the date in the "D1 date" column, the "D2 drug use" column indicates drug use on the date in the "D2 date" column, and so on.
To know whether drug use occurred before the randomization date, you would check each date prior to the date listed in the "Randomization date" variable. In this case, the randomization date is 04/02/18. Therefore, I need to determine drug use on 04/01/18 and before. In this example, drug use occurred on 03/02/18, 03/09/18, 03/12/18, 03/27/18, and 03/30/18. So, the baseline drug use variable would be "1".
Please let me know if you have any other questions. Thank you for your help!
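The "before the randomization date" rule described above can be checked with a small comparison. The following is a minimal sketch, not part of the original posts: the variable names day and is_before are hypothetical, and the dates are the ones from the example (randomization on 04/02/18). It also includes a missing date, because SAS treats a missing numeric value as smaller than any number.

```sas
data _null_;
    randomization_date = '02APR2018'd;
    /* Test the day before randomization, the randomization date itself, and a missing date */
    do day = '01APR2018'd, '02APR2018'd, .;
        /* Strict comparison: the randomization date itself does not count as "before" */
        is_before = (day < randomization_date);
        put day= mmddyy8. is_before=;
    end;
run;
```

The log should show is_before=1 for 04/01/18 and is_before=0 for 04/02/18. The missing date also yields is_before=1, so a check on dates alone would treat the empty cells as "before"; pairing it with the drug use value (which is also missing in those cells) or with an explicit missing() test avoids that.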