Hi group!
I need help with creating a new categorical variable using 5 different variables. I would really appreciate any help with this 🙂
Thank you so much!
The five variables are Pt_ID, Test_ID, Collection_Date (DateTime variable), BloodCulture (yes=1) and Contaminant (yes=1, no =0). Each patient (Pt_ID) can have a single test (Test_ID) or multiple tests. Time when they are done is denoted by the Collection_Date and result whether it is Contaminant or not is denoted by 1 for yes and 0 for no.
This data set is a list of all patients with positive BloodCultures denoted by BloodCulture =1. All patients are tagged CulturePostive = 1 if BloodCulture = 1 and Contaminant = 0.
In instances where a single patient has two or more contaminants within 72 hours of each other I would like to tag those tests as CulturePostive = 1.
If there is only one Contaminant per Pt_ID, then that particular test is CulturePostive = 0.
If there are two tests for a patient, one is contaminant and other is not a contaminant (Contaminant = 0) and these two tests are within 72 hours of each other, then CulturePositive is 0 for the Contaminant = 1 and CulturePositive = 1 for Contaminant = 0. If there are two tests for a patient, one is contaminant and other is not a contaminant and the tests are >72 hours apart then CulturePositive is 0 for the contaminant = 1 and CulturePositive is 1 for Contaminant = 0.
Some specific instances:
3 or more tests within 72 hours and mixed results? Mixed contaminants?
- As long as there are two contaminants within 72 hours of each other, then those two contaminants are CulturePositive =1. If there are 3 contaminants within 72 hours of each other, all three are CulturePositive =1.
What if there are 3 or more tests and the first two are within 72 hours, and the third is within 72 hours of the second but not the first? Of the fourth is within 72 hours of the third but not the first and second?
- First and second are compared, if both are contaminants and within 72 hours, then both are tagged as CulturePositive =1.
Second and third are compared, if both are contaminants and within 72 hours, then both are tagged as CulturePositive =1.
Third and fourth are compared, if both are contaminants and within 72 hours, then both are tagged as CulturePositive =1.
My ultimate goal is to tag bloodcultures as CulturePositive =1 if BloodCulture = 1 and Contaminant = 0 and in instances where a single patient has two or more contaminants within 72 hours of each other I would like to tag those tests as CulturePostive = 1.
Any other scenario of BloodCulture = 1 and Contaminant = 1 is CulturePositive =0 (i.e. single Contaminant = 1 per patient or more than one Contaminant = 1 per patient that are > 72 hours of each other).
Please see below for the example of the data set and the variable CulturePostive that I need.
Pt_ID variable can be repeated, i.e. there can be patients with same Pt_ID in multiple rows indicating multiple tests. Test_ID variable is always unique and has different Test_ID number even if the collection_date is exactly same.
Dataset:
Pt_ID | Test_ID | Collection_Date | BloodCulture | Contaminant | CulturePositive |
1 | 170080299 | 17JAN2017:14:28:00 | 1 | 0 | 1 |
2 | 170080113 | 18JAN2017:14:28:00 | 1 | 1 | 1 |
2 | 170080114 | 18JAN2017:14:28:00 | 1 | 1 | 1 |
3 | 190080113 | 19JAN2017:11:28:00 | 1 | 0 | 1 |
3 | 190080114 | 19JAN2017:11:28:00 | 1 | 0 | 1 |
4 | 180080113 | 20JAN2017:14:28:00 | 1 | 1 | 1 |
4 | 180080114 | 20JAN2017:16:35:00 | 1 | 1 | 1 |
5 | 180080555 | 30JAN2017:14:53:00 | 1 | 1 | 0 |
5 | 180080557 | 05FEB2017:18:55:00 | 1 | 1 | 0 |
6 | 220080113 | 31JAN2017:11:28:00 | 1 | 0 | 1 |
6 | 220080114 | 08FEB2017:22:28:00 | 1 | 0 | 1 |
7 | 250080119 | 06FEB2017:22:43:00 | 1 | 1 | 0 |
7 | 260080234 | 06FEB2017:23:17:00 | 1 | 0 | 1 |
8 | 277000112 | 07FEB2017:23:17:00 | 1 | 1 | 0 |
|
|
|
|
|
|
9 | 290080119 | 06FEB2017:22:43:00 | 1 | 1 | 1 |
9 | 290080234 | 06FEB2017:23:17:00 | 1 | 0 | 1 |
9 | 297000112 | 07FEB2017:23:17:00 | 1 | 1 | 1 |
|
|
|
|
|
|
10 | 300080119 | 27FEB2017:22:43:00 | 1 | 0 | 1 |
10 | 300080234 | 27FEB2017:23:17:00 | 1 | 1 | 1 |
10 | 307000112 | 28FEB2017:23:17:00 | 1 | 1 | 1 |
|
|
|
|
|
|
11 | 310080234 | 06MAR2017:23:17:00 | 1 | 1 | 1 |
11 | 317000112 | 09 MAR 2017:23:15:00 | 1 | 1 | 1 |
11 | 310080119 | 12 MAR 2017:22:43:00 | 1 | 1 | 1 |
11 | 310080234 | 15 MAR 2017:22:17:00 | 1 | 1 | 1 |
For PT_ID 1, there is only one test and Contaminant = 0, so CulturePostive = 1;
For PT_ID 2, there are two tests done exactly at the same time and they both are contaminants, so I need to tag both of them as CulturePostive = 1;
For PT_ID 3, there are two tests done at same time but both are Contaminant = 0, so both are CulturePostive = 1;
For PT_ID 4, there are two tests done at different times but within 72 hours of each other, and both are Contaminant = 1, so I need to tag both of them as CulturePostive = 1;
For PT_ID 5, there are two tests done at different times > 72 hours of each other, and both are Contaminant = 1, but I need to tag both of them as CulturePostive = 0;
For PT_ID 6, there are two tests done at different times > 72 hours of each other, and both are Contaminant = 0, so I need to tag both of them as CulturePostive = 1;
For PT_ID 7, there are two tests done at different times < 72 hours of each other, first test is Contaminant = 0, so I need to tag it as CulturePostive = 0 and second one is contaminant = 0 so I need to tag the second test as CulturePostive = 1;
For PT_ID 8, there is only one test and Contaminant = 1, so CulturePostive = 0;
Please provide generic rules before the examples.
Just examples does not provide sufficient information on what is needed except in those specific cases.
Such as what happens with one contaminant and one not? 3 or more tests within 72 hours and mixed results? Mixed contaminants?
What if there are 3 or more tests and the first two are within 72 hours, and the third is within 72 hours of the second but not the first? Of the fourth is within 72 hours of the third but not the first and second?
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Hi there! I have now modified my post to make it more clear for you. Appreciate your help!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.