Mixed or unmixed culture results by subject

amanda_1 · Posted 10-03-2023 02:38 PM

Hi all,

I have a tricky (for me) programming question I am hoping you can all help with.

I have a dataset of clusters (subjects) of longitudinal culture results. The number of available cultures per subject varies. For the particular bacteria we are tracking, the culture results are also variable by subject, with any of the following scenarios possible:

1. All negative cultures

2. Initially negative, then all positive

3. Initially negative, then positive, then all negative again

4. Negative, positive, negative, positive (etc. in this fashion, either ending positive or negative)

5. Initially positive, then all negative

6. Initially positive, then negative, then positive again

7. Same as #4, but starting out as positive

8. All positive cultures

I have a few objectives:

1. Identify those subjects with at least 1 positive culture and then all negative cultures

2. I will also want to identify those with more than 1 consecutive positive and those with more than 1 consecutive negative

3. Identify those subjects with at least 1 positive, then negative (and number of consecutive negatives) and then back to positive

4. Identify those subjects with only positive cultures

5. Identify those subjects with only negative cultures

This is a bit beyond my programming skill level, so I haven't attempted too many solutions other than a couple DO and DoW loops (and I'm not even getting close to what I need).

I'm attaching a simulated dataset that includes examples of all 8 scenarios described above. Really, really appreciate any ideas/solutions.

ballardw · Posted 10-03-2023 02:40 PM

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

amanda_1 · Posted 10-03-2023 02:51 PM

Thanks so much for the heads up. Code below, let me know if any issues.

data WORK.WANT;
  infile datalines dsd truncover;
  input ID:BEST. cx_date:MMDDYY10. cx_growth:$9.;
  format ID BEST. cx_date MMDDYY10.;
  label ID="ID" cx_date="cx_date" cx_growth="cx_growth";
datalines;
1 02/13/2009 No growth
1 05/20/2010 No growth
1 07/03/2012 No growth
1 04/18/2016 No growth
1 09/10/2016 No growth
2 01/13/2010 No growth
2 04/17/2012 Growth
2 07/16/2014 Growth
2 11/08/2015 Growth
3 05/15/2015 No growth
3 08/20/2018 Growth
3 03/01/2019 No growth
3 01/01/2020 No growth
4 03/17/2013 No growth
4 05/01/2015 Growth
4 01/15/2016 Growth
4 08/14/2019 No growth
4 10/13/2020 No growth
4 12/19/2020 No growth
4 02/04/2021 Growth
4 06/30/2021 No growth
4 10/31/2022 No growth
5 04/18/2016 Growth
5 02/14/2017 Growth
5 07/23/2018 No growth
5 09/07/2019 No growth
5 03/29/2020 No growth
5 05/18/2021 No growth
6 04/17/2012 Growth
6 12/13/2013 Growth
6 03/14/2015 No growth
6 08/11/2017 No growth
6 04/15/2018 Growth
6 08/18/2018 Growth
7 03/17/2013 Growth
7 05/01/2015 Growth
7 01/15/2016 Growth
7 08/14/2019 No growth
7 10/13/2020 No growth
7 12/19/2020 Growth
7 02/04/2021 Growth
7 06/30/2021 No growth
7 10/31/2022 No growth
8 02/13/2009 Growth
8 05/20/2010 Growth
8 07/03/2012 Growth
8 04/18/2016 Growth
8 09/10/2016 Growth
;;;;

ballardw · Posted 10-03-2023 03:19 PM

Now in terms of the variables shown you need to flesh out description for how to identify

1. All negative cultures

2. Initially negative, then all positive

3. Initially negative, then positive, then all negative again

4. Negative, positive, negative, positive (etc. in this fashion, either ending positive or negative)

5. Initially positive, then all negative

6. Initially positive, then negative, then positive again

7. Same as #4, but starting out as positive

8. All positive cultures

You apparently would want to use the variable cx_growth. But we do not know which is positive, which is negative.

You may also want to show an example of what the result data would look like.

I suspect that in the long run it may be a good idea to create dichotomous numeric variable(s) for positive and negative as there are a number of things that are much easier to code with them than with character values.

You also do not have the same number of observations for each ID. If the number of observations might have an impact on the changing back and forth categories then any rules involving the actual count.

Does "initially" mean exactly one, a sequence of some specific length or an indefinite number? If these categories are supposed to be exclusive, as in one ID does not belong to 2 or more categories you will need some rules there as well.

Consider if "initially" includes an indefinite number then : positive, negative, positive would be both 6 and 7.

amanda_1 · Posted 10-03-2023 05:45 PM

Thanks, ballardw, all good points.

Here, negative=no growth= 0 (dichotomous version in updated coded below) and positive = growth = 1. You are correct, the number of observations are not the same for each ID and this is how the real data is as well. Also correct, when I say 'initially' this can refer to 1 or more cultures where the subject is at first negative and then cultures positive or vice versa.

As an end point, I think I am envisioning something along the lines of a wide dataset. Below, culture_convert is defined as any subject with at least one positive and all subsequent cultures (1 or more) are negative.

ID	all_positive	all_negative	atleast1positive	consecutive_pos	atleast1negative	consecutive_neg	culture_convert
1	0	1	0	0	1	1	0
2	0	0	1	1	1	0	0
3	0	0	1	0	1	1	1
4	0	0	1	1	1	1	1
5	0	0	1	1	1	1	1
6	0	0	1	1	1	1	0
7	0	0	1	1	1	1	1
8	1	0	1	1	0	0	0

data WORK.WANT;
  infile datalines dsd truncover;
  input ID:BEST. cx_date:MMDDYY10. cx_growth:BEST.;
  format ID BEST. cx_date MMDDYY10. cx_growth BEST.;
  label ID="ID" cx_date="cx_date" cx_growth="cx_growth";
datalines;
1 02/13/2009 0
1 05/20/2010 0
1 07/03/2012 0
1 04/18/2016 0
1 09/10/2016 0
2 01/13/2010 0
2 04/17/2012 1
2 07/16/2014 1
2 11/08/2015 1
3 05/15/2015 0
3 08/20/2018 1
3 03/01/2019 0
3 01/01/2020 0
4 03/17/2013 0
4 05/01/2015 1
4 01/15/2016 1
4 08/14/2019 0
4 10/13/2020 0
4 12/19/2020 0
4 02/04/2021 1
4 06/30/2021 0
4 10/31/2022 0
5 04/18/2016 1
5 02/14/2017 1
5 07/23/2018 0
5 09/07/2019 0
5 03/29/2020 0
5 05/18/2021 0
6 04/17/2012 1
6 12/13/2013 1
6 03/14/2015 0
6 08/11/2017 0
6 04/15/2018 1
6 08/18/2018 1
7 03/17/2013 1
7 05/01/2015 1
7 01/15/2016 1
7 08/14/2019 0
7 10/13/2020 0
7 12/19/2020 1
7 02/04/2021 1
7 06/30/2021 0
7 10/31/2022 0
8 02/13/2009 1
8 05/20/2010 1
8 07/03/2012 1
8 04/18/2016 1
8 09/10/2016 1
;;;;

Mixed or unmixed culture results by subject

Re: Mixed or unmixed culture results by subject

Re: Mixed or unmixed culture results by subject

Re: Mixed or unmixed culture results by subject

Re: Mixed or unmixed culture results by subject

Mixed or unmixed culture results by subject

Re: Mixed or unmixed culture results by subject

Re: Mixed or unmixed culture results by subject

Re: Mixed or unmixed culture results by subject

Re: Mixed or unmixed culture results by subject

Registration is open

SAS Training: Just a Click Away