BookmarkSubscribeRSS Feed
amanda_1
Calcite | Level 5

Hi all, 

I have a tricky (for me) programming question I am hoping you can all help with. 

 

I have a dataset of clusters (subjects) of longitudinal culture results. The number of available cultures per subject varies. For the particular bacteria we are tracking, the culture results are also variable by subject, with any of the following scenarios possible:

 

1. All negative cultures

2. Initially negative, then all positive

3. Initially negative, then positive, then all negative again

4. Negative, positive, negative, positive (etc. in this fashion, either ending positive or negative)

5. Initially positive, then all negative

6. Initially positive, then negative, then positive again

7. Same as #4, but starting out as positive

8. All positive cultures

 

I have a few objectives:

1. Identify those subjects with at least 1 positive culture and then all negative cultures

2. I will also want to identify those with more than 1 consecutive positive and those with more than 1 consecutive negative

3. Identify those subjects with at least 1 positive, then negative (and number of consecutive negatives) and then back to positive

4. Identify those subjects with only positive cultures

5. Identify those subjects with only negative cultures

 

This is a bit beyond my programming skill level, so I haven't attempted too many solutions other than a couple DO and DoW loops (and I'm not even getting close to what I need). 

 

I'm attaching a simulated dataset that includes examples of all 8 scenarios described above. Really, really appreciate any ideas/solutions.

4 REPLIES 4
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

amanda_1
Calcite | Level 5

Thanks so much for the heads up. Code below, let me know if any issues. 

 

data WORK.WANT;
  infile datalines dsd truncover;
  input ID:BEST. cx_date:MMDDYY10. cx_growth:$9.;
  format ID BEST. cx_date MMDDYY10.;
  label ID="ID" cx_date="cx_date" cx_growth="cx_growth";
datalines;
1 02/13/2009 No growth
1 05/20/2010 No growth
1 07/03/2012 No growth
1 04/18/2016 No growth
1 09/10/2016 No growth
2 01/13/2010 No growth
2 04/17/2012 Growth
2 07/16/2014 Growth
2 11/08/2015 Growth
3 05/15/2015 No growth
3 08/20/2018 Growth
3 03/01/2019 No growth
3 01/01/2020 No growth
4 03/17/2013 No growth
4 05/01/2015 Growth
4 01/15/2016 Growth
4 08/14/2019 No growth
4 10/13/2020 No growth
4 12/19/2020 No growth
4 02/04/2021 Growth
4 06/30/2021 No growth
4 10/31/2022 No growth
5 04/18/2016 Growth
5 02/14/2017 Growth
5 07/23/2018 No growth
5 09/07/2019 No growth
5 03/29/2020 No growth
5 05/18/2021 No growth
6 04/17/2012 Growth
6 12/13/2013 Growth
6 03/14/2015 No growth
6 08/11/2017 No growth
6 04/15/2018 Growth
6 08/18/2018 Growth
7 03/17/2013 Growth
7 05/01/2015 Growth
7 01/15/2016 Growth
7 08/14/2019 No growth
7 10/13/2020 No growth
7 12/19/2020 Growth
7 02/04/2021 Growth
7 06/30/2021 No growth
7 10/31/2022 No growth
8 02/13/2009 Growth
8 05/20/2010 Growth
8 07/03/2012 Growth
8 04/18/2016 Growth
8 09/10/2016 Growth
;;;;

 

ballardw
Super User

Now in terms of the variables shown you need to flesh out  description for how to identify

1. All negative cultures

2. Initially negative, then all positive

3. Initially negative, then positive, then all negative again

4. Negative, positive, negative, positive (etc. in this fashion, either ending positive or negative)

5. Initially positive, then all negative

6. Initially positive, then negative, then positive again

7. Same as #4, but starting out as positive

8. All positive cultures

 

You apparently would want to use the variable cx_growth. But we do not know which is positive, which is negative.

 

You may also want to show an example of what the result data would look like.

I suspect that in the long run it may be a good idea to create dichotomous numeric variable(s) for positive and negative as there are a number of things that are much easier to code with them than with character values.

You also do not have the same number of observations for each ID. If the number of observations might have an impact on the changing back and forth categories then any rules involving the actual count.

 

Does "initially" mean exactly one, a sequence of some specific length or an indefinite number? If these categories are supposed to be exclusive, as in one ID does not belong to 2 or more categories you will need some rules there as well.

Consider if "initially" includes an indefinite number then : positive, negative, positive would be both 6 and 7.

 

 

amanda_1
Calcite | Level 5

Thanks, ballardw, all good points. 

 

Here, negative=no growth= 0 (dichotomous version in updated coded below) and positive = growth = 1. You are correct, the number of observations are not the same for each ID and this is how the real data is as well. Also correct, when I say 'initially' this can refer to 1 or more cultures where the subject is at first negative and then cultures positive or vice versa.

 

As an end point, I think I am envisioning something along the lines of a wide dataset. Below, culture_convert is defined as any subject with at least one positive and all subsequent cultures (1 or more) are negative.

ID

all_positiveall_negativeatleast1positiveconsecutive_posatleast1negativeconsecutive_negculture_convert
10100110
20011100
30010111
40011111
50011111
60011110
70011111
81011000

 

 

data WORK.WANT;
  infile datalines dsd truncover;
  input ID:BEST. cx_date:MMDDYY10. cx_growth:BEST.;
  format ID BEST. cx_date MMDDYY10. cx_growth BEST.;
  label ID="ID" cx_date="cx_date" cx_growth="cx_growth";
datalines;
1 02/13/2009 0
1 05/20/2010 0
1 07/03/2012 0
1 04/18/2016 0
1 09/10/2016 0
2 01/13/2010 0
2 04/17/2012 1
2 07/16/2014 1
2 11/08/2015 1
3 05/15/2015 0
3 08/20/2018 1
3 03/01/2019 0
3 01/01/2020 0
4 03/17/2013 0
4 05/01/2015 1
4 01/15/2016 1
4 08/14/2019 0
4 10/13/2020 0
4 12/19/2020 0
4 02/04/2021 1
4 06/30/2021 0
4 10/31/2022 0
5 04/18/2016 1
5 02/14/2017 1
5 07/23/2018 0
5 09/07/2019 0
5 03/29/2020 0
5 05/18/2021 0
6 04/17/2012 1
6 12/13/2013 1
6 03/14/2015 0
6 08/11/2017 0
6 04/15/2018 1
6 08/18/2018 1
7 03/17/2013 1
7 05/01/2015 1
7 01/15/2016 1
7 08/14/2019 0
7 10/13/2020 0
7 12/19/2020 1
7 02/04/2021 1
7 06/30/2021 0
7 10/31/2022 0
8 02/13/2009 1
8 05/20/2010 1
8 07/03/2012 1
8 04/18/2016 1
8 09/10/2016 1
;;;;

 

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 802 views
  • 0 likes
  • 2 in conversation