Assigning record status based on chronological order and test result

jmarnaez · Posted 11-06-2023 03:37 PM

Essentially we want to tag whether a case was an incident case or a prevalent case. At present, our understanding is that if a person has a confirmed test in 2019 and another in 2021 they would be an incident case for both years based on external guidance. However, a confirmed test in 2019 and a confirmed test in 2020 would only treat the 2019 one as incident and the 2020 as prevalent. Going by the sample data how could this be coded?

data WORK.SAMPLE;
  infile datalines dsd truncover;
  input ID:32. Type:$1. Elevated:32. Confirmed:32. Test_Year:32. Test_Date:32.;
  label ID="ID" Type="Type" Elevated="Elevated" Confirmed="Confirmed" Test_Year="Test Year" Test_Date="Test Date";
datalines;
1 C 1 1 2019 20191230
1 V 1 1 2020 20200130
1 C 0 0 2020 20200215
2 C 1 1 2019 20191115
2 C 1 1 2020 20200201
2 C 1 0 2020 20200215
2 V 0 0 2020 20200515
3 C 1 0 2020 20200101
4 V 1 1 2020 20200101
5 V 1 1 2019 20191230
5 C 1 1 2020 20200130
5 C 1 0 2020 20200228
5 C 0 0 2020 20200515
5 C 0 0 2021 20210401
5 V 1 1 2022 20220401
6 C 1 0 2019 20191230
6 C 0 0 2020 20200515
6 V 1 1 2020 20200615
6 C 0 0 2021 20210415
;;;;

ballardw · Posted 11-06-2023 03:48 PM

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

mkeintz · Posted 11-06-2023 11:50 PM

I understand your rule as:

Any confirmed test preceded by a two-year (or more) gap since the previous confirmed test is characterized as "incident"?
All confirmed tests preceded by less than a two year gap since previous confirmed are characterized as "prevalent"
Non-confirmed tests are characterized with a blank value.

Do you intend to use the test_year value, or the test_date value to determine gap size?

Here is sample code using test_year:

data want;
  set sample;
  by id;
  if confirmed=1 then do;
    if lag(id)=id and lag(test_year)>test_year-2 then result='Prevalent';
    else result='Incident';
  end;
run;

The point here is that the LAG function is not a "lookback" as in a spreadsheet cell referring to a prior row.

Instead, it is a fifo queue (in this case a short 1-entry fifo queue). That queue is only updated when confirmed=1. So the "lagged" value are only referring to earlier values in other observations with confirmed=1. Any intervening obs with confirmed=0 will have no impact.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

jmarnaez · Posted 11-07-2023 11:58 AM

Thank you for your response. This is definitely close. Looking over the output, is there a way to have it check the difference in year from the previous incident case in the ID? Basically if the difference in test_year for a confirmed test is less than two years from the most recent incident case, it's a prevalent case. If it's more than two years then it is an incident case. If not , I think that's a condition we can code alongside it with the incident/prevalent tag just checking if it's possible to get it all in one go.

mkeintz · Posted 11-08-2023 09:10 AM

@jmarnaez wrote:
Thank you for your response. This is definitely close. Looking over the output, is there a way to have it check the difference in year from the previous incident case in the ID? Basically if the difference in test_year for a confirmed test is less than two years from the most recent incident case, it's a prevalent case. If it's more than two years then it is an incident case. If not , I think that's a condition we can code alongside it with the incident/prevalent tag just checking if it's possible to get it all in one go.

I think I understand your refinement, but I am not sure. Please provide sample data, and expected response, that exposes the problem.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

jmarnaez · Posted 11-09-2023 12:22 PM

Sorry for the delay. This should illustrate it. Basically, the first and last records for ID 7 should be tagged as incident in the final product as an incident case can be enumerated every other year for the same person.

data WORK.SAMPLE2;
  infile datalines dsd truncover;
  input ID:32. Type:$1. Elevated:32. Confirmed:32. Test_Year:32. Test_Date:32.;
  label ID="ID" Type="Type" Elevated="Elevated" Confirmed="Confirmed" Test_Year="Test Year" Test_Date="Test Date";
datalines;
1 C 1 1 2019 20191230
1 V 1 1 2020 20200130
1 C 0 0 2020 20200215
2 C 1 1 2019 20191115
2 C 1 1 2020 20200201
2 C 1 0 2020 20200215
2 V 0 0 2020 20200515
3 C 1 0 2020 20200101
4 V 1 1 2020 20200101
5 V 1 1 2019 20191230
5 C 1 1 2020 20200130
5 C 1 0 2020 20200228
5 C 0 0 2020 20200515
5 C 0 0 2021 20210401
5 V 1 1 2022 20220401
6 C 1 0 2019 20191230
6 C 0 0 2020 20200515
6 V 1 1 2020 20200615
6 C 0 0 2021 20210415
7 V 1 1 2020 20200130
7 V 1 1 2021 20210215
7 V 1 1 2022 20220415
;;;;

mkeintz · Posted 11-13-2023 12:15 PM

@jmarnaez wrote:

Sorry for the delay. This should illustrate it. Basically, the first and last records for ID 7 should be tagged as incident in the final product as an incident case can be enumerated every other year for the same person.

So, even though there is a confirmed case in three successive years for ID 7, you don't want the middle year to reset the clock. In other words, you are not looking for two year gaps between confirmed cases, right?

But then, what are you looking for? Are you trying to identify every confirmed case at least two years after the first confirmed case (that would be one way to get the first an last ID 7 obs assigned as "incident")? I do not understand the conditions you are trying to detect.

Also, you have not specified whether you want to use TEST_YEAR or TEST_DATE as the chronological measure.

And finally, regarding your sample data.

Does your INPUT statement work for you? It doesn't for me (see a part of the log below):

19  data WORK.SAMPLE2;
420    infile datalines dsd truncover;
421    input ID:32. Type:$1. Elevated:32. Confirmed:32. Test_Year:32. Test_Date:32.;
422    label ID="ID" Type="Type" Elevated="Elevated" Confirmed="Confirmed" Test_Year="Test Year"
422! Test_Date="Test Date";
423  datalines;

NOTE: Invalid data for ID in line 424 1-21.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
424        1 C 1 1 2019 20191230
ID=. Type=  Elevated=. Confirmed=. Test_Year=. Test_Date=. _ERROR_=1 _N_=1

And why are you not reading in TEST_DATE with a date-related format?

This is why most of us ask for sample data in the form of a working data step. Why obligate multiple respondents do this extra work just to get a working sample? Please help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Assigning record status based on chronological order and test result

Re: Assigning record status based on chronological order and test result

Re: Assigning record status based on chronological order and test result

Re: Assigning record status based on chronological order and test result

Re: Assigning record status based on chronological order and test result

Re: Assigning record status based on chronological order and test result

Re: Assigning record status based on chronological order and test result

SAS Innovate 2025: Call for Content

Classroom Training Available!