Essentially we want to tag whether a case was an incident case or a prevalent case. At present, our understanding is that if a person has a confirmed test in 2019 and another in 2021 they would be an incident case for both years based on external guidance. However, a confirmed test in 2019 and a confirmed test in 2020 would only treat the 2019 one as incident and the 2020 as prevalent. Going by the sample data how could this be coded?
data WORK.SAMPLE;
infile datalines dsd truncover;
input ID:32. Type:$1. Elevated:32. Confirmed:32. Test_Year:32. Test_Date:32.;
label ID="ID" Type="Type" Elevated="Elevated" Confirmed="Confirmed" Test_Year="Test Year" Test_Date="Test Date";
datalines;
1 C 1 1 2019 20191230
1 V 1 1 2020 20200130
1 C 0 0 2020 20200215
2 C 1 1 2019 20191115
2 C 1 1 2020 20200201
2 C 1 0 2020 20200215
2 V 0 0 2020 20200515
3 C 1 0 2020 20200101
4 V 1 1 2020 20200101
5 V 1 1 2019 20191230
5 C 1 1 2020 20200130
5 C 1 0 2020 20200228
5 C 0 0 2020 20200515
5 C 0 0 2021 20210401
5 V 1 1 2022 20220401
6 C 1 0 2019 20191230
6 C 0 0 2020 20200515
6 V 1 1 2020 20200615
6 C 0 0 2021 20210415
;;;;
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
I understand your rule as:
Do you intend to use the test_year value, or the test_date value to determine gap size?
Here is sample code using test_year:
data want;
set sample;
by id;
if confirmed=1 then do;
if lag(id)=id and lag(test_year)>test_year-2 then result='Prevalent';
else result='Incident';
end;
run;
The point here is that the LAG function is not a "lookback" as in a spreadsheet cell referring to a prior row.
Instead, it is a fifo queue (in this case a short 1-entry fifo queue). That queue is only updated when confirmed=1. So the "lagged" value are only referring to earlier values in other observations with confirmed=1. Any intervening obs with confirmed=0 will have no impact.
@jmarnaez wrote:
Thank you for your response. This is definitely close. Looking over the output, is there a way to have it check the difference in year from the previous incident case in the ID? Basically if the difference in test_year for a confirmed test is less than two years from the most recent incident case, it's a prevalent case. If it's more than two years then it is an incident case. If not , I think that's a condition we can code alongside it with the incident/prevalent tag just checking if it's possible to get it all in one go.
I think I understand your refinement, but I am not sure. Please provide sample data, and expected response, that exposes the problem.
Sorry for the delay. This should illustrate it. Basically, the first and last records for ID 7 should be tagged as incident in the final product as an incident case can be enumerated every other year for the same person.
data WORK.SAMPLE2;
infile datalines dsd truncover;
input ID:32. Type:$1. Elevated:32. Confirmed:32. Test_Year:32. Test_Date:32.;
label ID="ID" Type="Type" Elevated="Elevated" Confirmed="Confirmed" Test_Year="Test Year" Test_Date="Test Date";
datalines;
1 C 1 1 2019 20191230
1 V 1 1 2020 20200130
1 C 0 0 2020 20200215
2 C 1 1 2019 20191115
2 C 1 1 2020 20200201
2 C 1 0 2020 20200215
2 V 0 0 2020 20200515
3 C 1 0 2020 20200101
4 V 1 1 2020 20200101
5 V 1 1 2019 20191230
5 C 1 1 2020 20200130
5 C 1 0 2020 20200228
5 C 0 0 2020 20200515
5 C 0 0 2021 20210401
5 V 1 1 2022 20220401
6 C 1 0 2019 20191230
6 C 0 0 2020 20200515
6 V 1 1 2020 20200615
6 C 0 0 2021 20210415
7 V 1 1 2020 20200130
7 V 1 1 2021 20210215
7 V 1 1 2022 20220415
;;;;
@jmarnaez wrote:
Sorry for the delay. This should illustrate it. Basically, the first and last records for ID 7 should be tagged as incident in the final product as an incident case can be enumerated every other year for the same person.
So, even though there is a confirmed case in three successive years for ID 7, you don't want the middle year to reset the clock. In other words, you are not looking for two year gaps between confirmed cases, right?
But then, what are you looking for? Are you trying to identify every confirmed case at least two years after the first confirmed case (that would be one way to get the first an last ID 7 obs assigned as "incident")? I do not understand the conditions you are trying to detect.
Also, you have not specified whether you want to use TEST_YEAR or TEST_DATE as the chronological measure.
And finally, regarding your sample data.
19 data WORK.SAMPLE2;
420 infile datalines dsd truncover;
421 input ID:32. Type:$1. Elevated:32. Confirmed:32. Test_Year:32. Test_Date:32.;
422 label ID="ID" Type="Type" Elevated="Elevated" Confirmed="Confirmed" Test_Year="Test Year"
422! Test_Date="Test Date";
423 datalines;
NOTE: Invalid data for ID in line 424 1-21.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
424 1 C 1 1 2019 20191230
ID=. Type= Elevated=. Confirmed=. Test_Year=. Test_Date=. _ERROR_=1 _N_=1
This is why most of us ask for sample data in the form of a working data step. Why obligate multiple respondents do this extra work just to get a working sample? Please help us help you.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.