Hi all,
I'm working with claims data and there are repeated observations. I was interested in counting the frequency of arthritis in each individual. But I have 2 questions:
1. Since it's claims data, diagnosis that happened on the same day will be repeated in the data set, is there a way to ask SAS to keep only the first service date/observation if the diagnosis happened on the same day (svcdate) for each enrolid.
2. If I do solve question 1 above, and have distinct service dates with the 'diff' variable which is the difference of the svcdate and first_Svcdate. If the 'diff' is consecutive as it currently is for enrolid 3 - I want to count it as 1, whereas with enrolid 1 the first 3 observations I want to count as 1 and since there is a gap, I want to count the one with a diff of 5 as a second diagnosis.
I've shown what the data currently looks like and how I want it to look below.
Please I'd appreciate any help or guidance!
This is what the data looks like
data Trial;
input enrolid arthritis svcdate first_svcdate diff;
cards;
1 1 1/10/2018 1/10/2018 0
1 1 1/10/2018 1/10/2018 0
1 1 1/11/2018 1/10/2018 1
1 1 1/15/2018 1/10/2018 5
2 1 2/15/2018 2/15/2018 0
2 1 2/15/2018 2/15/2018 0
2 1 2/18/2019 2/15/2018 3
2 1 3/14/2019 2/15/2018 27
3 1 3/15/2012 3/15/2012 0
3 1 3/16/2012 3/15/2012 1
3 1 3/17/2012 3/15/2012 2
;
This is how I want it to look like.
data Trial;
input enrolid count ;
cards;
1 2
2 3
3 1
;
Hi @JME1
Here is a solution that works without the diff variable. The counter is set to 1 In the first observation for each enrolid, and in the following observations svcdate is compared to the previous svcdate, and 1 is added to the counter only if there is a gap. The result is written after the last record for each enrollid.
data Trial;
input enrolid arthritis svcdate mmddyy10. first_svcdate mmddyy10. diff;
format svcdate first_svcdate date9.;
cards;
1 1 1/10/2018 1/10/2018 0
1 1 1/10/2018 1/10/2018 0
1 1 1/11/2018 1/10/2018 1
1 1 1/15/2018 1/10/2018 5
2 1 2/15/2018 2/15/2018 0
2 1 2/15/2018 2/15/2018 0
2 1 2/18/2019 2/15/2018 3
2 1 3/14/2019 2/15/2018 27
3 1 3/15/2012 3/15/2012 0
3 1 3/16/2012 3/15/2012 1
3 1 3/17/2012 3/15/2012 2
;
run;
data t2 (keep=enrolid count); set trial (drop=diff);
by enrolid;
retain count;
ldate = lag(svcdate);
if first.enrolid then count = 1;
else if svcdate-ldate > 1 then count + 1;
if last.enrolid then output;
run;
Hi @JME1
Here is a solution that works without the diff variable. The counter is set to 1 In the first observation for each enrolid, and in the following observations svcdate is compared to the previous svcdate, and 1 is added to the counter only if there is a gap. The result is written after the last record for each enrollid.
data Trial;
input enrolid arthritis svcdate mmddyy10. first_svcdate mmddyy10. diff;
format svcdate first_svcdate date9.;
cards;
1 1 1/10/2018 1/10/2018 0
1 1 1/10/2018 1/10/2018 0
1 1 1/11/2018 1/10/2018 1
1 1 1/15/2018 1/10/2018 5
2 1 2/15/2018 2/15/2018 0
2 1 2/15/2018 2/15/2018 0
2 1 2/18/2019 2/15/2018 3
2 1 3/14/2019 2/15/2018 27
3 1 3/15/2012 3/15/2012 0
3 1 3/16/2012 3/15/2012 1
3 1 3/17/2012 3/15/2012 2
;
run;
data t2 (keep=enrolid count); set trial (drop=diff);
by enrolid;
retain count;
ldate = lag(svcdate);
if first.enrolid then count = 1;
else if svcdate-ldate > 1 then count + 1;
if last.enrolid then output;
run;
Thank you so much @ErikLund_Jensen ! It worked perfectly.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.