04-24-2018 02:20 PM
I'm testing a case definition that's looking for patients who have at least 3 separate healthcare visits during the year, and all 3 visits have to be at least 30 days apart. Each visit is stored in a separate record. So a person who had 6 visits during the year would have 6 records in the file.
I tested a similar definition, but it only required 2 visits. I accomplished that by doing a PROC SQL join that merged the data file with itself. This created a many-to-many merge. Then it was just a matter of finding all records where the difference in dates was 30 days or more, flagging them, and selecting all people who had at least one flag. But when I tried this approach with 3 dates, the file created by the many-to-many-to-many merge was too big and took too long to create to be practical.
I should also note that I need to calculate true positives, false positives, true negatives, false negatives, sensitivity, and specificity. So I can't simply drop cases that don't fit the criteria, because those cases will need to be classified later. Instead, all cases have to be flagged to indicate if they qualify for the definition or not.
Here is the code I used for the 2-visit version:
/*Create A and B versions of claim file, use PROC SQL to create a many-to-many merge. This will generate a file with all possible combinations of claims dates.*/ PROC SQL; Create Table temp.SCA_ICD10_def&defnumb._mm as Select a.encrypted_id, a.Med_def1516171920_count, a.DxSCA1520_max as DxSCA1520_max_A, b.DxSCA1520_max as DxSCA1520_max_B, a.CLAIM_SERVICE_FROM_DATE as CLAIM_DATE_A, b.CLAIM_SERVICE_FROM_DATE as CLAIM_DATE_B From temp.SCA_ICD10_def&defnumb._line_rec as a JOIN temp.SCA_ICD10_def&defnumb._line_rec as b ON a.encrypted_id = b.encrypted_id order by a.encrypted_id, a.CLAIM_SERVICE_FROM_DATE; QUIT; /****Apply Case Definition &defnumb. - At least 2 visits at least 30 days apart****/ DATA temp.SCA_ICD10_def&defnumb._dates; SET temp.SCA_ICD10_def&defnumb._mm; /*Convert claim dates to raw numbers so number of days will calculate correctly even across months*/ CLAIM_DATE_A_n=input(put(CLAIM_DATE_A,yymmddn8.), 8.); CLAIM_DATE_B_n=input(put(CLAIM_DATE_B,yymmddn8.), 8.); CLAIM_DATE_A_n2=input(put(CLAIM_DATE_A_n,$8.),yymmdd8.); CLAIM_DATE_B_n2=input(put(CLAIM_DATE_B_n,$8.),yymmdd8.); CLAIM_DATE_diff = CLAIM_DATE_A_n2 - CLAIM_DATE_B_n2; IF (CLAIM_DATE_diff => 30 AND Med_def1516171920_count => 2 AND DxSCA1520_max_A = 1 AND DxSCA1520_max_B = 1) THEN def&defnumb._Med_flag = 1; ELSE def&defnumb._Med_flag = 0; RUN; PROC SQL; Create table temp.SCA_ICD10_def&defnumb._uniqID as Select encrypted_id, max(def&defnumb._Med_flag) as def&defnumb._Med_case From temp.SCA_ICD10_def&defnumb._dates Group by encrypted_id; QUIT;
04-24-2018 04:14 PM
A brief example input data set, preferably in the form of a data step so we can generate data and test code against it, and the desired output for that code would be enlightening.
Clarification is needed because of the possible interpretations of "at least 30 days apart". If I have 6 visits and they are all at a 20 days interval would the 1st, 3rd and 5th records match your requirement (the would each be 40 days apart) or do you require consideration of sequential visits? If date1 and date3 are 30 days apart and date1 and date4 are 30 days apart but date3 is not 30 days from date4 does the set date1, date3 and date4 meet your requirements? If they don't what happens if there is not a third date?
You have a moderate number of other boundary values to consider as well I suspect.
Since ORDER is a big portion of this request it may well be that SQL is not the best solution.
I might start with something like this to get a set of dates 30 or more days apart for each ID, but reducing to your actual dates would require some answers to the above questions.
proc sql; create table days30plus as select distinct a.identification, a.date as date1, b.date as date2 from have as a left join have as b on a.identification=b.identification where b.date - a.date ge 30 order by a.identification, a.date ; quit;
I included the Distinct because it seems that healthcare visit data often has multiple visits for single dates.
04-24-2018 06:09 PM
I agree with @ballardw, but also question why you have multiple procs when it could all be done in one proc sql call.
I, too, would prefer knowing your rules, but here is a simple example:
%let defnumb=3; libname temp '/folders/myfolders/temp'; data temp.SCA_ICD10_def&defnumb._line_rec; input encrypted_id Med_def1516171920_count CLAIM_SERVICE_FROM_DATE $; cards; 1 1 20180101 1 2 20180120 1 3 20180128 1 4 20180228 1 5 20180315 1 6 20180330 ; proc sql; create table days30plus as select distinct a.encrypted_id, a.Med_def1516171920_count as count1, b.Med_def1516171920_count as count2, c.Med_def1516171920_count as count3, input(put(a.CLAIM_SERVICE_FROM_DATE,$8.),yymmdd8.) as date1, input(put(b.CLAIM_SERVICE_FROM_DATE,$8.),yymmdd8.) as date2, input(put(c.CLAIM_SERVICE_FROM_DATE,$8.),yymmdd8.) as date3 from temp.SCA_ICD10_def&defnumb._line_rec a left join temp.SCA_ICD10_def&defnumb._line_rec b on a.encrypted_id=b.encrypted_id left join temp.SCA_ICD10_def&defnumb._line_rec c on b.encrypted_id=c.encrypted_id where calculated date2 - calculated date1 ge 30 and calculated date3 - calculated date2 ge 30 order by a.encrypted_id, date1 ; quit;
Art, CEO, AnalystFinder.com
04-25-2018 08:59 AM - edited 04-25-2018 09:27 AM
I spoke to the researcher, and the current thinking is that it doesn't matter if there are intermediate visits or not. As long as there are visits that are more than 30 days apart, it is a qualifying case. The explanation is that we just want to make sure that the patient is getting care at multiple times throughout the year, rather just having a few visits in a short period of time.
Unfortunately, the output file is still much too large with either of these approaches. It uses all 2.5TB of empty disc space and then SAS gives me an "out of resources" error. As I mentioned before, a WHERE clause that discards records isn't advisable because I need to retain those cases to classify them as true positive, false positive, etc later on.
What if we flattened the file? I've never done that in SAS, but it seems like it might work. The problem is that the output file could potentially have 365 date columns, if there was a person in the data who had a visit every day. In reality, I think the max is more like 100 visits in a year. I don't know how long it would take to produce that file, but at least it should be much smaller than the current approach.
Of course, then the problem becomes calculating all the potential differences between all those date variables. Would that be done with some sort of do-loop or macro?
04-25-2018 09:45 AM
How big is the file you are trying analyze? Also, is it already in ID order and, if not, do you have the disk space needed to sort it?
It would definitely make it easier for people to suggest a solution if you provide an example dataset.
Art, CEO, AnalystFinder.com
04-25-2018 10:49 AM
04-25-2018 02:52 PM
Since you're running out of space using the sql approach, if your data are already grouped by encrypted_id then you might be able to use something like the following code. It only outputs one record per encrypted_id, with each record only containing the encrypted_id and the flag variable. I set the temporary array to maintain up to 360 dates as one of your encrypted_ids had 357 records:
data want (keep=encrypted_id flag); set have; by encrypted_id notsorted; array x _temporary_; retain x; if first.encrypted_id then counter=1; else counter+1; x(counter)=CLAIM_DATE_A_n2; if last.encrypted_id then do; flag=0; if counter gt 2 then do; do i = 1 to counter-2; do j = (i + 1) to counter-1 ; do k = (j + 1) to counter ; if x(j)-x(i) ge 30 and x(k)-x(j) ge 30 then do; flag=1; leave; end; end; if flag then leave; end; if flag then leave; end; end; output; end; run;
Art, CEO, AnalystFinder.com
04-30-2018 09:07 AM
I was able to use the Proc SQL approach by using where clauses that reduce the number of redundant date matches while still retaining and flagging all cases. And it still runs reasonably fast. Here is the final code:
proc sql; create table temp.SCA_ICD10_def&defnumb._mm as select distinct a.encrypted_id, a.Med_defOPQRS_count, a.DxSCAOS_max_A, b.DxSCAOS_max_B, c.DxSCAOS_max_C, a.CLAIM_DATE_A_n2, b.CLAIM_DATE_B_n2, c.CLAIM_DATE_C_n2 from temp.SCA_ICD10_def&defnumb._line_rec a left join temp.SCA_ICD10_def&defnumb._line_rec b on a.encrypted_id=b.encrypted_id left join temp.SCA_ICD10_def&defnumb._line_rec c on b.encrypted_id=c.encrypted_id where c.CLAIM_DATE_C_n2 >= b.CLAIM_DATE_B_n2 and b.CLAIM_DATE_B_n2 >= a.CLAIM_DATE_A_n2 order by a.encrypted_id, CLAIM_DATE_A_n2; quit;