10-11-2017 11:41 AM
I am having to pull any cannabis-related event based on ICD code from several datasets. The datasets have anywhere from 10-25 diagnosis variables where the ICD code could be found. I can code in every variable and ICD code manually, but would prefer to use an array if possible.
I have tried to code below, but it just ends up outputting entries that match the date range from the where clause and does not output based on the array section:
DATA CANNABIS_DATA; SET DATA.HAVE
where ((TRIAGE_DTTM >= '01JAN2017:00:00:00'd and TRIAGE_DTTM < '01FEB2017:00:00:00'd) or (VISIT_DTTM >= '01JAN2017:00:00:00'd and VISIT_DTTM < '01FEB2017:00:00:00'd)); ARRAY DXARRAY DXCODE1 - DXCODE10; DO I = 1 TO 10; IF substr(DXARRAY[I],1,4) = 'T407' or substr(DXARRAY[I],1,3) = 'F12' THEN OUTPUT; END; RUn;
10-11-2017 12:13 PM
A few ideas to consider ...
First, your variables appear to be DATETIME values. But the literal values in quotes are being treated as dates, not datetimes. You have to end with 'value'dt instead of 'value'd to indicate a datetime.
Second, do you want to output the same observation multiple times if there are multiple diagnoses found?
Third, you can abbreviate the checking in this way:
if dxarray[I] in : ('F12', 'T407') then output;
How do you know that your WHERE statement actually retrieves any observations?
10-11-2017 02:25 PM - edited 10-11-2017 02:28 PM
Thanks for the feedback.
1) Regarding dates, using 'd instead of 'dt still works, it will just ignore the time and look at the date portion.
2) No, output a single observation if multiple diagnoses found. That is a good point, a person could show up with both cannabis-related codes. However, the resulting datasets well be relatively small and I can do a proc sort with a nodupkey option afterwards.
3) Could point with the abbreviating that, I can adjust that.
4) I ran the code and it came up with many entries, all within the date range provided in the where clause, but they included many (most) entries without one of the cannabis-related ICD codes. My hope is that the code would identify entries that are within the date range, and then check them and output them if they match the array portion but it appears to ignore the array entirely.
10-11-2017 02:44 PM
The array logic looks fine (OK, so it might output the same observation twice).
There are obscure cases where it might output too few (not too many) observations ... if the diagnosis codes are right-hand justified instead of left-hand justified, or if they contain lowercase letters instead of uppercase letters. But the logic is sound and should output the correct observations. I see no way your logic would output an observation, unless that observation contains one of the codes on your list.
10-11-2017 04:34 PM
Alright, I got it to work. I'm not quite sure why it was ignoring the array that one time.
You raised a good point about capitalization, justification, leading/trailing blanks, etc. so I ended up not using the in() function and instead used a FIND statement for each ICD code with the "i" option to ignore case.
Here is the final code:
DATA WANT; SET DATA.HAVE; where ((TRIAGE_DTTM >= '01JAN2017:00:00:00'd and TRIAGE_DTTM < '01FEB2017:00:00:00'd) or (VISIT_DTTM >= '01JAN2017:00:00:00'd and VISIT_DTTM < '01FEB2017:00:00:00'd)); ARRAY DXARRAY DXCODE1 - DXCODE10; DO I = 1 TO 10; IF find(DXARRAY[I],'T407',i) or find(DXARRAY[I],'F12',i) THEN OUTPUT; END; RUN;