- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a medical procedure codes with ranges overlapped at some occasions. I'd like to flag when two ranges overlap and create variable flag_overlap=0; else 1. Below approach doesn't seem to do the job. Any help is appreciate. Thanks.
data have;
input type code1 $ code2 $;
cards;
1 00100 00222
2 00300 00352
3 00400 00474
4 00500 00580
5 00600 00670
6 00700 00797
7 00800 00882
8 00902 00952
9 01112 01173
10 01200 01274
11 01320 01444
12 01462 01522
13 01610 01680
14 01710 01782
15 01810 01860
16 01916 01936
17 01951 01953
18 01958 01969
19 01990 01999
20 10021 10022
21 10030 19499
22 20005 29999
23 30000 32999
24 33010 37799
25 38100 38999
26 39000 39599
27 40490 49999
28 50010 53899
29 54000 55899
30 55920 55920
31 55970 55980
32 56405 58999
33 59000 59899
34 60000 60699
35 61000 64999
36 65091 68899
37 69000 69979
38 69990 69990
39 70010 76499
40 76506 76999
41 77001 77022
42 77053 77067
43 77071 77086
44 77261 77799
45 78012 79999
46 0001M 0013M
47 81490 81599
48 0001U 0034U
49 80047 80076
50 80150 80299
51 80305 80377
52 80400 80439
53 80500 80502
54 81000 81099
55 81105 81479
56 81410 81471
57 82009 84999
58 85002 85999
59 86000 86849
60 86850 86999
61 87003 87999
62 88000 88099
63 88104 88199
64 88230 88299
65 88300 88399
66 88720 88749
67 89049 89240
68 89250 89398
69 90281 90399
70 90460 90474
71 90476 90749
72 90785 90899
73 90901 90911
74 90935 90999
75 91010 91299
76 92002 92499
77 92502 92700
78 92950 93799
79 93880 93998
80 94002 94799
81 95004 95199
82 95250 95251
83 95803 96020
84 96040 96040
85 96101 96127
86 96150 96161
87 96360 96549
88 96567 96574
89 96900 96999
90 97010 97799
91 97802 97804
92 97810 97814
93 98925 98929
94 98940 98943
95 98960 98962
96 98966 98969
97 99000 99091
98 99100 99140
99 99151 99157
100 99170 99199
101 99500 99602
102 99605 99607
103 99201 99215
104 99217 99226
105 99221 99239
106 99241 99255
107 99281 99288
108 99291 99292
109 99304 99318
110 99324 99337
111 99339 99340
112 99341 99350
113 99354 99416
114 99366 99368
115 99374 99380
116 99381 99429
117 99441 99449
118 99450 99456
119 99460 99463
120 99464 99465
121 99466 99480
122 99483 99483
123 99484 99484
124 99487 99490
125 99492 99494
126 99495 99496
127 99497 99498
128 99499 99499
129 0001F 0015F
130 0500F 0584F
131 1000F 1505F
132 2000F 2060F
133 3006F 3776F
134 4000F 4563F
135 5005F 5250F
136 6005F 6150F
137 7010F 7025F
138 9001F 9007F
139 0042T 0042T
140 0054T 0054T
141 0055T 0055T
142 0058T 0058T
;data want; set have(rename=(code1=code11 code2=code22));retain code1 code2; code1=code11; code2=code22;if code11 <= lag(code22) then flag_overlap=1; else flag_overlap=0; run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can we assume that you meant to input CODE2 as character:
input type code1 $ code2 $ ;
It has to be character, since some values contain letters.
Possibly, the first problem is conceptual, understanding how SAS compares character values. For example:
"3006F" falls within the range "30000" to "32999"
Did you know that? Did you want to flag it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You have a number of problems in your data and in your code.
You might want to read the data in as character instead of numeric, using
input type code1 $ code 2 $;
Furthermore, you might want to sort these somehow. Lastly, you can't use the LAG function inside an IF statement, it does not do what you think it should do. You want
prev_code22=lag(code22);
if code11<=prev_code22 then flag_overlap=1;
Lastly, I don't see any purpose in created code11 and code22, when you could use the original variables code1 and code2.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Paige, The final code reflecting your suggestions looked like this. Thanks for pointing out to a necessity to sort the data first!!!
proc sort data=have;
by code1 code2;
run;
data want; set have;
prev_code2=lag(code2);
if code1<=prev_code2 then flag_overlap=1; else flag_overlap=0;
run;
data want_nosort; set have_nosort;
prev_code2=lag(code2);
if code1<=prev_code2 then flag_overlap=1; else flag_overlap=0;
run;
proc freq data=want; /*flagged 17*/
tables flag_overlap/nocol norow nocum nopercent list missing;
title "sorted";
run;
proc freq data=want_nosort /*flagged only 8*/;
tables flag_overlap/nocol norow nocum nopercent list missing;
title "not sorted";
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure if this is relevant by now. But sorting the data and using the LAG function to find the previous value of CODE2 may not be enough. Even if there is not an overlap with the previous record, one of the earlier records may have had a higher value of CODE2, generating an overlap(?).
Using this code on your data
Proc sort data=have;
by code1 code2;
run;
data want;
set have;
retain last_code2 " ";
flag_overlap=last_code2>code1;
drop last_code2;
if code2>last_code2 then last_code2=code2;
run;
flagged 22 records as having overlaps.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can we assume that you meant to input CODE2 as character:
input type code1 $ code2 $ ;
It has to be character, since some values contain letters.
Possibly, the first problem is conceptual, understanding how SAS compares character values. For example:
"3006F" falls within the range "30000" to "32999"
Did you know that? Did you want to flag it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, Big conceptual issue here. Thanks for pointing that out indeed. These are CPT medical billing codes. 5 digit codes belong to CPT category I, 4digitF to CPT category II while 4digitM codes to CPT category III codes. My point is that last letter at the end of the variables make variables mutually exclusive. They are different animals not alphabetical order expected as such 0001M less than 0001U as they're flagged out in the screen shot. I don't know what to do at this point. Gotta grab a coffee or something.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Cruise wrote:
Yes, Big conceptual issue here. Thanks for pointing that out indeed. These are CPT medical billing codes. 5 digit codes belong to CPT category I, 4digitF to CPT category II while 4digitM codes to CPT category III codes. My point is that last letter at the end of the variables make variables mutually exclusive. They are different animals not alphabetical order expected as such 0001M less than 0001U as they're flagged out in the screen shot. I don't know what to do at this point. Gotta grab a coffee or something.
That verbiage is not very clear. Post actual examples of values (note that your original example data include nothing with F or M or U or anything except digits).
Note that FORMATS might help as you can map character values such as 0001F and 0001M to the same 4 digit category for display /count purposes without changing the underlying value. Or to your CPT "category".
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Cruise wrote:
Yes there are F, M, U values. I just double checked. proc format doesn't help. I need these codes categorized to adjust in the modeling eventually. there is no way to output from proc format to actual dataset, right?
Depends on what you mean by "output from proc format to actual dataset". Proc format can create a data set of format definitions using the Cntlout option just as it can create a format from a data set using the Cnltin option. Informats or formats can be used to create additional variables that may (or may not) do what is desired or comparison of values with put(var,fmtname.) le put(var2, fmtname.)
However "range" of values with character values can be a problematic concept in general so "overlap" is has to be extremely carefully defined.
If the question is related to grouping things then formats may be one solution option. I haven't actually been able to figure out what this specific data/ code is actually about. So I did not attempt to provide any example of format usage.
I have some experience with multilabel formats that, depending on procedures use, allow treating all of 0001, 0001F, 0001U and 0001M as a single value for counting and/or group under 0001 for a count total.
The structure of ICD codes I am a little familiar with lends itself to multilabel formats as the values go like Vxx, then Vxxy or Vxxyz for subgroups.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here are a couple of ideas about how to approach the problem.
One possibility: create three sets of groupings (one for each "category").
Another possibility: for grouping purposes, move the "M" or the "U" to the beginning of the string. Store that as a separate variable.
Either of these would take care of the overlapping groups. Not knowing the data, I'm not really trying to favor one or the other.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Earlier in my brainstorming processes, I asked the question more conceptually and KurtBremser had just suggested to delineate all the ranges to single lines all the way through. If CPT codes are listed out to one code per line I could simply merge the data to my patients file. I defined ICD9 and ICD10 categories for my patients using their one code per line reference code tables which doesn't exist for CPT codes due to AMA copyrights. However, KurBremser's approach appear to be helpful. I haven't tried his program yet but will do shortly upon some data cleaning.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I had similar thinking. group_ind=Substr(codes, 5,1) and use that as group_ind and use the same code along with first. / last. functions in data step to account for mutually exclusiveness.