I would really appreciate any help with the following issue. I am working with administrative health data and am trying to identify all cancers by their diagnosis codes. I want to use SUBSTR to pick out numeric diagnosis codes within a certain range from a variable stored in character format. I can't simply convert this variable to numeric because it also contains various other values (which I'm not interested in) that are nominal. Specifically, I want to keep all diagnosis codes from 140 to 239 and ignore everything else.

When I run the data step, I get the following message in the log (repeated for several data points):

NOTE: Invalid numeric data, '01L' , at line 1276 column 36.

Here is the relevant code:

data psneo_m9192;
set ps_m9192;
if substr(icd9,1,3) ge 140 and substr(icd9,1,3) le 239;
run;

Despite this, SAS still outputs a dataset with what appears to be the correct range of data. However, I don't know whether it is complete and accurate, or what the message means.

Questions:
1) Is the problem that I am trying to identify a numeric range of data in a character variable?
2) How can I know if my output dataset is complete?

Thanks very much!
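For reference, here is a minimal sketch of what an explicit conversion might look like, assuming the first three characters of icd9 are the part that matters and that any value that cannot be read as a number should simply be excluded. The variable name code3 is hypothetical and not part of my data. The ?? modifier on INPUT should suppress the invalid-data notes and return a missing value instead; I have not verified this against the full dataset.

```sas
data psneo_m9192;
    set ps_m9192;
    /* Explicitly convert the first three characters to a number.
       ?? suppresses the "Invalid numeric data" notes and yields a
       missing value for non-numeric codes such as '01L'. */
    code3 = input(substr(icd9, 1, 3), ?? 3.);   /* code3 is a hypothetical helper variable */
    /* Subsetting IF: a missing value compares lower than 140,
       so the nominal codes are dropped here. */
    if 140 <= code3 <= 239;
    drop code3;
run;
```

If this is right, the output should contain the same rows as my original step, just without the notes in the log, which would also be one way to check whether the original output is complete.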