I would really appreciate any help with the following issue I'm having:
I am working with administrative health data and am trying to identify all cancers by their diagnosis codes. I want to use the SUBSTR function to identify numeric diagnostic codes within a certain range in a character variable. I can't just convert this variable to numeric, because it also contains other values (that I'm not interested in) that are not numbers.
Specifically, I want to identify all diagnosis codes ranging from 140 to 239, and ignore all other data. When I run the data step, I get an error message that reads:
NOTE: Invalid numeric data, '01L' , at line 1276 column 36. (for several data points)
Here is the relevant code:
data psneo_m9192;
  set ps_m9192;
  if substr(icd9,1,3) ge 140 and substr(icd9,1,3) le 239;
run;
Despite this, SAS still outputs a dataset with what appears to be the correct range of data. However, I don't know if it is complete/accurate or what the error message means.
Questions:
1) Is the problem that I am trying to identify a numeric range of data in a character variable?
2) How can I know if my output dataset is complete?
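One way to approach question 2 is to list every ICD9 prefix that a numeric comparison cannot read, since those are exactly the rows behind the "Invalid numeric data" notes. This is a minimal sketch that assumes `icd9` is a character variable at least three characters long; the sample dataset `ps_sample` and the names `nonnumeric_prefix` and `prefix` are hypothetical, so substitute `ps_m9192` for the real data.

```sas
/* Hypothetical sample data; replace ps_sample with ps_m9192. */
data ps_sample;
  input icd9 $;
  datalines;
1509
01L
15L
V103
1740
2400
;
run;

/* Keep rows whose first three characters are not all digits. */
/* These are the rows that produce "Invalid numeric data".    */
data nonnumeric_prefix;
  set ps_sample;
  length prefix $3;
  prefix = substr(icd9, 1, 3);
  if missing(prefix) or notdigit(strip(prefix)) > 0;
run;

/* Count each distinct non-numeric prefix so you can decide whether */
/* any of them (for example 15L) belongs in the 140-239 range.      */
proc freq data=nonnumeric_prefix;
  tables prefix / missing;
run;
```

If every prefix in the frequency table is one you are sure falls outside 140-239, the notes are not hiding any codes you want in this batch of data.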
Thanks very much!
try:
data psneo_m9192;
set ps_m9192;
if 140<=input(substr(icd9,1,3),4.)le 239;
run;
Linlin
Hi Linlin,
Thanks for the feedback. I tried running this code and got a number of different error messages like this one:
NOTE: Invalid argument to function INPUT at line 1279 column 41.
I think this issue is related to the mix of data points under this icd9 variable, some of which are non-numeric.
The good news is that when I ran it with your code, my output dataset had the same number of observations as when I ran it using the code above.
Do you think it's likely that I'm missing some relevant data because of these errors or is this something I should just ignore?
Thanks again!
Try the code below to avoid the error message:
data have;
input icd9 $;
cards;
123df
321gd
230bg
345kg
ddddd
150gh
;
data want;
  set have;
  if anydigit(icd9); /* exclude the observations without digits */
  if 140 <= input(substr(icd9,1,3),4.) <= 239; /* keep codes 140-239 */
run;
proc print; run;
Linlin
Frankly, it is just a note, not an error (although _ERROR_ will be flagged), so the DATA step will move on and it will not affect your results. Unless you want to include (or investigate) those observations, you can take it easy.
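The sketch below illustrates that point with two made-up codes: an invalid value read with an explicit INPUT call becomes missing and sets `_ERROR_` to 1, and because a missing value sorts below every number, the range test is simply false for that row. The dataset and variable names (`demo`, `code_num`, `err_flag`, `in_range`) are hypothetical.

```sas
data demo;
  input icd9 $;
  /* '01L' cannot be read with the 3. informat: the result is missing, */
  /* a NOTE is written to the log, and _ERROR_ is set to 1.            */
  code_num = input(substr(icd9, 1, 3), 3.);
  err_flag = _error_;
  /* Missing sorts below every number, so this is 0 for the bad row. */
  in_range = (140 <= code_num <= 239);
  datalines;
150
01L
;
run;
```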
Haikuo
Edit: if you are really annoyed by the note, the following code may help:
data psneo_m9192;
  set ps_m9192;
  if not ANYALPHA(substr(icd9,1,3));  /* skip prefixes containing letters */
  if 140 <= substr(icd9,1,3) <= 239;  /* implicit character-to-numeric conversion */
run;
hi ... if the diagnosis code is a character variable, why not treat it like a character variable ...
data want;
set have;
where icd9 ge : '140' and icd9 le : '239';
run;
Hi everyone,
Although I was not able to get rid of the error notes with the suggestions above, I got the same number of observations no matter which version I ran, so I am just going to ignore the notes and proceed.
Thanks so much for your help!
Linds
If you want to suppress the notes, use the ?? modifier:
data psneo_m9192;
set ps_m9192;
if 140<=input(substr(icd9,1,3),?? 4.)le 239;
run;
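Because `??` only silences the log, the unreadable codes become invisible. A minimal sketch, assuming hypothetical sample data and hypothetical output names (`psneo_sample`, `unreadable_sample`, `n_unreadable`), that keeps the quiet log but routes unreadable codes to their own dataset and reports how many there were:

```sas
/* Hypothetical sample data; replace ps_sample with ps_m9192. */
data ps_sample;
  input icd9 $;
  datalines;
1509
01L
V103
2390
2400
;
run;

data psneo_sample (drop=n_unreadable)
     unreadable_sample (drop=n_unreadable);
  set ps_sample end=last;
  /* ?? suppresses the invalid-data note and leaves _ERROR_ at 0. */
  code_num = input(substr(icd9, 1, 3), ?? 3.);
  if missing(code_num) then do;
    n_unreadable + 1;               /* sum statement: retained counter */
    output unreadable_sample;
  end;
  else if 140 <= code_num <= 239 then output psneo_sample;
  if last then put "NOTE: " n_unreadable "codes could not be read as numbers.";
run;
```

Reviewing `unreadable_sample` after each new batch keeps the check that the log notes used to provide.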
Linds,
You got the same number of observations, FOR THIS BATCH OF DATA. There is no guarantee that you can ignore these messages for the next batch of data.
One of the key issues to decide is whether you would like to select an ICD9 code that begins with 15L. Does it fall into the range you specified? Some methods select it, some methods don't. It's your call if you want to investigate further.
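To see exactly where the methods in this thread disagree, one option is to evaluate a numeric rule and a character rule side by side and keep only the rows where they differ. This is a sketch with hypothetical sample data and variable names; the character rule compares the plain three-character prefix, which is similar in spirit to the colon-modifier WHERE clause above.

```sas
/* Hypothetical sample data; replace ps_sample with ps_m9192. */
data ps_sample;
  input icd9 $;
  datalines;
1509
01L
15L
V103
2390
2400
;
run;

data disagree;
  set ps_sample;
  length prefix $3;
  prefix = substr(icd9, 1, 3);
  /* Numeric rule: unreadable prefixes become missing and fail the test. */
  num_rule = (140 <= input(prefix, ?? 3.) <= 239);
  /* Character rule: string comparison on the three-character prefix. */
  char_rule = ('140' <= prefix and prefix <= '239');
  /* Keep only rows where the two rules give different answers. */
  if num_rule ne char_rule;
run;

proc print data=disagree;
run;
```

With this sample, only 15L should land in `disagree`: the character rule selects it and the numeric rule does not.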
Good luck.
EpiLinds,
Astounding has good advice. ICD data can bite you... "01L" is a code for a laboratory-only visit; see
http://www.health.gov.bc.ca/msp/infoprac/diagcodes/index.pdf
You should use the INPUT function to explicitly convert your text to numbers (saves CPU cycles and is well defined). I would modify Haikuo's code to include it:
data psneo_m9192;
  set ps_m9192;
  if not ANYALPHA(substr(icd9,1,3));
  if 140 <= INPUT(substr(icd9,1,3), 3.) <= 239; /* INPUT needs an informat */
run;
Doc Muhlbaier
Duke