Help using Base SAS procedures

Invalid numeric data

Occasional Contributor
Posts: 12

I would really appreciate any help with the following issue I'm having:

I am working with administrative health data and am trying to identify all cancers by their diagnosis codes. I want to use the SUBSTR function to pick out diagnosis codes that fall within a numeric range, but the variable holding them is stored as character. I can't just convert this variable to numeric because it also contains other, non-numeric codes that I'm not interested in.

Specifically, I want to identify all diagnosis codes from 140 to 239 and ignore everything else. When I run the DATA step, I get a message that reads:

          NOTE: Invalid numeric data, '01L' , at line 1276 column 36. (for several data points)

Here is the relevant code:

          data psneo_m9192;
            set ps_m9192;
            if substr(icd9,1,3) ge 140 and substr(icd9,1,3) le 239;
          run;

Despite this, SAS still outputs a dataset with what appears to be the correct range of data. However, I don't know whether it is complete and accurate, or what the message means.

Questions:

1) Is the problem that I am trying to identify a numeric range of data in a character variable?

2) How can I know if my output dataset is complete?

Thanks very much!

Super Contributor
Posts: 1,636

Re: Invalid numeric data

try:

data psneo_m9192;
  set ps_m9192;
  if 140 <= input(substr(icd9,1,3),4.) le 239;
run;

Linlin

Occasional Contributor
Posts: 12

Re: Invalid numeric data

Hi Linlin,

Thanks for the feedback. I tried running this code and got a number of messages like this one:

     NOTE: Invalid argument to function INPUT at line 1279 column 41.

I think this issue is related to the mix of data points under this icd9 variable, some of which are non-numeric. 

The good news is that when I ran it with your code, my output dataset had the same number of observations as when I ran it using the code above.

Do you think it's likely that I'm missing some relevant data because of these errors or is this something I should just ignore?

Thanks again!

Super Contributor
Posts: 1,636

Re: Invalid numeric data

Try the code below to avoid the error messages:

data have;
  input icd9 $;
  cards;
123df
321gd
230bg
345kg
ddddd
150gh
;
run;

data want;
  set have;
  /* exclude the observations without any digits; a value such as '01L' */
  /* still contains digits, so it passes this filter                   */
  if anydigit(icd9);
  if 140 <= input(substr(icd9,1,3),4.) le 239;
run;

proc print; run;

Linlin

Respected Advisor
Posts: 3,156

Re: Invalid numeric data

Frankly, it is just a note, not an error (although _ERROR_ will be set to 1), so the DATA step carries on and the note will not affect your results. Unless you want to include (or investigate) those observations, you can take it easy.

Haikuo

Edit: if you are really annoyed by the note, the following code may help:

data psneo_m9192;
  set ps_m9192;
  if not ANYALPHA(substr(icd9,1,3));
  if 140 <= substr(icd9,1,3) <= 239;
run;
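If you would rather investigate the observations behind the note than ignore them, a quick audit can list every value whose first three characters are not all digits, which are the values the implicit conversion cannot read. This is a minimal sketch: the data set audit_demo and its values are made up, and nonnumeric and prefix are hypothetical names. NOTDIGIT returns 0 only when every character it examines is a digit, and prefix is given length 3 so it holds exactly the three characters being tested.

```sas
/* Hypothetical sample data standing in for ps_m9192 */
data audit_demo;
  length icd9 $ 5;
  input icd9 $;
  datalines;
1400
2399
01L
V101
15L
;
run;

/* Keep only rows whose first three characters are not all digits; */
/* these are the rows that make the implicit conversion write a note */
data nonnumeric;
  set audit_demo;
  length prefix $ 3;          /* length 3 so PREFIX holds exactly the tested characters */
  prefix = substr(icd9, 1, 3);
  if notdigit(prefix) > 0;
run;

/* Count how often each non-numeric prefix occurs */
proc freq data=nonnumeric;
  tables prefix / missing;
run;
```

With the sample values, 01L, V101 and 15L end up in nonnumeric. Run against the real data, the PROC FREQ table shows which prefixes (such as 01L) account for the notes, so you can decide whether any of them belong in the cancer range.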

Valued Guide
Posts: 765

Re: Invalid numeric data

Hi ... if the diagnosis code is a character variable, why not treat it like a character variable?

data want;
  set have;
  where icd9 ge : '140' and icd9 le : '239';
run;
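The colon (:) modifier truncates the longer value to the length of the shorter one before comparing, so only the first three characters of icd9 are compared with '140' and '239', in character (not numeric) order. A minimal sketch on made-up values; the data set contents are hypothetical:

```sas
/* Made-up ICD-9 style codes for illustration */
data have;
  length icd9 $ 5;
  input icd9 $;
  datalines;
1400
1749
2399
2400
1399
15L
01L
V101
;
run;

/* Compare only the first 3 characters, as text */
data want;
  set have;
  where icd9 ge: '140' and icd9 le: '239';
run;

proc print data=want;
run;
```

With these values, want keeps 1400, 1749, 2399 and 15L. 15L is kept because, compared as text, '15L' falls between '140' and '239'; 01L, V101, 1399 and 2400 fall outside the range.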

Occasional Contributor
Posts: 12

Re: Invalid numeric data

Hi everyone,

Although I was not able to get rid of the notes with the suggestions above, I got the same number of observations however I ran it, so I am just going to ignore the notes and proceed.

Thanks so much for your help!

Linds

Super User
Posts: 10,044

Re: Invalid numeric data

If you want to suppress the notes, use the ?? modifier in the INPUT function:

data psneo_m9192;
  set ps_m9192;
  if 140 <= input(substr(icd9,1,3), ?? 4.) le 239;
run;
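Suppressing the note also hides which codes failed to convert. A minimal sketch, assuming you want to keep those codes for review rather than drop them silently: the sample data set ps_demo and the names neo, review and code3 are hypothetical.

```sas
/* Made-up rows standing in for ps_m9192 */
data ps_demo;
  length icd9 $ 5;
  input icd9 $;
  datalines;
1400
2399
2400
15L
01L
V101
;
run;

/* ?? suppresses the invalid-data note and keeps _ERROR_ at 0.      */
/* Codes that cannot be read as a number come back missing and are */
/* written to REVIEW instead of disappearing without a trace.      */
data neo review;
  set ps_demo;
  code3 = input(substr(icd9, 1, 3), ?? 3.);
  if missing(code3) then output review;
  else if 140 <= code3 <= 239 then output neo;
run;
```

With the sample rows, neo receives 1400 and 2399 (2400 reads as 240 and is out of range), while 15L, 01L and V101 land in review.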

Super User
Posts: 5,516

Re: Invalid numeric data

Linds,

You got the same number of observations, FOR THIS BATCH OF DATA.  There is no guarantee that you can ignore these messages for the next batch of data.

One of the key issues to decide is whether you would like to select an ICD9 code that begins with 15L.  Does it fall into the range you specified?  Some methods select it, some methods don't.  It's your call if you want to investigate further.
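The 15L question can be made concrete by evaluating two of the approaches from this thread side by side. A minimal sketch on made-up values; num_sel and char_sel are hypothetical flag names, with num_sel mimicking the INPUT-based selection and char_sel the colon-modifier comparison:

```sas
/* Made-up codes: '15L' is the ambiguous case */
data compare;
  length icd9 $ 5;
  input icd9 $;
  /* Numeric method: '15L' cannot be read as a number, so the result is missing and not selected */
  num_sel  = (140 <= input(substr(icd9, 1, 3), ?? 3.) <= 239);
  /* Character method: '15L' sorts between '140' and '239' as text, so it is selected */
  char_sel = (icd9 ge: '140' and icd9 le: '239');
  datalines;
15L
1500
;
run;

proc print data=compare;
run;
```

Both methods select 1500, but only the character comparison selects 15L. The not ANYALPHA filter would likewise drop 15L, since it contains a letter.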

Good luck.

Trusted Advisor
Posts: 2,116

Re: Invalid numeric data

Posted in reply to Astounding

EpiLinds,

Astounding has good advice. ICD data can bite you... "01L" is a code for a laboratory-only visit; see

http://www.health.gov.bc.ca/msp/infoprac/diagcodes/index.pdf

You should use the INPUT function to explicitly convert your text to numbers (it saves CPU cycles and is well defined). I would modify Haikuo's code to include it:

data psneo_m9192;
  set ps_m9192;
  if not ANYALPHA(substr(icd9,1,3));
  /* INPUT needs an informat; 3. reads the three-character prefix */
  if 140 <= INPUT(substr(icd9,1,3), 3.) <= 239;
run;

Doc Muhlbaier

Duke
