EpiLinds
Calcite | Level 5

I would really appreciate any help with the following issue I'm having:

I am working with administrative health data and am trying to identify all cancers by their diagnosis codes.  I want to use the SUBSTR function to pick out numbered diagnostic codes within a certain range from a variable stored in character format.  I can't just convert this variable to numeric because it also holds various other values (that I'm not interested in) that are not numeric.

Specifically, I want to identify all diagnosis codes ranging from 140 to 239, and ignore all other data.  When I run the data step, I get an error message that reads:

          NOTE: Invalid numeric data, '01L' , at line 1276 column 36. (for several data points)

Here is the relevant code:

          data psneo_m9192; set ps_m9192; if substr(icd9,1,3) ge 140 and substr(icd9,1,3) le 239; run;

Despite this, SAS still outputs a dataset with what appears to be the correct range of data.  However, I don't know if it is complete/accurate or what the error message means.

Questions:

1) Is the problem that I am trying to identify a numeric range of data in a character variable?

2) How can I know if my output dataset is complete?

Thanks very much!

9 REPLIES 9
Linlin
Lapis Lazuli | Level 10

try:

data psneo_m9192;

set ps_m9192;

if 140<=input(substr(icd9,1,3),4.)le 239;

run;

Linlin

EpiLinds
Calcite | Level 5

Hi Linlin,

Thanks for the feedback.  I tried running this code and got a number of different error messages like this one:

     NOTE: Invalid argument to function INPUT at line 1279 column 41.

I think this issue is related to the mix of data points under this icd9 variable, some of which are non-numeric. 

The good news is that when I ran it with your code, my output dataset had the same number of observations as when I ran it using the code above.

Do you think it's likely that I'm missing some relevant data because of these errors or is this something I should just ignore?

Thanks again!

Linlin
Lapis Lazuli | Level 10

try the code below to avoid error message:

data have;
input icd9 $;
cards;
123df
321gd
230bg
345kg
ddddd
150gh
;
data want;
set have;
if anydigit(icd9);/*exclude the observations without digits */
if 140<=input(substr(icd9,1,3),4.)le 239;
proc print;run;

Linlin

Haikuo
Onyx | Level 15

Frankly, it is just a note, not an error (although _ERROR_ will be set to 1), so the DATA step carries on and your results are not affected. Unless you want to include (or investigate) those observations, you can take it easy.

Haikuo

Edit: if you are really annoyed by the note, following code may help:

data psneo_m9192; set ps_m9192;

if     not ANYALPHA(substr(icd9,1,3));

if  140 <= substr(icd9,1,3) <= 239; run;
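To investigate the observations behind the note rather than just silence it, one option is to tabulate the three-character prefixes that contain anything other than digits. This is only a sketch: it assumes the dataset and variable names from the original post (ps_m9192, icd9), and the names nonnumeric and prefix are made up for illustration.

```sas
/* Sketch only: nonnumeric and prefix are hypothetical names */
data nonnumeric;
   set ps_m9192;
   length prefix $3;              /* keep exactly three characters */
   prefix = substr(icd9, 1, 3);
   /* NOTDIGIT returns the position of the first non-digit, 0 if none */
   if notdigit(prefix) > 0;
run;

/* Count each non-numeric prefix (for example 01L) */
proc freq data=nonnumeric;
   tables prefix / nocum nopercent;
run;
```

Codes shorter than three characters are padded with blanks, so they show up in this list too. If none of the listed prefixes should count as a code in the 140 to 239 range, the note can reasonably be ignored for this batch of data.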

MikeZdeb
Rhodochrosite | Level 12

hi ... if the diagnosis code is a character variable, why not treat it like a character variable ...

data want;

set have;

where icd9 ge : '140' and icd9 le : '239';

run;

EpiLinds
Calcite | Level 5

Hi everyone,

Although I was not able to get rid of the notes by trying the suggestions above, I got the same number of observations whichever way I ran it, so I am just going to ignore the notes and proceed.

Thanks so much for your help!

Linds

Ksharp
Super User

If you want to suppress the note, use the ?? modifier in the INPUT function.

data psneo_m9192;

set ps_m9192;

if 140<=input(substr(icd9,1,3),?? 4.)le 239;

run;
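As a small illustration of what the ?? modifier does, the sketch below converts the value from the original log note ('01L') and writes the result to the log. The variable name x is arbitrary.

```sas
/* Sketch: convert a non-numeric value with the ?? modifier */
data _null_;
   x = input('01L', ?? 3.);   /* conversion fails quietly */
   put x= _error_=;           /* write both values to the log */
run;
```

With ??, the failed conversion returns a missing value, no invalid-data note is written, and _ERROR_ is not set to 1. A single ? suppresses the note but still sets _ERROR_. Either way the value is missing, so it falls outside the 140 to 239 range.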

Astounding
PROC Star

Linds,

You got the same number of observations, FOR THIS BATCH OF DATA.  There is no guarantee that you can ignore these messages for the next batch of data.

One of the key issues to decide is whether you would like to select an ICD9 code that begins with 15L.  Does it fall into the range you specified?  Some methods select it, some methods don't.  It's your call if you want to investigate further.

Good luck.
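The difference between the methods can be seen on a handful of values. The sketch below uses made-up test codes (only 01L comes from this thread) and flags each one under a numeric test and a character test. The names codes, compare, prefix, num, by_number and by_text are hypothetical.

```sas
/* Hypothetical test data, including the mixed value 15L */
data codes;
   input icd9 $;
   datalines;
1539
15L
01L
2399
V103
4019
;
run;

data compare;
   set codes;
   length prefix $3;
   prefix = substr(icd9, 1, 3);
   /* Numeric test: 15L cannot be converted, so num is missing */
   num = input(prefix, ?? 3.);
   by_number = (140 <= num <= 239);
   /* Character test: compares text, so 15L falls between 140 and 239 */
   by_text = ('140' <= prefix and prefix <= '239');
run;

proc print data=compare;
run;
```

The code 15L is selected by the character comparison but not by the numeric one, which is the decision described above. Comparing the two flags on the real data shows whether the choice of method changes the result.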

Doc_Duke
Rhodochrosite | Level 12

EpiLinds,

Astounding has good advice.  ICD data can bite you...  "01L" is a code for a laboratory only visit; see

http://www.health.gov.bc.ca/msp/infoprac/diagcodes/index.pdf

You should use the INPUT function to explicitly convert your text to numbers (saves CPU cycles and is well defined).    I would modify Haikuo's code to include it:

data psneo_m9192; set ps_m9192;
if     not ANYALPHA(substr(icd9,1,3));
if  140 <= INPUT(substr(icd9,1,3), 3.) <= 239; run;

Doc Muhlbaier

Duke
