BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Banke
Pyrite | Level 9

Hi everyone,

 

I am working on a medical database. I will like to keep only selected observations from the dataset. How do i go about it please, expecially when it is a character variable?

For instance, i want to select only patients with lung cancer from a list of all cancer types. The codes for lung cancer is C340 C341 C342 C343 C348 C349.

I tried the following but it did not work:

 

DATA TCR.LUNG;
SET TCR.tx_ltd_use_9518_feb21;
IF PSITE = 'C340' 'C341' 'C342' 'C343' 'C348' 'C349';

RUN;

 

my log reads:

 

DATA TCR.LUNG;
12 SET TCR.tx_ltd_use_9518_feb21;
13 IF PSITE = 'C340' 'C341' 'C342' 'C343' 'C348' 'C349';
------
388
200
ERROR 388-185: Expecting an arithmetic operator.

ERROR 200-322: The symbol is not recognized and will be ignored.

14 RUN;

 


RUN;

 

thank you

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

To compare against a list of values use the IN operator with a comma separated list of values.

IF PSITE in ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');

 

If you will have to select by some groups on a regular basis and the groups as such don't change then it might be worth to create a format and use this format for your selections. Below an example how this could work.

proc format;
  value $mygroups
    'C340', 'C341', 'C342', 'C343', 'C348', 'C349' = 'Lung Cancer'
    other = 'not yet grouped'
  ;
  ;
run;

DATA TCR.LUNG;
  SET TCR.tx_ltd_use_9518_feb21;
  IF put(PSITE,$mygroups.) = 'Lung Cancer';
RUN;

And last but not least: Given these are diagnostical codes there might already be some table with such grouping defined. If so then you could use this table to dynamically create a format or use a lookup into this table to select the diagnostic code you want.

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

To compare against a list of values use the IN operator with a comma separated list of values.

IF PSITE in ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');

 

If you will have to select by some groups on a regular basis and the groups as such don't change then it might be worth to create a format and use this format for your selections. Below an example how this could work.

proc format;
  value $mygroups
    'C340', 'C341', 'C342', 'C343', 'C348', 'C349' = 'Lung Cancer'
    other = 'not yet grouped'
  ;
  ;
run;

DATA TCR.LUNG;
  SET TCR.tx_ltd_use_9518_feb21;
  IF put(PSITE,$mygroups.) = 'Lung Cancer';
RUN;

And last but not least: Given these are diagnostical codes there might already be some table with such grouping defined. If so then you could use this table to dynamically create a format or use a lookup into this table to select the diagnostic code you want.

Banke
Pyrite | Level 9

Thanks a lot.

when i used this: 

IF PSITE in ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');

 i got lots of error in my log

DATA TCR.LUNG;
20 SET TCR.tx_ltd_use_9518_feb21;
21 IF PSITE = IN ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
22 RUN;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
21:4
NOTE: Invalid numeric data, PSite='C447' , at line 21 column 4.

 

is it because i did not convert to numeric variable?

 

for your second suggestion, 

i was able to create the format but when i tried the second step, i got this:

DATA TCR.LUNG;
48 SET TCR.tx_ltd_use_9518_feb21;
49 IF put(PSITE,$mygroups.) = 'Lung Cancer';
----------
48
ERROR 48-59: The format $MYGROUPS was not found or could not be loaded.

50 RUN;

i dont know of any tables with such groupings identified or how to get the tables.

Banke
Pyrite | Level 9

It has worked!

i added  OPTIONS FMTSEARCH = (TCR.FORMATS); before the IF statement so it could locate the format.

 

Thanks so much

Patrick
Opal | Level 21

@Banke 

You need to remove the equal operator and things will work.
21 IF PSITE = IN ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');