Hi everyone,
I am working on a medical database. I will like to keep only selected observations from the dataset. How do i go about it please, expecially when it is a character variable?
For instance, i want to select only patients with lung cancer from a list of all cancer types. The codes for lung cancer is C340 C341 C342 C343 C348 C349.
I tried the following but it did not work:
DATA TCR.LUNG;
SET TCR.tx_ltd_use_9518_feb21;
IF PSITE = 'C340' 'C341' 'C342' 'C343' 'C348' 'C349';
RUN;
my log reads:
DATA TCR.LUNG;
12 SET TCR.tx_ltd_use_9518_feb21;
13 IF PSITE = 'C340' 'C341' 'C342' 'C343' 'C348' 'C349';
------
388
200
ERROR 388-185: Expecting an arithmetic operator.
ERROR 200-322: The symbol is not recognized and will be ignored.
14 RUN;
RUN;
thank you
To compare against a list of values use the IN operator with a comma separated list of values.
IF PSITE in ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
If you will have to select by some groups on a regular basis and the groups as such don't change then it might be worth to create a format and use this format for your selections. Below an example how this could work.
proc format;
value $mygroups
'C340', 'C341', 'C342', 'C343', 'C348', 'C349' = 'Lung Cancer'
other = 'not yet grouped'
;
;
run;
DATA TCR.LUNG;
SET TCR.tx_ltd_use_9518_feb21;
IF put(PSITE,$mygroups.) = 'Lung Cancer';
RUN;
And last but not least: Given these are diagnostical codes there might already be some table with such grouping defined. If so then you could use this table to dynamically create a format or use a lookup into this table to select the diagnostic code you want.
To compare against a list of values use the IN operator with a comma separated list of values.
IF PSITE in ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
If you will have to select by some groups on a regular basis and the groups as such don't change then it might be worth to create a format and use this format for your selections. Below an example how this could work.
proc format;
value $mygroups
'C340', 'C341', 'C342', 'C343', 'C348', 'C349' = 'Lung Cancer'
other = 'not yet grouped'
;
;
run;
DATA TCR.LUNG;
SET TCR.tx_ltd_use_9518_feb21;
IF put(PSITE,$mygroups.) = 'Lung Cancer';
RUN;
And last but not least: Given these are diagnostical codes there might already be some table with such grouping defined. If so then you could use this table to dynamically create a format or use a lookup into this table to select the diagnostic code you want.
Thanks a lot.
when i used this:
IF PSITE in ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
i got lots of error in my log
DATA TCR.LUNG;
20 SET TCR.tx_ltd_use_9518_feb21;
21 IF PSITE = IN ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
22 RUN;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
21:4
NOTE: Invalid numeric data, PSite='C447' , at line 21 column 4.
is it because i did not convert to numeric variable?
for your second suggestion,
i was able to create the format but when i tried the second step, i got this:
DATA TCR.LUNG;
48 SET TCR.tx_ltd_use_9518_feb21;
49 IF put(PSITE,$mygroups.) = 'Lung Cancer';
----------
48
ERROR 48-59: The format $MYGROUPS was not found or could not be loaded.
50 RUN;
i dont know of any tables with such groupings identified or how to get the tables.
It has worked!
i added OPTIONS FMTSEARCH = (TCR.FORMATS); before the IF statement so it could locate the format.
Thanks so much
You need to remove the equal operator and things will work.
21 IF PSITE = IN ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.