Hi everyone,
I am working on a medical database. I will like to keep only selected observations from the dataset. How do i go about it please, expecially when it is a character variable?
For instance, i want to select only patients with lung cancer from a list of all cancer types. The codes for lung cancer is C340 C341 C342 C343 C348 C349.
I tried the following but it did not work:
DATA TCR.LUNG;
SET TCR.tx_ltd_use_9518_feb21;
IF PSITE = 'C340' 'C341' 'C342' 'C343' 'C348' 'C349';
RUN;
my log reads:
DATA TCR.LUNG;
12 SET TCR.tx_ltd_use_9518_feb21;
13 IF PSITE = 'C340' 'C341' 'C342' 'C343' 'C348' 'C349';
------
388
200
ERROR 388-185: Expecting an arithmetic operator.
ERROR 200-322: The symbol is not recognized and will be ignored.
14 RUN;
RUN;
thank you
To compare against a list of values use the IN operator with a comma separated list of values.
IF PSITE in ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
If you will have to select by some groups on a regular basis and the groups as such don't change then it might be worth to create a format and use this format for your selections. Below an example how this could work.
proc format;
value $mygroups
'C340', 'C341', 'C342', 'C343', 'C348', 'C349' = 'Lung Cancer'
other = 'not yet grouped'
;
;
run;
DATA TCR.LUNG;
SET TCR.tx_ltd_use_9518_feb21;
IF put(PSITE,$mygroups.) = 'Lung Cancer';
RUN;
And last but not least: Given these are diagnostical codes there might already be some table with such grouping defined. If so then you could use this table to dynamically create a format or use a lookup into this table to select the diagnostic code you want.
To compare against a list of values use the IN operator with a comma separated list of values.
IF PSITE in ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
If you will have to select by some groups on a regular basis and the groups as such don't change then it might be worth to create a format and use this format for your selections. Below an example how this could work.
proc format;
value $mygroups
'C340', 'C341', 'C342', 'C343', 'C348', 'C349' = 'Lung Cancer'
other = 'not yet grouped'
;
;
run;
DATA TCR.LUNG;
SET TCR.tx_ltd_use_9518_feb21;
IF put(PSITE,$mygroups.) = 'Lung Cancer';
RUN;
And last but not least: Given these are diagnostical codes there might already be some table with such grouping defined. If so then you could use this table to dynamically create a format or use a lookup into this table to select the diagnostic code you want.
Thanks a lot.
when i used this:
IF PSITE in ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
i got lots of error in my log
DATA TCR.LUNG;
20 SET TCR.tx_ltd_use_9518_feb21;
21 IF PSITE = IN ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
22 RUN;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
21:4
NOTE: Invalid numeric data, PSite='C447' , at line 21 column 4.
is it because i did not convert to numeric variable?
for your second suggestion,
i was able to create the format but when i tried the second step, i got this:
DATA TCR.LUNG;
48 SET TCR.tx_ltd_use_9518_feb21;
49 IF put(PSITE,$mygroups.) = 'Lung Cancer';
----------
48
ERROR 48-59: The format $MYGROUPS was not found or could not be loaded.
50 RUN;
i dont know of any tables with such groupings identified or how to get the tables.
It has worked!
i added OPTIONS FMTSEARCH = (TCR.FORMATS); before the IF statement so it could locate the format.
Thanks so much
You need to remove the equal operator and things will work.
21 IF PSITE = IN ('C340', 'C341', 'C342', 'C343', 'C348', 'C349');
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.