Hi,
I have a dataset (red.disease) with a variable named LEUKO_ (leucocyte count), which is continuous.
For example, observations are 6.4, 13.6, 10.4, etc. There are 67 observations and one missing value.
I tried to categorize it using the following code:
data red.disease1;
set red.disease;
if LEUKO_<4.0 then LEUKO_1='Low';
else if 4.0<=LEUKO_<=11.0 then LEUKO_1='Normal';
else if 11.0<LEUKO_ then LEUKO_='High';
if LEUKO_=. then LEUKO_1=' ';
run;
Now the dataset red.disease1 has 9 missing observations for both LEUKO_ and LEUKO_1.
I noticed that all the observations with a value greater than 10.5 are missing in the new dataset.
I then changed the code to the following to experiment:
data red.diseasezz;
set red.disease7;
if LEUKO_<4.0 then LEUKO_1='Low';
else if 4.0<=LEUKO_<=20.0 then LEUKO_1='Normal';
else if 20.0<LEUKO_ then LEUKO_='High';
if LEUKO_=. then LEUKO_1=' ';
run;
Now red.diseasezz has all the observations.
20.0 (a number chosen just to experiment) is greater than the maximum observed value.
Please let me know where the error is in my work.
Thanks
@Kyra wrote:
This code gives me below error:
if LEUKO_<4.0 then LEUKO_1='Low';
13 else if 4.0<=LEUKO_<=11.0 then LEUKO_1='Normal';
14 else if 11.015 Else LEUKO_=. then LEUKO_1=' ';
----
388
202
ERROR 388-185: Expecting an arithmetic operator.
ERROR 202-322: The option or parameter is not recognized and will be
ignored.
1) Post code and log entries in a code box opened using the </> icon that appears above the message box on the forum to maintain the appearance of the code. The underscore characters appear under the place SAS determined the error exists.
Copy the entire data step or proc along with the errors. Sometimes your error will occur because of a missing or extra quote or parentheses on a previous line, or a missing semicolon.
12 if LEUKO_<4.0 then LEUKO_1='Low';
13 else if 4.0<=LEUKO_<=11.0 then LEUKO_1='Normal';
14 else if 11.015 Else LEUKO_=. then LEUKO_1=' ';
----
388
202
ERROR 388-185: Expecting an arithmetic operator.
I suspect that if you look at your log, it looked more like this. The second ELSE is incorrect: "if 11.015" what? There is no comparison, so the value 11.015 would be treated as true (any value other than zero or missing is true), and an IF with a true condition still expects a THEN or similar.
Instead of creating additional variables you can use a custom format to display a value that is based on a single variable. Example of creating a format and using it with some dummy data to print or summarize:
proc format;
  value leuko
    . = ' '
    low -< 4 = 'Low'
    4 - 11 = 'Normal'
    11 <- high = 'High'
  ;
run;
data example;
  input x;
datalines;
.
0
1
3.999
4.0
4.3
11
11.1
;
proc print data=example;
  format x leuko.;
run;
proc freq data=example;
  tables x;
  format x leuko.;
run;
The keywords LOW and HIGH on the left of the equals sign stand for the smallest and largest numbers SAS can represent, so you don't have to specify a limit. I suspect that instead of LOW you may want 0, unless a measurement less than 0 is possible. A < next to an endpoint excludes that endpoint from the range, and a dash without < includes both endpoints. Also, if there is a theoretical maximum value, you could use it instead of HIGH and then add a category such as 100<-high = 'Out of range high'.
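A sketch of that variation, assuming leucocyte counts cannot be negative. The upper limit of 100, the format name leukor and the test values are illustrative assumptions, not values from the question:

```sas
proc format;
  value leukor
    .           = ' '
    0 -< 4      = 'Low'                /* 0 up to, but not including, 4 */
    4 - 11      = 'Normal'             /* both endpoints included */
    11 <- 100   = 'High'               /* above 11, up to the assumed limit of 100 */
    100 <- high = 'Out of range high'  /* anything above the assumed limit */
  ;
run;

/* Dummy values on both sides of each boundary */
data check;
  input x;
datalines;
.
0
3.999
11
11.1
250
;

proc print data=check;
  format x leukor.;
run;
```

A value that falls in none of the ranges (a negative number here) is printed unformatted, which makes unexpected data easy to spot.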
Formats in SAS have some very nice properties. One is that the code for handling multiple values is much simpler than many IF/THEN/ELSE statements. Second, once the format is available in a session, it can be used with any appropriate variable. Imagine that you have a questionnaire with 25 questions that use a 1 to 10 response scale, and you decide to group responses 1-3 as Low, 4-7 as Middle and 8-10 as High. With IF/THEN logic you would need to create 25 additional variables (not hard, but still a lot more variables). Applying the same format to all 25 questions allows summaries with that rule instead. Or you could make another format to see what happens if you use 1-4 for Low and 5-7 for Middle: define the new format and apply it with a FORMAT statement in a procedure. Otherwise that could be another 25 variables.
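As a rough illustration of the questionnaire idea, the sketch below uses three questions in place of 25. The data set survey, the variables q1-q3 and the format names lmh and lmhalt are all made up for the example:

```sas
/* Hypothetical survey data: three questions on a 1-10 scale stand in for the 25 */
data survey;
  input q1-q3;
datalines;
1 5 9
3 4 10
7 8 2
;

proc format;
  value lmh    1-3 = 'Low' 4-7 = 'Middle' 8-10 = 'High';  /* first grouping rule */
  value lmhalt 1-4 = 'Low' 5-7 = 'Middle' 8-10 = 'High';  /* alternative rule */
run;

/* Same variables summarized under the first rule ... */
proc freq data=survey;
  tables q1-q3;
  format q1-q3 lmh.;
run;

/* ... and under the alternative rule, without creating any new variables */
proc freq data=survey;
  tables q1-q3;
  format q1-q3 lmhalt.;
run;
```

Only the FORMAT statement changes between the two summaries; the stored data stay as they are.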
The groups created by proc format will be honored by most analysis, reporting or graphing procedures to create groups of similar records.
And if you have the information in a suitable data set, you can create a format from data. For example, I have formats that turn ZIP codes into affiliations with service regions.
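One way to build a format from a data set is the CNTLIN= option of PROC FORMAT. The sketch below is a minimal example with invented ZIP codes and region names; the data set names zip_regions and ctrl and the format name $zipreg are assumptions for illustration:

```sas
/* Hypothetical lookup table: one row per ZIP code with its service region */
data zip_regions;
  length zip $5 region $20;
  input zip $ region $;
datalines;
30301 North
30302 South
;

/* Control data set: FMTNAME, TYPE, START and LABEL are the variables CNTLIN= reads */
data ctrl;
  set zip_regions;
  retain fmtname '$zipreg' type 'C';  /* character format named $zipreg */
  start = zip;
  label = region;
  keep fmtname type start label;
run;

proc format cntlin=ctrl;
run;

/* The ZIP code is now displayed as its region */
proc print data=zip_regions;
  var zip;
  format zip $zipreg.;
run;
```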
Missing values are smaller than any possible actual number. So missing values of LEUKO_ will result in LEUKO_1 being set to 'Low'. Note that .Z is the largest missing value.
if .Z < LEUKO_<4.0 then LEUKO_1='Low';
If you do not define LEUKO_1 before this code, it will be created as a character variable with a storage length of 3 bytes, because the first place you reference it assigns a string constant with only three characters. Longer values such as 'Normal' and 'High' would then be truncated to 'Nor' and 'Hig'.
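Putting these points together, one possible corrected version of the original step is sketched below. The length of 6 is an assumption chosen to fit 'Normal'. Note also that the third branch in the original code assigns 'High' to LEUKO_ rather than LEUKO_1. Since LEUKO_ is numeric, the string cannot be converted and LEUKO_ is set to missing, which would be consistent with the values above 11 disappearing.

```sas
data red.disease1;
  set red.disease;
  length LEUKO_1 $6;                               /* assumed length; long enough for 'Normal' */
  if missing(LEUKO_) then LEUKO_1 = ' ';           /* handle missing first so it is never 'Low' */
  else if LEUKO_ < 4.0 then LEUKO_1 = 'Low';
  else if LEUKO_ <= 11.0 then LEUKO_1 = 'Normal';  /* 4.0 through 11.0 */
  else LEUKO_1 = 'High';                           /* assigns to LEUKO_1, not LEUKO_ */
run;
```

Testing for missing first means the later branches only ever see real numbers, so the .Z comparison is not needed in this arrangement.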
if 11.015 Else LEUKO_=. then LEUKO_1=' ';
Before you can use an ELSE after the IF, the THEN branch needs to be complete.
if 11.015
would be a true condition (any numeric value apart from 0 or missing is considered true), if the rest was syntactically correct.
You really do not need to create a new variable. Just use PROC FORMAT; the ranges will be displayed in your output as Low, Normal, High.
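proc format;
value disfmt
low -< 4 = 'Low' /* below 4 */
4 - 11 = 'Normal' /* 4 through 11 inclusive, matching the original IF logic */
11 <- high = 'High' /* above 11 */
other = ' ' /* missing values */
;
run;
proc print data=red.disease;
var LEUKO_;
format LEUKO_ disfmt.;
run;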
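If a stored category variable is still needed, for example for export, the PUT function can apply the format once instead of repeating the IF/THEN logic. A minimal sketch, assuming the disfmt format above has already been defined in the session and reusing the data set names from the question (the length of 6 is an assumption):

```sas
data red.disease1;
  set red.disease;
  length LEUKO_1 $6;               /* assumed length; long enough for 'Normal' */
  LEUKO_1 = put(LEUKO_, disfmt.);  /* character label taken from the format */
run;
```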