Kyra
Quartz | Level 8

Hi,

I have a dataset (red.disease) with a variable named LEUKO_ (leucocyte count). It is continuous.

For example, observations are 6.4, 13.6, 10.4, etc. There are 67 observations and one missing.

I tried to categorize it using the following code.

 

data red.disease1;
set red.disease;
if LEUKO_<4.0 then LEUKO_1='Low';
else if 4.0<=LEUKO_<=11.0 then LEUKO_1='Normal';
else if 11.0<LEUKO_ then LEUKO_='High';
if LEUKO_=. then LEUKO_1=' ';
run;

 

Now the dataset red.disease1 has 9 missing values for both LEUKO_ and LEUKO_1.

I noticed that all the observations with a value greater than 10.5 are missing in the new dataset.

I then changed the code to experiment, as follows:

data red.diseasezz;
set red.disease7;
if LEUKO_<4.0 then LEUKO_1='Low';
else if 4.0<=LEUKO_<=20.0 then LEUKO_1='Normal';
else if 20.0<LEUKO_ then LEUKO_='High';
if LEUKO_=. then LEUKO_1=' ';
run;

Now red.diseasezz has all the observations.

20.0 (a number chosen just to experiment) is greater than the maximum observed value.

 

Please let me know where the error in my work is.

 

Thanks

 

 

 

ACCEPTED SOLUTION
ballardw
Super User

@Kyra wrote:
This code gives me below error:
if LEUKO_<4.0 then LEUKO_1='Low';
13 else if 4.0<=LEUKO_<=11.0 then LEUKO_1='Normal';
14 else if 11.015 Else LEUKO_=. then LEUKO_1=' ';
----
388
202
ERROR 388-185: Expecting an arithmetic operator.

ERROR 202-322: The option or parameter is not recognized and will be
ignored.

1) Post code and log entries in a code box, opened with the </> icon above the message box on the forum, to preserve the formatting of the code. The underscore characters appear under the place where SAS determined the error exists.

Copy the entire data step or proc along with the errors. Sometimes an error occurs because of a missing or extra quote or parenthesis on a previous line, or a missing semicolon.

 

12 if LEUKO_<4.0 then LEUKO_1='Low';
13 else if 4.0<=LEUKO_<=11.0 then LEUKO_1='Normal';
14 else if 11.015 Else LEUKO_=. then LEUKO_1=' ';
                  ----
                  388
                  202
ERROR 388-185: Expecting an arithmetic operator.

I suspect that your log actually looked more like this. The second ELSE is incorrect: "if 11.015" what? There is no comparison, so the value 11.015 would be treated as true (any value that is neither zero nor missing is true), and an IF with a true condition expects a THEN or similar to follow.

 

Instead of creating additional variables you can use a custom format to display a value that is based on a single variable. Example of creating a format and using it with some dummy data to print or summarize:

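proc format;
   /* LOW and HIGH stand for the smallest and largest nonmissing values */
   value leuko
      .          = ' '
      low -< 4   = 'Low'      /* below 4 */
      4 - 11     = 'Normal'   /* 4 through 11, both ends included */
      11 <- high = 'High'     /* above 11 */
   ;
run;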

data example;
   input x;
datalines;
.
0
1
3.999
4.0
4.3
11
11.1
;

proc print data=example;
   format x leuko.;
run;

proc freq data=example;
   tables x;
   format x leuko.;
run;

The keywords LOW and HIGH on the left of the equals sign represent the smallest and largest numbers that SAS will use, so you don't have to specify an explicit limit. I suspect that instead of LOW you may want 0, unless it is possible to have a measurement less than 0. A < next to the dash excludes the value on that side of the range, and a dash without < includes both endpoints. Also, if there is a theoretical maximum value, you could use that instead of HIGH and then add a category like 100<-high = 'Out of range high'.

 

Formats in SAS have some very nice properties. One is that the code for handling multiple values is much simpler than a series of IF/THEN/ELSE statements. Second, once the format is available in a session, it can be used with any appropriate variable. Imagine that you have a questionnaire with 25 questions on a 1 to 10 response scale, and you decide to group responses 1-3 as Low, 4-7 as Middle and 8-10 as High. With new variables you would need to create 25 of them (not hard, but still a lot more variables). Applying the same format to all 25 questions instead allows summaries with that rule. Or you could make another format to see what happens if you use 1-4 for Low and 5-7 for Middle: define the new format and apply it with a FORMAT statement in a procedure. Otherwise that could be another 25 variables.
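
A minimal sketch of that idea, assuming a made-up survey data set: work.survey, q1-q3 (standing in for the 25 questions) and the format name scalegrp are all hypothetical names chosen for illustration.

```sas
/* Hypothetical example: one format shared by several variables. */
proc format;
   value scalegrp
      1 - 3  = 'Low'
      4 - 7  = 'Middle'
      8 - 10 = 'High'
   ;
run;

/* Made-up responses; q1-q3 stand in for the 25 questions. */
data work.survey;
   input q1 q2 q3;
datalines;
2 5 9
7 8 1
10 3 4
;

/* The same format groups every question; no new variables are created. */
proc freq data=work.survey;
   tables q1-q3;
   format q1-q3 scalegrp.;
run;
```

Trying a different grouping would only mean defining a second format and changing the FORMAT statement; the data set itself stays untouched.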

The groups created by proc format will be honored by most analysis, reporting or graphing procedures to create groups of similar records.

 

And if you have the information in a suitable data set, you can create a format from the data. For example, I have formats that turn ZIP codes into service-region affiliations.
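
A sketch of building a format from a data set with the CNTLIN= option of PROC FORMAT. The ZIP codes, region labels, data set names and the format name $zipreg below are made up for illustration; they are not the actual formats mentioned above.

```sas
/* Hypothetical lookup table in the layout CNTLIN= expects:
   FMTNAME, TYPE, START and LABEL. */
data work.zip_regions;
   length fmtname $8 type $1 start $5 label $20;
   retain fmtname '$zipreg' type 'C';   /* 'C' = character format */
   input start $ label $;
datalines;
83702 North
83605 West
83401 East
;

/* Build the format from the data set. */
proc format cntlin=work.zip_regions;
run;

/* Use it like any other format. */
data work.clients;
   input zip $5.;
   region = put(zip, $zipreg.);
datalines;
83605
83401
;

proc print data=work.clients;
run;
```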


6 REPLIES
Anandkvn
Lapis Lazuli | Level 10
data red.disease1;
set red.disease;
if LEUKO_<4.0 then LEUKO_1='Low';
else if 4.0<=LEUKO_<=11.0 then LEUKO_1='Normal';
else if 11.0<LEUKO_ then LEUKO_='High';
Else LEUKO_=. then LEUKO_1=' ';
run;
Tom
Super User

Missing values are smaller than any possible actual number. So missing values of LEUKO_ will result in LEUKO_1 being set to 'Low'.  Note that .Z is the largest missing value.

if .Z < LEUKO_<4.0 then LEUKO_1='Low';

If you do not define LEUKO_1 before this code, it will be created as a character variable with a storage length of 3 bytes, because the first place you reference it sets it to a string constant that has only three characters. A longer value such as 'Normal' would then be truncated to 'Nor'.
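
A minimal sketch pulling these points together, assuming the cutoffs from the question. Note also that the third branch in the posted code assigns 'High' to LEUKO_ rather than LEUKO_1. Because LEUKO_ is numeric, SAS cannot convert 'High' and sets LEUKO_ to missing, and the final IF then blanks LEUKO_1, which would account for the larger values turning up missing. The output data set name work.disease_cat is a hypothetical name used only here.

```sas
/* Sketch only: red.disease and LEUKO_ come from the question;
   work.disease_cat is a made-up output name. */
data work.disease_cat;
   set red.disease;
   length LEUKO_1 $6;                     /* long enough for 'Normal' */
   if missing(LEUKO_) then LEUKO_1 = ' '; /* handle missing first */
   else if LEUKO_ < 4.0 then LEUKO_1 = 'Low';
   else if LEUKO_ <= 11.0 then LEUKO_1 = 'Normal';
   else LEUKO_1 = 'High';                 /* assign to LEUKO_1, not LEUKO_ */
run;
```

Testing for missing first means the later branches only ever see real numbers, so the .Z comparison is not needed in this layout.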

Kyra
Quartz | Level 8
This code gives me below error:
if LEUKO_<4.0 then LEUKO_1='Low';
13 else if 4.0<=LEUKO_<=11.0 then LEUKO_1='Normal';
14 else if 11.015 Else LEUKO_=. then LEUKO_1=' ';
----
388
202
ERROR 388-185: Expecting an arithmetic operator.

ERROR 202-322: The option or parameter is not recognized and will be
ignored.

Kurt_Bremser
Super User
if 11.015 Else LEUKO_=. then LEUKO_1=' ';

Before you can use an ELSE after the IF, the THEN branch needs to be complete.

if 11.015

would be a true condition (any numeric value apart from 0 or missing is considered true), if the rest was syntactically correct.
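
A small sketch of both points, using a throwaway variable x that is not part of the original data: the THEN branch ends at its semicolon before ELSE begins, and a bare number works as a condition.

```sas
data _null_;
   x = 11.015;
   /* Complete IF ... THEN statement first, then the ELSE statement. */
   if x then put 'a nonzero, nonmissing number counts as true';
   else put 'zero or missing counts as false';
run;
```

With x set to 11.015, the first message is the one written to the log.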

ghosh
Barite | Level 11

You really do not need to create a new variable. Just use PROC FORMAT; the ranges will be displayed in your output as Low, Normal, High.

proc format;
 value disfmt
  low -< 4   = 'Low'
  4 - 11     = 'Normal'
  11 <- high = 'High'
  other      = ' '
 ;
run;
proc print;
 var LEUKO_;
 format LEUKO_ disfmt.;
run;
