BookmarkSubscribeRSS Feed
nmuse
Calcite | Level 5

I am trying to create 3 groups with the code down below but SAS is only including numbers under the first group (<1). Why is this happening?


DATA WORK.IMPORT1;
SET WORK.IMPORT;


/* Calculate the group category based on incidence rate ranges */
IF ARI_Guest_Incidence<1 THEN ARI_Guest_Group_Category = "<1";
IF 1<=ARI_Guest_Incidence<=1.99 THEN ARI_Guest_Group_Category = "1-1.99";
IF ARI_Guest_Incidence>=2 THEN ARI_Guest_Group_Category = "2 and above";


IF ARI_Crew_Incidence<1 THEN ARI_Crew_Group_Category = "<1";
IF 1<=ARI_Crew_Incidence<=1.99 THEN ARI_Crew_Group_Category = "1-1.99";
IF ARI_Crew_Incidence>=2 THEN ARI_Crew_Group_Category = "2 and above";

IF CLI_Guest_Incidence<1 THEN CLI_Guest_Group_Category = "<1";
IF 1<=CLI_Guest_Incidence<=1.99 THEN CLI_Guest_Group_Category = "1-1.99";
IF CLI_Guest_Incidence>=2 THEN CLI_Guest_Group_Category = "2 and above";

IF CLI_Crew_Incidence<1 THEN CLI_Crew_Group_Category = "<1";
IF 1<=CLI_Crew_Incidence<=1.99 THEN CLI_Crew_Group_Category = "1-1.99";
IF CLI_Crew_Incidence>=2 THEN CLI_Crew_Group_Category = "2 and above";

IF ILI_Guest_Incidence<1 THEN ILI_Guest_Group_Category = "<1";
IF 1<=ILI_Guest_Incidence<=1.99 THEN ILI_Guest_Group_Category = "1-1.99";
IF ILI_Guest_Incidence>=2 THEN ILI_Guest_Group_Category = "2 and above";

IF ILI_Crew_Incidence<1 THEN ILI_Crew_Group_Category = "<1";
IF 1<=ILI_Crew_Incidence<=1.99 THEN ILI_Crew_Group_Category = "1-1.99";
IF ILI_Crew_Incidence>=2 THEN ILI_Crew_Group_Category = "2 and above";

RUN;

8 REPLIES 8
john_mccall
SAS Employee

.

 

After the SET statement, define the length of ARI_Guest_Group_Category  to hold the longest value.  For example:

Length ARI_Guest_Group_Category  $  7;

Kurt_Bremser
Super User

See Maxim 47.

 

Create a custom format with PROC FORMAT, then use this in simple assignments in PUT functions.

Depending on what you intend to do with the groups, you won't even need a new variable. SAS statistical procedures (FREQ, MEANS) use formatted values automatically to build groups.

Tom
Super User Tom
Super User

If everything is ending up in the first category then the value of the source variable satisfies the first IF condition.  Do you have missing values?  SAS will treat missing values are less than any actual number.

 

To really see how your re-coding worked use PROC FREQ.

proc freq ;
tables
   ARI_Guest_Incidence*ARI_Guest_Group_Category
   ARI_Crew_Incidence*ARI_Crew_Group_Category 
   / missing list ;
format _all_;
run;
DebG
Obsidian | Level 7

The first value sets the width of the output, so it is good practice to set the output, a format or width statement will do.  A second note, and for improved efficiency, add an else to the subsequent statements so that once the category is found the remainder of the if statements do not need to be completed. 

 

I have added some data to work with.

 

data import;
format ARI_Guest_Incidence 8.3;
input ARI_Guest_Incidence;
cards;
1.2
0.1
3
;
run;


DATA WORK.IMPORT1;
SET WORK.IMPORT;

format ARI_Guest_Group_Category $15.;
/* Calculate the group category based on incidence rate ranges */
IF ARI_Guest_Incidence<1 THEN ARI_Guest_Group_Category = "<1";
else IF 1<=ARI_Guest_Incidence<=1.99 THEN ARI_Guest_Group_Category = "1-1.99";
else IF ARI_Guest_Incidence>=2 THEN ARI_Guest_Group_Category = "2 and above";
run;

ballardw
Super User

One last thing (that a format would address if done correctly) is that you do not address any of the values between 1.99 and 2.

You have <=1.99 and >=2 as boundary values.

 

You may find that the internal numeric representation of your numbers actually occur in that gap even if you don't expect them too because of numeric precision of decimal values with binary representation.

 

You should consider learning about If then/else as well.

FreelanceReinh
Jade | Level 19

@ballardw wrote:

You may find that the internal numeric representation of your numbers actually occur in that gap even if you don't expect them to because of numeric precision of decimal values with binary representation.


Examples:

861   data _null_;
862   a=2.3-0.3; b=0.6+0.7+0.7; c=5*0.398; d=139.3/70;
863   put (a--d) (=best32.);
864   if a<2 & b<2 & c>1.99 & d>1.99 then put 'Surprised?';
865   run;

a=2 b=2 c=1.99 d=1.99
Surprised?

 

mkeintz
PROC Star

@nmuse 

 

You've gotten some good advice already, addressing establishment of character variable length, use of formats, and risk of uncaptured gaps in your incidence variable ranges.  

 

I would add that your code, regardless of how you choose to code the value assignment statements, would also benefit from arrays. 

 

You can use two arrays, each with six elements, to reduce the size of your code, and risk of typing errors:

 

DATA WORK.IMPORT1 (drop=i);
  SET WORK.IMPORT;

  array incidence{6}       ARI_Guest_Incidence       ARI_Crew_Incidence
                           CLI_Guest_Incidence       CLI_Crew_Incidence
                           ILI_Guest_Incidence       ILI_Crew_Incidence ;

  array category {6} $11   ARI_Guest_group_category  ARI_Crew_group_category
                           CLI_Guest_group_category  CLI_Crew_group_category
                           ILI_Guest_group_category  ILI_Crew_group_category ;

  do i=1 to 6;
    if incidence{i}<1        then category{i}="<1";          else
    if 1<=incidence{i}<2     then category{i}="1-1.99";      else   /*Modified per @Tom's comment*/
    if incidence{i}>=2       then category{i}="2 and above";
  end;

RUN;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

To avoid the problem of values between 1.99 and 2.00 you should probably use 

    if incidence{i}<1 then category{i}="<1 or missing"; 
    else if incidence{i}<2 then category{i}="1-1.99"; 
    else category{i}="2 and above";

instead.

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1955 views
  • 5 likes
  • 8 in conversation