I am trying to create 3 groups with the code down below but SAS is only including numbers under the first group (<1). Why is this happening?
DATA WORK.IMPORT1;
SET WORK.IMPORT;
/* Calculate the group category based on incidence rate ranges */
IF ARI_Guest_Incidence<1 THEN ARI_Guest_Group_Category = "<1";
IF 1<=ARI_Guest_Incidence<=1.99 THEN ARI_Guest_Group_Category = "1-1.99";
IF ARI_Guest_Incidence>=2 THEN ARI_Guest_Group_Category = "2 and above";
IF ARI_Crew_Incidence<1 THEN ARI_Crew_Group_Category = "<1";
IF 1<=ARI_Crew_Incidence<=1.99 THEN ARI_Crew_Group_Category = "1-1.99";
IF ARI_Crew_Incidence>=2 THEN ARI_Crew_Group_Category = "2 and above";
IF CLI_Guest_Incidence<1 THEN CLI_Guest_Group_Category = "<1";
IF 1<=CLI_Guest_Incidence<=1.99 THEN CLI_Guest_Group_Category = "1-1.99";
IF CLI_Guest_Incidence>=2 THEN CLI_Guest_Group_Category = "2 and above";
IF CLI_Crew_Incidence<1 THEN CLI_Crew_Group_Category = "<1";
IF 1<=CLI_Crew_Incidence<=1.99 THEN CLI_Crew_Group_Category = "1-1.99";
IF CLI_Crew_Incidence>=2 THEN CLI_Crew_Group_Category = "2 and above";
IF ILI_Guest_Incidence<1 THEN ILI_Guest_Group_Category = "<1";
IF 1<=ILI_Guest_Incidence<=1.99 THEN ILI_Guest_Group_Category = "1-1.99";
IF ILI_Guest_Incidence>=2 THEN ILI_Guest_Group_Category = "2 and above";
IF ILI_Crew_Incidence<1 THEN ILI_Crew_Group_Category = "<1";
IF 1<=ILI_Crew_Incidence<=1.99 THEN ILI_Crew_Group_Category = "1-1.99";
IF ILI_Crew_Incidence>=2 THEN ILI_Crew_Group_Category = "2 and above";
RUN;
.
After the SET statement, define the length of ARI_Guest_Group_Category to hold the longest value. For example:
Length ARI_Guest_Group_Category $ 7;
See Maxim 47.
Create a custom format with PROC FORMAT, then use this in simple assignments in PUT functions.
Depending on what you intend to do with the groups, you won't even need a new variable. SAS statistical procedures (FREQ, MEANS) use formatted values automatically to build groups.
If everything is ending up in the first category then the value of the source variable satisfies the first IF condition. Do you have missing values? SAS will treat missing values are less than any actual number.
To really see how your re-coding worked use PROC FREQ.
proc freq ;
tables
ARI_Guest_Incidence*ARI_Guest_Group_Category
ARI_Crew_Incidence*ARI_Crew_Group_Category
/ missing list ;
format _all_;
run;
The first value sets the width of the output, so it is good practice to set the output, a format or width statement will do. A second note, and for improved efficiency, add an else to the subsequent statements so that once the category is found the remainder of the if statements do not need to be completed.
I have added some data to work with.
data import;
format ARI_Guest_Incidence 8.3;
input ARI_Guest_Incidence;
cards;
1.2
0.1
3
;
run;
DATA WORK.IMPORT1;
SET WORK.IMPORT;
format ARI_Guest_Group_Category $15.;
/* Calculate the group category based on incidence rate ranges */
IF ARI_Guest_Incidence<1 THEN ARI_Guest_Group_Category = "<1";
else IF 1<=ARI_Guest_Incidence<=1.99 THEN ARI_Guest_Group_Category = "1-1.99";
else IF ARI_Guest_Incidence>=2 THEN ARI_Guest_Group_Category = "2 and above";
run;
One last thing (that a format would address if done correctly) is that you do not address any of the values between 1.99 and 2.
You have <=1.99 and >=2 as boundary values.
You may find that the internal numeric representation of your numbers actually occur in that gap even if you don't expect them too because of numeric precision of decimal values with binary representation.
You should consider learning about If then/else as well.
@ballardw wrote:
You may find that the internal numeric representation of your numbers actually occur in that gap even if you don't expect them to because of numeric precision of decimal values with binary representation.
Examples:
861 data _null_; 862 a=2.3-0.3; b=0.6+0.7+0.7; c=5*0.398; d=139.3/70; 863 put (a--d) (=best32.); 864 if a<2 & b<2 & c>1.99 & d>1.99 then put 'Surprised?'; 865 run; a=2 b=2 c=1.99 d=1.99 Surprised?
You've gotten some good advice already, addressing establishment of character variable length, use of formats, and risk of uncaptured gaps in your incidence variable ranges.
I would add that your code, regardless of how you choose to code the value assignment statements, would also benefit from arrays.
You can use two arrays, each with six elements, to reduce the size of your code, and risk of typing errors:
DATA WORK.IMPORT1 (drop=i);
SET WORK.IMPORT;
array incidence{6} ARI_Guest_Incidence ARI_Crew_Incidence
CLI_Guest_Incidence CLI_Crew_Incidence
ILI_Guest_Incidence ILI_Crew_Incidence ;
array category {6} $11 ARI_Guest_group_category ARI_Crew_group_category
CLI_Guest_group_category CLI_Crew_group_category
ILI_Guest_group_category ILI_Crew_group_category ;
do i=1 to 6;
if incidence{i}<1 then category{i}="<1"; else
if 1<=incidence{i}<2 then category{i}="1-1.99"; else /*Modified per @Tom's comment*/
if incidence{i}>=2 then category{i}="2 and above";
end;
RUN;
To avoid the problem of values between 1.99 and 2.00 you should probably use
if incidence{i}<1 then category{i}="<1 or missing";
else if incidence{i}<2 then category{i}="1-1.99";
else category{i}="2 and above";
instead.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.