Hello SAS community! These posts have been very helpful in the past, so I'm hoping some may have solutions for my task.
Data: one variable ranging from 0.000 - 5.000
Task: create groups based on ranges of values
For example:
Values <= 0 to <= 0.039 = group 1
Values >= 0.04 to <=0.0.79 = group 2
Values >=0.08 to <=0.119 = group 3, etc
...all the way up to 5.000 (this should result in 128 groups).
I have over 10 million observations and about 900 datasets to process this on, so I would like it to be fairly quick too.
Can anyone help? Thanks in advance!
Possibly a slight modification of @FreelanceReinh's code:
group = floor(x/(5/128))+1;
When I iterate at 0.01 between 0 and 5 that generates 128 groups of 4 values each
I should mention that the goal is to avoid having to code in 128 if-then statements.
Use a format, and you can build your format by using a formula.
data range_fmt;
fmtname='Range_fmt';
type='N';
EEXCL='Y'; *exclude end value of range;
label=0; *group value;
do i=0 to 5 by 0.04;
start=i;
end=i+0.04;
label+1;
output;
end;
drop i;
run;
proc format cntlin=range_fmt;
run;
Thank you. However, the end of label 1 is the same as the start of label 2. I'd like label 1 to be 0 <= 0.039 and label 2 to be 0.04 <= 0.079. Also, how would you match up the format with the original data set?
In my opinion <0.04 is the same as <=0.039. The EEXCL option included above, states that 0.04 is NOT included in the boundary condition.
If you prefer to have a new variable group with values 1, 2, 3, etc., you could define it as follows (for a variable X containing the values to be categorized):
data want;
set have;
group=floor(x/0.04)+1;
run;
Whether you create a new variable or a format (as suggested by Reeza) or a function (with PROC FCMP), you should pay particular attention to the correct assignment of values at the boundary between two ranges. Otherwise, numeric representation issues could lead to surprising group assignments.
Your outlined definition is not quite clear about the categorization of values x with 0.039 < x < 0.04 etc. (if any). Also, how do you know there will be 128 groups? Continuing your definition I obtain 5/0.04=125 plus 1 group for value 5.000 alone.
Here's what I meant by "surprising group assignments":
/* Create test data */
data have;
do _n_=0 to 5000;
x=_n_/1000;
output;
end;
run; /* 5001 obs.: x=0, 0.001, 0.002, ..., 4.999, 5 */
/* Compare the two group definitions that have been suggested so far */
data chk;
set have;
group=floor(x/0.04)+1;
groupf=input(put(x, range_fmt.), 8.);
if group ne groupf;
run; /* 56 obs.! */
proc print data=chk(obs=10) noobs;
run;
Result (with SAS 9.4 on Windows 7):
x group groupf 0.44 12 11 0.52 14 13 0.76 20 19 0.84 22 21 0.92 24 23 1.00 26 25 1.08 28 27 1.16 30 29 1.24 32 31 1.32 34 33
All values are only to the thousandths, so there are no values between 0.039 and 0.040. I'd like the range to be 0.039...and 5/0.039 is about 128. Another way of explaining would be that I want values from 0.000 to 5.000 split into 128 groups of equal range.
Possibly a slight modification of @FreelanceReinh's code:
group = floor(x/(5/128))+1;
When I iterate at 0.01 between 0 and 5 that generates 128 groups of 4 values each
@jkurka wrote:
All values are only to the thousandths, so there are no values between 0.039 and 0.040. I'd like the range to be 0.039...and 5/0.039 is about 128. Another way of explaining would be that I want values from 0.000 to 5.000 split into 128 groups of equal range.
It's true that 5/0.039=128.2..., but due to the structural gaps (like 0.039 to 0.04) between your intervals the number of ranges will not exceed 126 with the first two definitions that had been discussed.
A definition of "128 groups of equal range," however, would require a different definition. Please note that 5001 is not divisible by 128 (5001/128=39.07...). So, if you assign each of the 5001 values 0, 0.001, 0.002, ..., 4.999, 5 to 128 ranges, the numbers of elements per range will at best vary between 39 and 40.
Suggestion: Use 5/128=0.0390625 as the interval length, right-open intervals, but assign 5 to group 128 (so as to avoid a 129th group for value 5 alone). This can be achieved by applying ballardw's suggestion to values x with 0<=x<5 and classifying 5 into group 128.
This definition creates 119 categories with 39 values and 9 with 40 values.
My advice is to use the "cntlin" option that @Reeza posted. Use the "EEXCL" option to prevent the end of your range including the number.
Tom
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.