Solved
Contributor
Posts: 25

How to create groups based on the range of values

Hello SAS community! These posts have been very helpful in the past, so I'm hoping some may have solutions for my task.

Data&colon; one variable ranging from 0.000 - 5.000

Task: create groups based on ranges of values

For example:

Values <= 0 to <= 0.039 = group 1

Values >= 0.04 to <=0.0.79 = group 2

Values >=0.08 to <=0.119 = group 3, etc

...all the way up to 5.000 (this should result in 128 groups).

I have over 10 million observations and about 900 datasets to process this on, so I would like it to be fairly quick too.

Can anyone help? Thanks in advance!

Accepted Solutions
Solution
‎02-23-2016 04:25 PM
Super User
Posts: 13,508

Re: How to create groups based on the range of values

Possibly a slight modification of @FreelanceReinhard's code:

group = floor(x/(5/128))+1;

When I iterate at 0.01 between 0 and 5 that generates 128 groups of 4 values each

All Replies
Contributor
Posts: 25

Re: How to create groups based on the range of values

I should mention that the goal is to avoid having to code in 128 if-then statements.

Super User
Posts: 23,683

Re: How to create groups based on the range of values

Use a format, and you can build your format by using a formula.

data range_fmt;
fmtname='Range_fmt';
type='N';
EEXCL='Y'; *exclude end value of range;
label=0; *group value;

do i=0 to 5 by 0.04;
start=i;
end=i+0.04;
label+1;
output;
end;

drop i;
run;

proc format cntlin=range_fmt;
run;
Contributor
Posts: 25

Re: How to create groups based on the range of values

Thank you. However, the end of label 1 is the same as the start of label 2. I'd like label 1 to be 0 <= 0.039 and label 2 to be 0.04 <= 0.079. Also, how would you match up the format with the original data set?

Super User
Posts: 23,683

Re: How to create groups based on the range of values

In my opinion <0.04 is the same as <=0.039. The EEXCL option included above, states that 0.04 is NOT included in the boundary condition.

Posts: 1,245

Re: How to create groups based on the range of values

If you prefer to have a new variable group with values 1, 2, 3, etc., you could define it as follows (for a variable X containing the values to be categorized):

data want;
set have;
group=floor(x/0.04)+1;
run;

Whether you create a new variable or a format (as suggested by Reeza) or a function (with PROC FCMP), you should pay particular attention to the correct assignment of values at the boundary between two ranges. Otherwise, numeric representation issues could lead to surprising group assignments.

Your outlined definition is not quite clear about the categorization of values x with 0.039 < x < 0.04 etc. (if any). Also, how do you know there will be 128 groups? Continuing your definition I obtain 5/0.04=125 plus 1 group for value 5.000 alone.

Posts: 1,245

Re: How to create groups based on the range of values

Here's what I meant by "surprising group assignments":

/* Create test data */
data have;
do _n_=0 to 5000;
x=_n_/1000;
output;
end;
run; /* 5001 obs.: x=0, 0.001, 0.002, ..., 4.999, 5 */

/* Compare the two group definitions that have been suggested so far */
data chk;
set have;
group=floor(x/0.04)+1;
groupf=input(put(x, range_fmt.), 8.);
if group ne groupf;
run; /* 56 obs.! */

proc print data=chk(obs=10) noobs;
run;

Result (with SAS 9.4 on Windows 7):

x     group    groupf

0.44      12       11
0.52      14       13
0.76      20       19
0.84      22       21
0.92      24       23
1.00      26       25
1.08      28       27
1.16      30       29
1.24      32       31
1.32      34       33
Contributor
Posts: 25

Re: How to create groups based on the range of values

All values are only to the thousandths, so there are no values between 0.039 and 0.040. I'd like the range to be 0.039...and 5/0.039 is about 128. Another way of explaining would be that I want values from 0.000 to 5.000 split into 128 groups of equal range.

Solution
‎02-23-2016 04:25 PM
Super User
Posts: 13,508

Re: How to create groups based on the range of values

Possibly a slight modification of @FreelanceReinhard's code:

group = floor(x/(5/128))+1;

When I iterate at 0.01 between 0 and 5 that generates 128 groups of 4 values each

Posts: 1,245

Re: How to create groups based on the range of values

[ Edited ]
jkurka wrote:

All values are only to the thousandths, so there are no values between 0.039 and 0.040. I'd like the range to be 0.039...and 5/0.039 is about 128. Another way of explaining would be that I want values from 0.000 to 5.000 split into 128 groups of equal range.

It's true that 5/0.039=128.2..., but due to the structural gaps (like 0.039 to 0.04) between your intervals the number of ranges will not exceed 126 with the first two definitions that had been discussed.

A definition of "128 groups of equal range," however, would require a different definition. Please note that 5001 is not divisible by 128 (5001/128=39.07...). So, if you assign each of the 5001 values 0, 0.001, 0.002, ..., 4.999, 5 to 128 ranges, the numbers of elements per range will at best vary between 39 and 40.

Suggestion: Use 5/128=0.0390625 as the interval length, right-open intervals, but assign 5 to group 128 (so as to avoid a 129th group for value 5 alone). This can be achieved by applying ballardw's suggestion to values x with 0<=x<5 and classifying 5 into group 128.

This definition creates 119 categories with 39 values and 9 with 40 values.

PROC Star
Posts: 1,307

Re: How to create groups based on the range of values

My advice is to use the "cntlin" option that @Reeza posted. Use the "EEXCL" option to prevent the end of your range including the number.

Tom

🔒 This topic is solved and locked.