turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- How to create groups based on the range of values

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 10:44 AM

Hello SAS community! These posts have been very helpful in the past, so I'm hoping some may have solutions for my task.

Data: one variable ranging from 0.000 - 5.000

Task: create groups based on ranges of values

For example:

Values <= 0 to <= 0.039 = group 1

Values >= 0.04 to <=0.0.79 = group 2

Values >=0.08 to <=0.119 = group 3, etc

...all the way up to 5.000 (this should result in 128 groups).

I have over 10 million observations and about 900 datasets to process this on, so I would like it to be fairly quick too.

Can anyone help? Thanks in advance!

Accepted Solutions

Solution

02-23-2016
04:25 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 01:38 PM

Possibly a slight modification of @FreelanceReinhard's code:

group = floor(x/(5/128))+1;

When I iterate at 0.01 between 0 and 5 that generates 128 groups of 4 values each

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 10:54 AM

I should mention that the goal is to avoid having to code in 128 if-then statements.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 11:10 AM

Use a format, and you can build your format by using a formula.

```
data range_fmt;
fmtname='Range_fmt';
type='N';
EEXCL='Y'; *exclude end value of range;
label=0; *group value;
do i=0 to 5 by 0.04;
start=i;
end=i+0.04;
label+1;
output;
end;
drop i;
run;
proc format cntlin=range_fmt;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 12:48 PM

Thank you. However, the end of label 1 is the same as the start of label 2. I'd like label 1 to be 0 <= 0.039 and label 2 to be 0.04 <= 0.079. Also, how would you match up the format with the original data set?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-20-2016 11:49 AM

In my opinion <0.04 is the same as <=0.039. The EEXCL option included above, states that 0.04 is NOT included in the boundary condition.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 11:47 AM

If you prefer to have a new variable group with values 1, 2, 3, etc., you could define it as follows (for a variable X containing the values to be categorized):

```
data want;
set have;
group=floor(x/0.04)+1;
run;
```

Whether you create a new variable or a format (as suggested by Reeza) or a function (with PROC FCMP), you should pay particular attention to the correct assignment of values at the boundary between two ranges. Otherwise, numeric representation issues could lead to surprising group assignments.

Your outlined definition is not quite clear about the categorization of values x with 0.039 < x < 0.04 etc. (if any). Also, how do you know there will be 128 groups? Continuing your definition I obtain 5/0.04=125 plus 1 group for value 5.000 alone.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 12:13 PM

Here's what I meant by "surprising group assignments":

```
/* Create test data */
data have;
do _n_=0 to 5000;
x=_n_/1000;
output;
end;
run; /* 5001 obs.: x=0, 0.001, 0.002, ..., 4.999, 5 */
/* Compare the two group definitions that have been suggested so far */
data chk;
set have;
group=floor(x/0.04)+1;
groupf=input(put(x, range_fmt.), 8.);
if group ne groupf;
run; /* 56 obs.! */
proc print data=chk(obs=10) noobs;
run;
```

Result (with SAS 9.4 on Windows 7):

x group groupf 0.44 12 11 0.52 14 13 0.76 20 19 0.84 22 21 0.92 24 23 1.00 26 25 1.08 28 27 1.16 30 29 1.24 32 31 1.32 34 33

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 12:46 PM

All values are only to the thousandths, so there are no values between 0.039 and 0.040. I'd like the range to be 0.039...and 5/0.039 is about 128. Another way of explaining would be that I want values from 0.000 to 5.000 split into 128 groups of equal range.

Solution

02-23-2016
04:25 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 01:38 PM

Possibly a slight modification of @FreelanceReinhard's code:

group = floor(x/(5/128))+1;

When I iterate at 0.01 between 0 and 5 that generates 128 groups of 4 values each

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-19-2016 02:26 PM - edited 02-21-2016 08:41 AM

jkurka wrote:

It's true that 5/0.039=128.2..., but due to the structural gaps (like 0.039 to 0.04) between your intervals the number of ranges will not exceed 126 with the first two definitions that had been discussed.

A definition of "128 groups of equal range," however, would require a different definition. Please note that 5001 is not divisible by 128 (5001/128=39.07...). So, if you assign each of the 5001 values 0, 0.001, 0.002, ..., 4.999, 5 to 128 ranges, the numbers of elements per range will at best vary between 39 and 40.

Suggestion: Use 5/128=0.0390625 as the interval length, right-open intervals, but assign 5 to group 128 (so as to avoid a 129th group for value 5 alone). This can be achieved by applying ballardw's suggestion to values x with 0<=x<5 and classifying 5 into group 128.

This definition creates 119 categories with 39 values and 9 with 40 values.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-20-2016 11:38 AM

My advice is to use the "cntlin" option that @Reeza posted. Use the "EEXCL" option to prevent the end of your range including the number.

Tom