- How to categorize data points that fall on a categ...

06-09-2013 05:40 PM

Hi

I am not sure where this question belongs, but I was wondering whether there are any rules for categorizing numbers that fall exactly on a numerical category boundary. I inherited code in which a consultant created the data set below. It groups the whole-number values of the durmo_cat variable into categories such that a number equal to one of the boundary markers is placed in the category above it. In other words, if durmo_cat equals exactly 2, the value is classified as a 3.

Are these rules just created by the developer or is there something more involved?

Paul

data s1;
set workpl;
***duration intervals***;
if durmo_cat lt 1 then d12=1;
if 1 le durmo_cat lt 2 then d12=2;
if 2 le durmo_cat lt 3 then d12=3;
if 3 le durmo_cat lt 4 then d12=4;
if 4 le durmo_cat lt 5 then d12=5;
if 5 le durmo_cat lt 6 then d12=6;
if 6 le durmo_cat lt 12 then d12=7;
if 12 le durmo_cat lt 18 then d12=8;
if 18 le durmo_cat lt 24 then d12=9;
if 24 le durmo_cat lt 30 then d12=10;
if 30 le durmo_cat lt 36 then d12=11;
if 36 le durmo_cat lt 42 then d12=12;
if 42 le durmo_cat lt 48 then d12=13;
if 48 le durmo_cat lt 54 then d12=14;
if 54 le durmo_cat lt 60 then d12=15;
if 60 le durmo_cat lt 66 then d12=16;
if 66 le durmo_cat lt 72 then d12=17;
if durmo_cat ge 72 then d12=18;
run;

Accepted Solutions

Solution


Posted in reply to Paul_NYS

06-10-2013 06:44 AM

D12 is just a bin number; there is no reason its value should match that of DURMO_CAT.

Personally, I would take the bins' cut-off values out of the data step code by using a format:

D12 = put(DURMO_CAT, d12bin.);

This also makes D12 a string, which is also better imho, but you can always convert it back to a numeric if you really need one.
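
For readers who want to try this, here is a minimal sketch of what such a format could look like. The reply does not show the definition of d12bin., so the PROC FORMAT step below is an assumption: its cut-offs are copied from the IF statements in the original post, and d12_num is a hypothetical variable name used only to show the conversion back to a number.

```sas
/* Assumed definition of the d12bin. format (not shown in the thread).   */
/* "-<" means the upper bound is excluded, so a value of exactly 2 falls */
/* in the range 2 -< 3, matching "2 le durmo_cat lt 3" in the post.      */
proc format;
  value d12bin
    low -< 1   = '1'
    1   -< 2   = '2'
    2   -< 3   = '3'
    3   -< 4   = '4'
    4   -< 5   = '5'
    5   -< 6   = '6'
    6   -< 12  = '7'
    12  -< 18  = '8'
    18  -< 24  = '9'
    24  -< 30  = '10'
    30  -< 36  = '11'
    36  -< 42  = '12'
    42  -< 48  = '13'
    48  -< 54  = '14'
    54  -< 60  = '15'
    60  -< 66  = '16'
    66  -< 72  = '17'
    72  - high = '18'
  ;
run;

data s1;
  set workpl;
  length d12 $2;
  d12 = put(durmo_cat, d12bin.);   /* character bin label */
  d12_num = input(d12, 2.);        /* hypothetical numeric copy, if one is needed */
run;
```

One difference from the IF version is worth keeping in mind: in a numeric format, LOW does not include missing values, so a missing durmo_cat is not assigned to bin 1 here.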

All Replies

Posted in reply to Paul_NYS

06-09-2013 09:47 PM

I don't see anything wrong with this way of creating categories. As of d12 = 7 the category number becomes unrelated to the original durmo_cat value anyway. However, IF could be replaced by ELSE IF, except for the first one. That would be a bit more efficient. - PG

PG
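
A sketch of the ELSE IF variant described above, using the same data set names and cut-offs as the original post. The conditions are left unchanged so the bins stay the same; the only difference is that SAS stops testing once one condition is true.

```sas
data s1;
  set workpl;
  /* duration intervals: only the first matching branch is executed */
  if durmo_cat lt 1 then d12=1;
  else if 1 le durmo_cat lt 2 then d12=2;
  else if 2 le durmo_cat lt 3 then d12=3;
  else if 3 le durmo_cat lt 4 then d12=4;
  else if 4 le durmo_cat lt 5 then d12=5;
  else if 5 le durmo_cat lt 6 then d12=6;
  else if 6 le durmo_cat lt 12 then d12=7;
  else if 12 le durmo_cat lt 18 then d12=8;
  else if 18 le durmo_cat lt 24 then d12=9;
  else if 24 le durmo_cat lt 30 then d12=10;
  else if 30 le durmo_cat lt 36 then d12=11;
  else if 36 le durmo_cat lt 42 then d12=12;
  else if 42 le durmo_cat lt 48 then d12=13;
  else if 48 le durmo_cat lt 54 then d12=14;
  else if 54 le durmo_cat lt 60 then d12=15;
  else if 60 le durmo_cat lt 66 then d12=16;
  else if 66 le durmo_cat lt 72 then d12=17;
  else if durmo_cat ge 72 then d12=18;
run;
```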

Posted in reply to PGStats

06-10-2013 10:21 AM

I wouldn't say it becomes unrelated as of d12=7. It's no longer equal to the upper bound, but it's still related in the sense that the higher durmo_cat is, the higher d12 is.

Posted in reply to Paul_NYS

06-10-2013 08:32 AM

Thank you both. But do you know of any statistical or other rules that provide guidance for situations where you are grouping whole-number values into whole-number categories and one of the values falls exactly on a category boundary?

For example, if I have a value of 3 and categories of 1-2, 3-4, 5-6. Is there any reason why 3 would not fall in 3-4?

Paul

Posted in reply to Paul_NYS

06-10-2013 08:53 AM

Previous posters already provided the right answers. One thing to add ... this code takes bad data and groups it into category 1. For example, negative numbers and missing values both fall into category 1.

Building upon Chris's suggestion, perhaps it would help you to think of the bins as being labeled "A", "B", "C" ... instead of 1, 2, 3 ...

They are ordered categories, and do not represent numerical amounts.
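
To illustrate the point about bad data, here is a small sketch with made-up test values. It shows only the first bin. Treating negative durations as invalid is an assumption on my part, and demo, d12_orig and d12_checked are hypothetical names.

```sas
data demo;
  input durmo_cat;

  /* Rule from the original code: a missing value sorts below any number, */
  /* so missing and negative values both satisfy "lt 1".                  */
  if durmo_cat lt 1 then d12_orig = 1;

  /* Guarded version: leave the bin missing for values assumed to be bad. */
  if missing(durmo_cat) or durmo_cat lt 0 then d12_checked = .;
  else if durmo_cat lt 1 then d12_checked = 1;

  datalines;
.
-3
0.5
;
run;

proc print data=demo;
run;
```

All three rows get d12_orig = 1, but only the 0.5 row gets d12_checked = 1.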

Posted in reply to Paul_NYS

06-10-2013 10:19 AM

The consultant's code does not represent some convention or best practice. If I had to guess, I'd say there's another variable in the data that's related to durmo_cat, and this is the consultant's attempt to make that relationship linear or otherwise improve it for his purpose.