Paul_NYS
Obsidian | Level 7

Hi

I am not sure where this question belongs, but I was wondering whether there are any rules for categorizing numbers that fall exactly on a category boundary. I inherited code in which a consultant created the data set below. It groups the whole-number values of the durmo_cat variable into categories so that a value equal to one of the boundary markers is placed in the category above it. In other words, if durmo_cat is exactly 2, it is classified as 3.

Are these rules just made up by the developer, or is there something more involved?

Paul

data s1;
    set workpl;
    ***duration intervals***;
    if durmo_cat lt 1 then d12=1;
    if 1 le durmo_cat lt 2 then d12=2;
    if 2 le durmo_cat lt 3 then d12=3;
    if 3 le durmo_cat lt 4 then d12=4;
    if 4 le durmo_cat lt 5 then d12=5;
    if 5 le durmo_cat lt 6 then d12=6;
    if 6 le durmo_cat lt 12 then d12=7;
    if 12 le durmo_cat lt 18 then d12=8;
    if 18 le durmo_cat lt 24 then d12=9;
    if 24 le durmo_cat lt 30 then d12=10;
    if 30 le durmo_cat lt 36 then d12=11;
    if 36 le durmo_cat lt 42 then d12=12;
    if 42 le durmo_cat lt 48 then d12=13;
    if 48 le durmo_cat lt 54 then d12=14;
    if 54 le durmo_cat lt 60 then d12=15;
    if 60 le durmo_cat lt 66 then d12=16;
    if 66 le durmo_cat lt 72 then d12=17;
    if durmo_cat ge 72 then d12=18;
run;

Accepted Solution
ChrisNZ
Tourmaline | Level 20

D12 is just a bin number; there is no reason its value should match that of DURMO_CAT.

Personally, I would take the bins' cut-off values out of the DATA step code by using a format:

D12 = put(DURMO_CAT, d12bin.);

This also makes D12 a character variable, which imho is better, but you can always convert it back to a numeric if you really need one.
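The post above calls a `d12bin.` format without showing its definition. Below is a minimal sketch of what it could look like, assuming the same cut-offs as the consultant's IF statements. In PROC FORMAT, `a -< b` means a <= x < b, which reproduces the "boundary goes to the bin above" behaviour. The two-digit labels ('01' to '18') and the sample data set WORKPL_DEMO are hypothetical choices: zero-padding keeps the character values sorting in bin order ('02' before '10'). One difference from the IF chain: for numeric formats, LOW does not match missing values, so a missing DURMO_CAT would not be labelled '01' here.

```sas
/* Hypothetical sample data; the real program would read WORKPL instead */
data workpl_demo;
    do durmo_cat = 0.5, 1, 2, 6, 11.9, 12, 72, 100;
        output;
    end;
run;

/* Hypothetical definition of the d12bin. format.                    */
/* "a -< b" includes a and excludes b, matching "a le x lt b" above.  */
proc format;
    value d12bin
        low -< 1  = '01'
          1 -< 2  = '02'
          2 -< 3  = '03'
          3 -< 4  = '04'
          4 -< 5  = '05'
          5 -< 6  = '06'
          6 -< 12 = '07'
         12 -< 18 = '08'
         18 -< 24 = '09'
         24 -< 30 = '10'
         30 -< 36 = '11'
         36 -< 42 = '12'
         42 -< 48 = '13'
         48 -< 54 = '14'
         54 -< 60 = '15'
         60 -< 66 = '16'
         66 -< 72 = '17'
         72 - high = '18';
run;

data s1;
    set workpl_demo;
    d12     = put(durmo_cat, d12bin.);  /* character bin label, e.g. '03' */
    d12_num = input(d12, 2.);           /* numeric copy, only if needed    */
run;
```

Changing a cut-off now means editing one line of the format instead of the DATA step logic.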


PGStats
Opal | Level 21

I don't see anything wrong with this way of creating categories. From d12 = 7 onward, the category number becomes unrelated to the original durmo_cat value anyway. However, every IF except the first could be replaced by ELSE IF, which would be a bit more efficient. - PG
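A sketch of the ELSE IF suggestion, run against a small hypothetical data set (WORKPL_DEMO) rather than the real WORKPL. Because the conditions are tested in order and SAS skips the remaining ELSE IF branches once one is true, each branch only needs the upper bound of its bin. The results should be the same as with the original IF statements, including missing values landing in bin 1.

```sas
/* Hypothetical sample data, including values exactly on the boundaries */
data workpl_demo;
    do durmo_cat = -1, 0.5, 2, 5.5, 11.9, 12, 71.9, 72;
        output;
    end;
run;

data s1;
    set workpl_demo;
    /* Evaluation stops at the first true condition, so reaching a     */
    /* branch already implies durmo_cat >= the previous cut-off.       */
    if      durmo_cat lt 1  then d12 = 1;
    else if durmo_cat lt 2  then d12 = 2;
    else if durmo_cat lt 3  then d12 = 3;
    else if durmo_cat lt 4  then d12 = 4;
    else if durmo_cat lt 5  then d12 = 5;
    else if durmo_cat lt 6  then d12 = 6;
    else if durmo_cat lt 12 then d12 = 7;
    else if durmo_cat lt 18 then d12 = 8;
    else if durmo_cat lt 24 then d12 = 9;
    else if durmo_cat lt 30 then d12 = 10;
    else if durmo_cat lt 36 then d12 = 11;
    else if durmo_cat lt 42 then d12 = 12;
    else if durmo_cat lt 48 then d12 = 13;
    else if durmo_cat lt 54 then d12 = 14;
    else if durmo_cat lt 60 then d12 = 15;
    else if durmo_cat lt 66 then d12 = 16;
    else if durmo_cat lt 72 then d12 = 17;
    else                         d12 = 18;   /* durmo_cat ge 72 */
run;
```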

aland1
Calcite | Level 5

I wouldn't say it becomes unrelated as of d12=7. It's no longer equal to the upper bound, but it's still related in the sense that the higher durmo_cat is, the higher d12 is.

Paul_NYS
Obsidian | Level 7

Thank you both. But do you know of any statistical or other rules that provide guidance for grouping whole-number values into whole-number categories when a value falls exactly on a category boundary?

For example, if I have a value of 3 and categories of 1-2, 3-4, and 5-6, is there any reason why 3 would not fall in 3-4?
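One way to see what the boundary convention changes is to write the bins from this example as two hypothetical SAS formats: closed ranges (1-2, 3-4, 5-6) and half-open ranges (lower bound included, upper bound excluded, as in the consultant's code). The format and variable names below are illustrative only. For whole numbers both place 3 in '3-4'; the difference shows up for non-integers such as 2.5, which no closed range matches, so it is displayed without a bin label.

```sas
proc format;
    /* Closed ranges: fine for whole numbers, but leave gaps (e.g. 2.5) */
    value closedf 1 - 2 = '1-2'
                  3 - 4 = '3-4'
                  5 - 6 = '5-6';
    /* Half-open ranges: every value from 1 up to (not incl.) 7 has a bin */
    value halfopf 1 -< 3 = '1-2'
                  3 -< 5 = '3-4'
                  5 -< 7 = '5-6';
run;

data compare;
    do x = 2, 2.5, 3, 4, 4.5;
        closed  = put(x, closedf.);
        halfopn = put(x, halfopf.);
        output;
    end;
run;

proc print data=compare noobs;
run;
```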

Paul

Astounding
PROC Star

Previous posters already provided the right answers. One thing to add: this code takes bad data and groups it into category 1. For example, negative numbers and missing values both fall into category 1.
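A possible way to keep bad data out of category 1, sketched under assumptions: missing and negative durations are screened first and left unbinned, and DUR_BAD is a hypothetical flag variable for follow-up. In SAS a missing value compares as smaller than any number, which is why `durmo_cat lt 1` was true for it; the explicit MISSING() test is there for readability. WORKPL_DEMO is hypothetical sample data.

```sas
/* Hypothetical sample data, including a missing and a negative value */
data workpl_demo;
    do durmo_cat = ., -3, 0, 0.5, 2, 72;
        output;
    end;
run;

data s1;
    set workpl_demo;
    /* Screen out values that are not valid durations before binning */
    if missing(durmo_cat) or durmo_cat lt 0 then do;
        d12 = .;        /* left unbinned instead of falling into bin 1 */
        dur_bad = 1;    /* hypothetical flag for later review          */
    end;
    else do;
        dur_bad = 0;
        if      durmo_cat lt 1  then d12 = 1;
        else if durmo_cat lt 2  then d12 = 2;
        else if durmo_cat lt 3  then d12 = 3;
        else if durmo_cat lt 4  then d12 = 4;
        else if durmo_cat lt 5  then d12 = 5;
        else if durmo_cat lt 6  then d12 = 6;
        else if durmo_cat lt 12 then d12 = 7;
        else if durmo_cat lt 18 then d12 = 8;
        else if durmo_cat lt 24 then d12 = 9;
        else if durmo_cat lt 30 then d12 = 10;
        else if durmo_cat lt 36 then d12 = 11;
        else if durmo_cat lt 42 then d12 = 12;
        else if durmo_cat lt 48 then d12 = 13;
        else if durmo_cat lt 54 then d12 = 14;
        else if durmo_cat lt 60 then d12 = 15;
        else if durmo_cat lt 66 then d12 = 16;
        else if durmo_cat lt 72 then d12 = 17;
        else                         d12 = 18;
    end;
run;
```

Whether bad values should be dropped, flagged, or given their own category depends on the analysis; the flag simply makes them easy to count.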

Building upon Chris's suggestion, perhaps it would help you to think of the bins as being labeled "A", "B", "C" ... instead of 1, 2, 3 ...

They are ordered categories, and do not represent numerical amounts.

aland1
Calcite | Level 5

The consultant's code does not represent some convention or best practice. If I had to guess, I'd say there's another variable in the data that's related to durmo_cat, and this is the consultant's attempt to make that relationship linear or otherwise improve it for his purpose.   
