Corinthian94
Obsidian | Level 7

Hi all,

I have a dataset with a variable measuring the number of cigarettes individuals smoked per week at baseline. I did not help create this dataset, so I'm not sure why it is coded as categorical rather than continuous given the number of levels it has, but I would like to get quartiles from this variable. As I understand it, proc rank or univariate will not work because it is a categorical variable.

The other caveat is that when I try to calculate the quartiles by hand and manually create a categorical variable, some levels contain more observations than a single quartile would hold. So if I were to split on the values that are available, I would end up with uneven quartiles. Is there another way to do this?

 

Thank you for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Convert it to numeric in a data step and then use proc rank. 

 

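/* Step 1: convert the character variable to a numeric one */
data numeric;
    set have;
    /* input() reads the text with the 8. informat; values that are not valid numbers become missing */
    numeric_var = input(oldVar, 8.);
run;

/* Step 2: assign each observation to one of four rank groups, coded 0 to 3 */
proc rank data=numeric out=want_ranks groups=4;
    var numeric_var;
    ranks ranked_var;
run;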
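
Since the question mentions levels that hold more observations than one quartile would, it may be worth checking how the groups actually came out. PROC RANK keeps tied values in the same group, so the four groups will not necessarily be equal in size. A minimal sketch, assuming the want_ranks, ranked_var and numeric_var names from the code above:

```sas
/* Count observations per rank group, including any with a missing rank */
proc freq data=want_ranks;
    tables ranked_var / missing;
run;

/* Show the range of the original values that fell into each group */
proc means data=want_ranks n min max;
    class ranked_var;
    var numeric_var;
run;
```

If one group is much larger than the others, that is usually a sign of heavy ties at a single value rather than a problem with the conversion.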


8 REPLIES
Reeza
Super User
Is there some reason you can't use the original variable with the number of cigarettes?

Corinthian94
Obsidian | Level 7

Yes - we are creating a score based on multiple health behaviors that will be parsed into quintiles, so the goal is to parse this variable into quartiles and then have the last category of individuals who do not smoke be the last quintile. 

Reeza
Super User
Still not sure why you can't run PROC RANK on the original variable? Then you would use a second data step to add your 5th category. I hesitate to call it a quintile as the number may not reflect 20% of the data.

Or do you mean the variable is character type, not numeric which is why PROC RANK won't work?
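
The two-step idea above (rank first, then add the fifth category in a second data step) could be sketched roughly as follows. This assumes the variable is already numeric and that non-smokers are coded as 0, which is a guess about the data; numeric, numeric_var, smoke_group and want_score are placeholder names.

```sas
/* Rank smokers only into four groups (0 to 3).                      */
/* Assumption: numeric_var = 0 marks non-smokers; adjust as needed.  */
proc rank data=numeric(where=(numeric_var > 0)) out=smoker_ranks groups=4;
    var numeric_var;
    ranks smoke_group;
run;

/* Stack the non-smokers back on and give them the fifth category.   */
/* Observations with a missing numeric_var are left out of both sets. */
data want_score;
    set smoker_ranks
        numeric(where=(numeric_var = 0) in=nonsmoker);
    if nonsmoker then smoke_group = 4;
run;
```

The fifth category here is simply "everyone who does not smoke", so its size depends on the data and will not generally be 20% of the sample.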

Corinthian94
Obsidian | Level 7

Yes, exactly! The original variable is character, which is why proc rank won't work, and I'm not sure how to create the quartiles otherwise.

Reeza
Super User
"Categorical" is a role that a variable plays, not something that SAS actually defines. It's more about how the variable is used.
There are variable types: SAS has two, numeric and character.
There are variable formats: those control how a variable is displayed.

A categorical variable can be numeric or character.
A variable that requires numerical analysis, i.e. means, standard deviations etc., must be numeric.
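
To see which type SAS has actually stored for a variable, something along these lines may help. It is only a sketch; have and oldVar are placeholder names for your dataset and variable.

```sas
/* List every variable in the dataset with its type, length and format */
proc contents data=have;
run;

/* Or check a single variable: vtype() returns N for numeric, C for character */
data _null_;
    set have(obs=1);
    var_type = vtype(oldVar);
    put "Type of oldVar: " var_type;
run;
```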

ballardw
Super User

Quartiles, or other percentiles, are order statistics, which means you must be able to place the values in some order such that the "first" is the "lowest value" and so on. So to get percentiles you have to assign that order value to a numeric variable.

Does your categorical variable have a natural order, such as "first" "second" "third", so that low to high is obvious? Then that step should be relatively easy.

If your category is something like the name of a street, such as "Elm" "Oak" "Main" "Front" "Jackson", there may not be any obvious order, though a geographic one, such as east to west, might be possible, where the easternmost (or westernmost) street comes "before" the others. Again, assign a numeric value to another variable to go with each name, possibly the number of blocks from a reference point.
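
One way to assign those numeric order values is a custom informat, sketched below under made-up names: the ordfmt informat, the category variable and the have dataset are all hypothetical, and the level labels are the "first", "second", "third" examples from above.

```sas
/* Map each ordered category label to a numeric order value */
proc format;
    invalue ordfmt
        'first'  = 1
        'second' = 2
        'third'  = 3
        other    = .;   /* anything unexpected becomes missing */
run;

/* Apply the informat to create a numeric variable that carries the order */
data ordered;
    set have;
    order_num = input(category, ordfmt.);
run;
```

The new order_num variable can then go into PROC RANK or PROC UNIVARIATE like any other numeric variable.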

 

If there is nothing obvious for a natural order, then what would be the meaning of the quartile on what would be, essentially, a random order assignment?

pink_poodle
Barite | Level 11
To me that seems fine, although you lose some granularity compared to a continuous variable. Say there were four people, and they smoked 1, 2, 3, 4 cigarettes, respectively. You could equally say they smoked little, moderate, many, and very many cigarettes, and that would still give four quartiles. However, if you define 2-3 cigarettes as moderate, you can no longer get four quartiles from the categories, because two people will belong to the moderate category. So the key, before you create the levels of the categorical variable, is to look at the rank boundaries of the sorted continuous variable, so that you can partition the categories neatly into four quartiles of more or less equal size.
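
To look at those boundaries, the quartile cut points can be pulled out with PROC UNIVARIATE. A rough sketch, assuming a numeric version of the variable already exists (numeric and numeric_var are placeholder names, and cutpoints is just an illustrative output dataset):

```sas
/* Write the 25th, 50th and 75th percentiles to a one-row dataset */
proc univariate data=numeric noprint;
    var numeric_var;
    output out=cutpoints pctlpts=25 50 75 pctlpre=p_;
run;

/* The cut points appear as p_25, p_50 and p_75 */
proc print data=cutpoints;
run;
```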
