Hi all,
I have a dataset with a variable measuring the number of cigarettes individuals smoked per week at baseline. I did not help create this dataset, so I'm not sure why it is coded as categorical rather than continuous given the number of levels it has, but I would like to get quartiles from this group. As I understand it, proc rank or proc univariate will not work because it is a categorical variable.
The other caveat is that when I try to calculate the quartiles by hand and manually create a categorical variable, some levels contain more observations than a single quartile would. So if I were to use the values that are available, I would end up with uneven quartiles. Is there another way to do this?
Thank you for your help!
Convert it to numeric in a data step and then use proc rank.
/* Read the character variable as a number. */
data numeric;
  set have;
  numeric_var = input(oldVar, 8.);
run;

/* Split the numeric values into four groups. */
proc rank data=numeric out=want_ranks groups=4;
  var numeric_var;
  ranks ranked_var;
run;
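A quick check after the steps above might look like the following sketch. It reuses the dataset and variable names from the code above and assumes nothing else. With GROUPS=4, PROC RANK numbers the groups 0 through 3, and it always places tied values in the same group, so a variable with many repeated values can still produce uneven quartiles.

```sas
/* Count character values that did not convert to a number. */
proc sql;
  select count(*) as n_unconverted
  from numeric
  where missing(numeric_var) and not missing(oldVar);
quit;

/* Show how many observations landed in each group (0 to 3). */
proc freq data=want_ranks;
  tables ranked_var / missing;
run;
```

If the group sizes are badly unbalanced, that reflects the ties in the data rather than a problem with the conversion.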
Yes - we are creating a score based on multiple health behaviors that will be parsed into quintiles, so the goal is to parse this variable into quartiles and then have the last category of individuals who do not smoke be the last quintile.
Yes, exactly! The original variable is character, which is why proc rank won't work, and I'm not sure how to create the quartiles otherwise.
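The quintile layout described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes a numeric copy of the variable already exists (called numeric_var in a dataset called numeric here, both hypothetical names), and that a value of 0 marks non-smokers, which the thread does not confirm. Observations with a missing value are left out of both pieces.

```sas
/* Rank smokers only; assumes 0 means "does not smoke" (hypothetical coding). */
proc rank data=numeric(where=(numeric_var > 0)) out=smoker_ranks groups=4;
  var numeric_var;
  ranks quartile;              /* PROC RANK numbers the groups 0 to 3 */
run;

/* Stack smokers and non-smokers, then build a 1-5 category. */
data quintiles;
  set smoker_ranks
      numeric(where=(numeric_var = 0));
  if numeric_var = 0 then quintile = 5;   /* non-smokers form the last category */
  else quintile = quartile + 1;           /* smokers become categories 1 to 4 */
  drop quartile;
run;
```

If heavier smokers should receive the lower category numbers instead, the DESCENDING option on the PROC RANK statement reverses the ranking direction.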
Quartiles, or other percentiles, are order statistics, which means you must be able to place the values in some order such that the "first" is the "lowest value" and so on. So to compute percentiles you have to assign that order value to a numeric variable.
Does your categorical variable have a natural order, such as "first" "second" "third", so that low to high is obvious? Then that step should be relatively easy.
If your category is something like the name of a street, such as "Elm" "Oak" "Main" "Front" "Jackson", there may not be any obvious order, though a geographic one, such as east to west, might be possible, where the eastern (or western) street comes "before" the others. Again, assign a numeric value to another variable to go with each name, possibly the number of blocks from a reference point.
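Assigning the order value can be sketched like this for the "first" "second" "third" example. The dataset name have and the variable names category and order_value are hypothetical, and the mapping itself is whatever order makes sense for your data.

```sas
/* Map each ordered category to a numeric order value. */
data ordered;
  set have;
  select (upcase(strip(category)));
    when ('FIRST')  order_value = 1;
    when ('SECOND') order_value = 2;
    when ('THIRD')  order_value = 3;
    otherwise       order_value = .;   /* unrecognized categories stay missing */
  end;
run;

/* Percentile groups can then be computed on the numeric order value. */
proc rank data=ordered out=ordered_ranks groups=4;
  var order_value;
  ranks order_group;
run;
```

For the street example, the same pattern applies with the number of blocks from the reference point in place of 1, 2, 3.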
If there is nothing obvious for a natural order, then what would be the meaning of the quartile on what would be, essentially, a random order assignment?