I was asked to create a 5-level categorical variable. I found another forum thread that answered questions about making one, but I am having trouble applying it to my code. The first question is the following:
1) Use the UNIVARIATE procedure to find the quintile (5-level) cut-points for the variable WEIGHT.
Using these cut-points, create a 5-level categorical variable called WTQUINT in a dataset called
MOD4_1 created using the TEMP6 dataset we left off with earlier. Run a FREQ procedure on this
variable.
Here is the code I thought was OK:
PROC UNIVARIATE DATA=temp6;
VAR WEIGHT;
RUN;
DATA MOD4_1;
SET Temp6;
IF 0 < WEIGHT <= 135 THEN WTQUINT=0;
IF WEIGHT > 135 THEN WTQUINT=1;
RUN;
PROC FREQ;
TABLES WTQUINT;
RUN;
PROC UNIVARIATE DATA=MOD4_1;
VAR WEIGHT;
OUTPUT OUT=CUTPTS PCTLPTS= .20 .40 .60 .80 PCTLPRE = P_;
RUN;
PROC PRINT DATA= CUTPTS;
RUN;
The second question is the following:
2) Use the RANK procedure to create a 5-level categorical variable called WTQUINT2 in a dataset called
MOD4_2 from the dataset MOD4_1.
How do I create the ranks if my cut-point values are 94, 94, 95, 95? I'm guessing I made a mistake when creating my 5-level categorical variable. I took 1/5, 2/5, 3/5, and 4/5 of 100% to get .20, .40, .60, .80 in my code.
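A likely explanation for the 94/95 values: the PCTLPTS= option of PROC UNIVARIATE takes percentiles on a 0 to 100 scale, so .20 .40 .60 .80 request the 0.2nd, 0.4th, 0.6th, and 0.8th percentiles (hence the names P_0_2, P_0_4, and so on), which all sit near the very bottom of the WEIGHT distribution. Below is a minimal sketch of question 1 under that reading, assuming TEMP6 has a numeric WEIGHT variable. Coding the levels 1 to 5 and putting a value equal to a cut-point in the lower group are assumptions, not something the assignment specifies. If TEMP6 isn't at hand, SASHELP.CLASS also has a numeric WEIGHT variable and can be swapped in to try it.

```sas
/* PCTLPTS= uses a 0-100 scale: 20 40 60 80 are the quintile cut-points */
PROC UNIVARIATE DATA=temp6 NOPRINT;
   VAR WEIGHT;
   OUTPUT OUT=CUTPTS PCTLPTS=20 40 60 80 PCTLPRE=P_;  /* creates P_20 P_40 P_60 P_80 */
RUN;

PROC PRINT DATA=CUTPTS;
RUN;

DATA MOD4_1;
   /* Read the one-row CUTPTS dataset once; SET variables are retained, */
   /* so P_20-P_80 stay available for every observation of TEMP6       */
   IF _N_ = 1 THEN SET CUTPTS;
   SET temp6;
   /* Check missing first, because a missing value compares lower than any number */
   IF MISSING(WEIGHT) THEN WTQUINT = .;
   ELSE IF WEIGHT <= P_20 THEN WTQUINT = 1;
   ELSE IF WEIGHT <= P_40 THEN WTQUINT = 2;
   ELSE IF WEIGHT <= P_60 THEN WTQUINT = 3;
   ELSE IF WEIGHT <= P_80 THEN WTQUINT = 4;
   ELSE WTQUINT = 5;
   DROP P_20 P_40 P_60 P_80;
RUN;

PROC FREQ DATA=MOD4_1;
   TABLES WTQUINT;
RUN;
```

Reading the cut-points from CUTPTS, rather than typing them into the IF statements, keeps the DATA step in sync with whatever PROC UNIVARIATE reports.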
Use the GROUPS=x option in PROC RANK to create a variable with x levels based on percentiles.
Well, no: you don't literally type X; you use the actual number of groups (or levels) you want.
Yes, I can see that. What I should clarify is that I'm getting blank results when I use this code:
PROC RANK DATA=mod4_2 OUT=TWO GROUPS=5;
VAR WEIGHT;
RANKS WTQUINT2;
RUN;
That's because the cut-point values I got from the first part of the question are:
P_0_2 = 94
P_0_4 = 94
P_0_6 = 95
P_0_8 = 95
So my ranking isn't working.
This is because I had to do the following before ranking:
DATA MOD4_2;
SET MOD4_1;
IF WEIGHT > 0 AND WEIGHT <= 94 THEN WTQUINT2=0;
IF WEIGHT > 94 AND WEIGHT <= 95 THEN WTQUINT2=1;
RUN;
I think they are asking you to use PROC RANK on your original dataset.
So there is no need to create the 5 categories manually; let PROC RANK do that for you.
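A sketch of question 2 along those lines, assuming MOD4_1 from question 1 contains WEIGHT: PROC RANK reads MOD4_1 directly, so neither the hand-written DATA step nor hard-coded cut-points are needed. With GROUPS=5 the ranks come out as 0 through 4 (not 1 through 5), and observations with a missing WEIGHT get a missing WTQUINT2.

```sas
/* GROUPS=5 assigns each WEIGHT to one of 5 percentile groups */
PROC RANK DATA=MOD4_1 OUT=MOD4_2 GROUPS=5;
   VAR WEIGHT;
   RANKS WTQUINT2;   /* new variable holding the group number, 0 to 4 */
RUN;

/* One-way table of the new variable, plus a cross-tab against WTQUINT from question 1 */
PROC FREQ DATA=MOD4_2;
   TABLES WTQUINT2 WTQUINT*WTQUINT2;
RUN;
```

PROC RANK and PROC UNIVARIATE compute percentile groups with different formulas, so WTQUINT2 may not line up exactly with WTQUINT, especially when many weights are tied; the cross-tab is a quick way to see where they differ.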