I have a homework assignment in which I am tasked with creating "sex-specific quartiles". Here are the exact instructions:
Create sex-specific quartiles for the variables “zdelta_muac” and “zdelta_bmi” (using the “visitdata4” dataset) and call the new ranked variables “q4s_zdelta_muac” and “q4s_zdelta_bmi.” Output these results into a new temporary dataset called “visitdata5.”
where zdelta_muac and zdelta_bmi are standardized variables.
/*sex-specific quartiles*/
proc sort data=visitdata4;
by sex;
run;
proc rank data=visitdata4 out=visitdata5;
by sex;
var zdelta_muac zdelta_bmi;
ranks q4s_zdelta_muac q4s_zdelta_bmi;
run;
This is the code I have, but when I run it, it appears to delete a bunch of observations from visitdata4. How do I keep this from happening?
To do quartiles I would expect to see the option GROUPS=4 on the Proc Rank.
Do you have missing values for your BY variables?
Normally Proc Rank doesn't remove observations. Can you show us the LOG where you use Proc Rank to create Visitdata5?
Copy the text from the Log with the code and all the notes or messages. On the forum open a text box using the </> icon above the message window and paste all the text.
That would look something like this (your variables and sets differ of course)
3694 proc rank data=work.class groups=4 out=work.classrank;
3695 by sex;
3696 var height weight;
3697 ranks h_rank w_rank;
3698 run;
NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.CLASSRANK has 19 observations and 7 variables.
NOTE: PROCEDURE RANK used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds
I forgot the groups = 4 and added that in. Here is a snippet of what my visitdata4 looks like.
Then I run this code:
/*sex-specific quartiles*/
proc sort data=visitdata4;
by sex;
run;
proc rank data=visitdata4 out=visitdata5 groups=4;
by sex;
var zdelta_muac zdelta_bmi;
ranks q4s_zdelta_muac q4s_zdelta_bmi;
run;
and now visitdata4 looks like this:
Here is the log for the proc rank:
Your log excerpt
NOTE: There were 197 observations read from the data set WORK.VISITDATA4.
NOTE: The data set WORK.VISITDATA5 has 197 observations and 25 variables.
confirms that no observations were deleted.
If you are showing Visitdata4 (Visitdata5 would look the same), it is showing the SORT order from your sex variable: missing values sort first. Sex is missing for 5 observations. The places where sex is missing seem to be associated with missing values in other variables as well.
I bet a small pile of $$$ that if you run something like this (on any of your earlier datasets such as Visitdata1 or Visitdata2, guessing these exist), you will see that Sex is missing for 5 observations.
proc freq data=visitdata3;
tables sex / missing;
run;
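The ranks for those observations will be missing as well. PROC RANK treats the missing BY value as its own group, but it assigns a missing rank wherever the variable being ranked is missing, which appears to be the case for these 5 observations.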
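If it helps, here is a rough sketch of how you might check this on your own data. It assumes the dataset and variable names from your posts (visitdata4, visitdata5, sex, zdelta_muac, zdelta_bmi and the two q4s_ rank variables); I have not run it against your data, so treat it as a starting point rather than a finished answer. Note that with GROUPS=4 the quartile values run from 0 to 3, not 1 to 4.

/* List the observations where the BY variable is missing */
/* (names taken from the thread; adjust as needed).        */
proc print data=visitdata4;
where missing(sex);
var sex zdelta_muac zdelta_bmi;
run;

/* Count observations per quartile within each sex.        */
/* The MISSING option keeps missing sex and missing ranks  */
/* in the table, so every observation is accounted for.    */
proc freq data=visitdata5;
tables sex*q4s_zdelta_muac sex*q4s_zdelta_bmi / missing;
run;

The totals in each PROC FREQ table should add up to the 197 observations reported in your log, with the rows where sex is missing showing up as their own line.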
This makes sense now! Thank you for the explanation I appreciate it!!