Ok, let me try this one last time.
I am trying to sort the data into lower half, third quartile, and top quartile. However, I can't get this to work. The output shows all the data listed from a ClassRank of 41 to 100, with the GPA for each ClassRank. What I want is to divide the 60 observations into groups 41-60, 61-80, and 81-100, with the average GPA shown for each group. How would I do this?
Below is my attempted way of doing this. At the bottom is the code for the dataset learn.college. Note that this is not a real data set, but a problem from an instructional book. Therefore, the ClassRank and GPA does not necessarily make sense in normal terms... i.e. a GPA of 4 would have a Lower ClassRank than someone with a GPA of 3.3. The data itself doesn't matter. How to organize the data into thirds with the average GPA calculated for each third is what matters. Thanks a lot!
---------------------------
Libname Learn '/folders/myfolders/Learn' ;
Libname Myformat '/folders/myfolders/sasuser.v94' ;
proc format library=myformat ;
value Rank 41-60 = 'Bottom Third''
61-81 = 'Middle Third'
81-100 = 'Top Third'' ;
run ;
/* %let NumGroups= 4 ;
proc rank data=learn.college Groups=&NumGroups ties=high ;
var ClassRank ;
Ranks QuartileRank ;
run ; */
proc means data=learn.college noprint nway;
Class ClassRank ;
Var GPA ;
Output Out=ClassRankSpecial ;
run ;
proc means data=ClassRankSpecial n mean maxdec=2 ;
Class ClassRank ;
Var GPA ;
format ClassRank Rank. ; /*DONT FORGET THIS. FOR A JUST FORMATTED FORMAT, THIS MUST BE INCLUDED*/
run ;
------------------------
BELOW IS THE DATASET LEARN.COLLEGE:
------------------------
data learn.college;
length StudentID $ 5 Gender SchoolSize $ 1;
do i = 1 to 100;
StudentID = put(round(ranuni(123456)*10000),z5.);
if ranuni(0) lt .4 then Gender = 'M';
else Gender = 'F';
if ranuni(0) lt .3 then SchoolSize = 'S';
else if ranuni(0) lt .7 then SchoolSize = 'M';
else SchoolSize = 'L';
if ranuni(0) lt .2 then Scholarship = 'Y';
else Scholarship = 'N';
GPA = round(rannor(0)*.5 + 3.5,.01);
if GPA gt 4 then GPA = 4;
ClassRank = int(ranuni(0)*60 + 41); /* the + 41 is there because of missing data (my note) */
if ranuni(0) lt .1 then call missing(ClassRank);
if ranuni(0) lt .05 then call missing(SchoolSize);
if ranuni(0) lt .05 then call missing(GPA);
output;
end;
format Gender $gender1.
SchoolSize $size.
Scholarship $yesno.;
drop i;
run;
Since we do not have your data you should 1) provide example input data, 2) the expected or desired output. A data set created with random values for variables means that we do not have the same data, can practically never get the same result and have no idea if we are getting the desired result or not.
You can use Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
It does not help to provide code with libraries we don't have (library) or formats without definitions (gender1, size yesno
Note that your rank assignment creates many duplicates with gaps so is problematic at best, especially given 100 students...
data work.college; length StudentID $ 5 Gender SchoolSize $ 1; do i = 1 to 100; StudentID = put(round(ranuni(123456)*10000),z5.); if ranuni(0) lt .4 then Gender = 'M'; else Gender = 'F'; if ranuni(0) lt .3 then SchoolSize = 'S'; else if ranuni(0) lt .7 then SchoolSize = 'M'; else SchoolSize = 'L'; if ranuni(0) lt .2 then Scholarship = 'Y'; else Scholarship = 'N'; GPA = round(rannor(0)*.5 + 3.5,.01); if GPA gt 4 then GPA = 4; if ranuni(0) lt .05 then call missing(SchoolSize); if ranuni(0) lt .05 then call missing(GPA); output; end; drop i; run; proc rank data=work.college Groups=100 ties=high out= work.ranked; var gpa ; Ranks PercentileRank ; run ; proc format library=work ; value Rank 41-60 = 'Bottom Third' 61-81 = 'Middle Third' 81-100 = 'Top Third' ; run ; Proc summary data=work.ranked nway ; where percentilerank ge 41; class percentilerank; var gpa; output out=work.classrankspecial mean=; format percentilerank rank.; run; /* or */ proc tabulate data=work.ranked; where percentilerank ge 41; class percentilerank; var gpa; table percentilerank='', gpa*mean=''; format percentilerank rank.; run;
I think you need to be using QuartileRanks in CLASS statement in PROC MEANS.
What is the purpose of running PROC MEANS twice?
If you just want to work on LEARN.COLLEGE, you can run PROC MEANS once on that data set. Use the syntax of your second PROC MEANS, but change the input data set to be LEARN.COLLEGE.
The way the program appears now, it looks like you want an average of averages. First average all the students with RANK=41, RANK=42, RANK=43, etc. Then take those averages and re-average them within your groupings. If that is your intent, you should add to the OUTPUT statement on the first PROC MEANS:
Output Out=ClassRankSpecial mean=;
You can see what I'm talking about by examining ClassRankSpecial using the current program, vs. using the modified OUTPUT statement.
I don't know what you're looking for in the output but this works fine for me. I think you're possibly running into trouble with the Formats and where it's stored and where its searching.
proc format ;
value Rank 41-60 = 'Bottom Third'
61-81 = 'Middle Third'
81-100 = 'Top Third' ;
run ;
data college;
length StudentID $ 5 Gender SchoolSize $ 1;
do i = 1 to 100;
StudentID = put(round(ranuni(123456)*10000),z5.);
if ranuni(0) lt .4 then Gender = 'M';
else Gender = 'F';
if ranuni(0) lt .3 then SchoolSize = 'S';
else if ranuni(0) lt .7 then SchoolSize = 'M';
else SchoolSize = 'L';
if ranuni(0) lt .2 then Scholarship = 'Y';
else Scholarship = 'N';
GPA = round(rannor(0)*.5 + 3.5,.01);
if GPA gt 4 then GPA = 4;
ClassRank = int(ranuni(0)*60 + 41); /* the + 41 is there because of missing data (my note) */
if ranuni(0) lt .1 then call missing(ClassRank);
if ranuni(0) lt .05 then call missing(SchoolSize);
if ranuni(0) lt .05 then call missing(GPA);
output;
end;
format classrank rank.;
drop i;
run;
proc means data=college n mean maxdec=2 nway ;
class ClassRank;
Var GPA ;
*format ClassRank Rank. ; /*DONT FORGET THIS. FOR A JUST FORMATTED FORMAT, THIS MUST BE INCLUDED*/
/*NOPE, IT CAN BE ASSIGNED AHEAD OF TIME, IT DOESNT NEED TO BE INCLUDED IN THE PROC MEANS*/
output out=want n= mean= / autoname;
run ;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.