Solved: Re: How to rank a variable is datastep without using PROC RANK?

MarkWik · Posted 10-11-2012 12:59 PM

Hey guys,

How to rank a variable without touching proc rank. I would like datastep logic in ranking a variable, For example if i have a dateset that has studentid, Grades( in percentile).it is kinda challenging to rank them when the students grades are in percentiles for the reason many students will fall into a percentile Should we bucket or group into category like for example students falling into 75th percentile? or use something to work around the TIES in ranks.

I prefer the solution in datastep. I know proc rank enough i guess. Thanks

Good day,

Mark

Haikuo · Posted 10-17-2012 11:10 PM

Similarly,

data want;

set have;

by studentid descending grade ;

order+1-order*first.studentid;

rank+ifn(grade=lag(grade),0,1)-rank*first.studentid;

run;

Haikuo

View solution in original post

ballardw · Posted 10-11-2012 06:47 PM

I think a small example of input and desired output might be helpful to answering your question.

Astounding · Posted 10-11-2012 06:55 PM

Here's one variation, where tied scores get the same rank:

proc sort data=have;

by descending score;

run;

data want;

set have;

by descending score;

if first.score then rank=_n_;

retain rank;

run;

If you are looking to assign buckets, rather than individual ranks, that can be done too but it's a little more complex. It requires using the total number of observations in the rank formula:

set have nobs=_nobs_;

So if this isn't exactly what you need, just clarify. Good luck.

MarkWik · Posted 10-14-2012 04:24 AM

Astounding,

Thank you for the response. I guess that would rank all student, for example if there are 300 students there would 300 ranks.

In my case, I have several grades for one student and for many students. I want to rank the grades obtained by each student. The purpose is know how well the student(he/she) has shown any improvment. So again for example, if one student id has 300 grades there would have to be 300 grades for each studentid and the same applies to all other student ids. So basically I am trying to rank the grades per studentid.

The input dataset i have is:

Studentid Date Grades

and The desired output shoud look like

Studentid Date Grades Ranked_grades
student1 mm/dd//yy 53 1

student1 mm/dd/yy 50 2

student1 - 40 3

student2 69 4

student2 70 3

student2 75 1

/*Ranked_grades- You would notice ties in the ranks for student1 in 3 rd rank and for student2 in first rank. So far this how my output data i managed. I want to remove the ties so where 3 follows previous 3, i want that as 4 and likewise for all scenarios where ties apply. Thanks*/

Patrick · Posted 10-14-2012 05:01 AM

Are you after something simple like below:

data have;
input StudentID $ Grade;
retain date 18000;
date+1;
format date date9.;
datalines;
1 53
1 50
1 40
1 40
2 69
2 70
2 75
2 75
;
run;

proc sort data=have;
by StudentID descending Grade date;
run;

data want;
set have;
by StudentID;
if first.StudentID then Ranked_Grade=0;
Ranked_Grade+1;
run;

Tom · Posted 10-14-2012 02:35 PM

You need to retain two variables. An ORDER variable that increased by one for each record within a student. And RANK which is set to ORDER when a new value of grade is hit.

proc sort ;

by studentid descending grade ;

run;

data want;

set grades;

by studentid descending grade ;

if first.studentid then order=0;

order+1;

if first.grade then rank=order;

retain order rank;

run;

Haikuo · Posted 10-17-2012 11:10 PM

Similarly,

data want;

set have;

by studentid descending grade ;

order+1-order*first.studentid;

rank+ifn(grade=lag(grade),0,1)-rank*first.studentid;

run;

Haikuo