Solved: Problem with PROC RANK

elif · Posted 04-06-2021 03:09 PM

Hello,

In the “allvar” dataset, variables divi, rd, and sin take values of either 0 or 1; variable divo takes values -1 or 0. I am using this data set to create portfolios for each date (newdatadate in my case).

I am using PROC RANK and group them into 5 before creating portfolios.

My Code:

OPTIONS MPRINT SYMBOLGEN;

%LET avars = divi divo rd sin ; * Note: (I have more than 60 variables here);

%LET n_avars=%SYSFUNC(countw(&avars));

%PUT &n_avars;

%LET grp = 5;

PROC SORT DATA = allvar;

BY newdatadate;

RUN;

PROC RANK DATA=allvar OUT=allvar_ranked group=&grp;

BY newdatadate; VAR &avars;

RUN;

I think, in the allvar_ranked dataset, variables divi, divo, rd, and sin should have only two different ranked values since they already had taken two values before they were ranked. However, it’s not the case.

When I run the following (before running proc rank):

proc sql; select distinct divi from allvar; quit; * Values: ., 0, 1 ;

proc sql; select distinct divo from allvar; quit; * Values: ., 0,-1 ;

proc sql; select distinct rd from allvar; quit; * Values: 0, 1 ;

proc sql; select distinct sin from allvar; quit; * Values: ., 0, 1 ;

When I run the same in the ranked dataset:

proc sql; select distinct divi from allvar_ranked; quit; * Ranked Values: ., 0, 2, 3, 4 ;

proc sql; select distinct divo from allvar_ranked; quit; * Ranked Values: ., 0, 1, 2, 3 ;

proc sql; select distinct rd from allvar_ranked; quit; * Ranked Values: 1, 2, 4 ;

proc sql; select distinct sin from allvar_ranked; quit; * Ranked Values: ., 2, 4 ;

I was using this command previously and have never encountered this problem.

How can I fix this?

Thank you!

elif · Posted 04-06-2021 04:37 PM

My goal is to create portfolios for each data date based on these rankings/groups so I will be able to compare the best group and the worst group for each variable.

Hence, for each date, I will assign 5 ranks to each variable separately.

I just wanted to handle the ranking procedure in one step, but I guess I should run proc rank for the rest of my variables with 5 groups.

Then, I can use proc rank again on the ranked dataset with two groups for only divi, divo, rd, and sin.

View solution in original post

PGStats · Posted 04-06-2021 03:20 PM

I don't get it. How would you want proc rank to generate 5 groups out of two values?

PG

elif · Posted 04-06-2021 03:25 PM

Hi,

I have other variables along with these four. These are the only variables that take two values.

ballardw · Posted 04-06-2021 04:11 PM

@elif wrote:

Hi,

I have other variables along with these four. These are the only variables that take two values.

I think that you may want to include your BY variable in the summary because that effects things;

Example of some data ranked with and without by variable.

data junk;
   do i= 1 to 50;
      byvar = mod(i,10);
      x= rand('integer',2) -1;
      y= rand('integer',2) -1;
      z= rand('integer',2) -2;
      r= rand('integer',2) -2;
      output;
   end;
run;

proc sort data=junk;
   by byvar;
run;

proc freq data=junk;
   title "Before ranking";
   tables x y z r;
run;

proc rank data=junk out=ranked1 group=5;
   var x y z r;
run;
proc freq data=ranked1;
   title "After ranking without by variable";
   tables x y z r;
run;

proc rank data=junk out=ranked2 group=5;
   by byvar;
   var x y z r;
run;

proc freq data=ranked2;
   title "After ranking with by variable";
   tables byvar *(x y z r) / list;
run;title;

Snap shot result: Without BY variable the ranks are two values for each variable.

With By variable the ranks vary across the entire data set BUT within each by group there are only two values but the actual rank values can vary for different values of the by variable.

From the same starting data.

PGStats · Posted 04-06-2021 04:11 PM

But proc rank treats each variable separately. It doesn't combine variables to create groups.

What is your goal? Maybe you should be looking for clustering procedures.

PG

elif · Posted 04-06-2021 04:37 PM

My goal is to create portfolios for each data date based on these rankings/groups so I will be able to compare the best group and the worst group for each variable.

Hence, for each date, I will assign 5 ranks to each variable separately.

I just wanted to handle the ranking procedure in one step, but I guess I should run proc rank for the rest of my variables with 5 groups.

Then, I can use proc rank again on the ranked dataset with two groups for only divi, divo, rd, and sin.

PGStats · Posted 04-06-2021 06:36 PM

Suppose that variable X is your performance criteria for determining which item (stock?) belongs to the best group and the worst group, seems to me that X is the variable that should be ranked

proc rank data=myData out=myRanks fraction;
by date;
var X;
ranks Xrank;
run;

then you could explore the relationship between Xrank and your explanatory variables. Proc hpsplit could help in that exploration, too determine which variable values are associated with low and high ranks.

hth

PG

elif · Posted 04-10-2021 01:42 PM

Update:

I have solved the problem after learning more about this procedure via your help.

Thank you so much!

Problem with PROC RANK

Re: Problem with PROC RANK

Re: Problem with PROC RANK

Re: Problem with PROC RANK

Re: Problem with PROC RANK

Re: Problem with PROC RANK

Re: Problem with PROC RANK

Re: Problem with PROC RANK

Re: Problem with PROC RANK

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!