DATA Step, Macro, Functions and more


Occasional Contributor
Posts: 8


Hello dear SAS community. 

I have a real struggle at the moment, and I'm running out of time. 
To make things short: I want for each year, rank all firms by X for that year. But only max "5 Ranks" fo reach year, and start in year 1968 for example, and repeat it each year.
For example: 
1000    1965     9            3   4 (69)
1000    1966    10           2  3(69)
1000    1967    11           1 2(69)
1000    1968    12           1(69)
1000    1969    16           
1000    1970    19
1000    1971    20
1000    1972    21  




BUT the problem starts when I'm doing this in year 1969, since it contains a "rank" variable in pre 68's.

Trusted Advisor
Posts: 1,337

Re: Ranking

Posted in reply to TorTheHammer

What precisely does the WANT value  [  3 4 (60) ] mean in the data line for 1965.


1000    1965     9            3   4 (69)



It appears you want, for each year, to

  1.  Rank the preceding 4 ( preceding 5?) years.  I notice for the "(69)" you don't actually assign a rank value to the 1969 data.
  2. Assign those ranks to those preceding years.





This means you would want, for each year t, 4 RANK vars  

  1. RANK_0 = rank of X(t) among  x{t-3}, x{t-2}, x{t-1}, x{t}
  2. RANK_1 = rank of X{t} among  x{t-2} x{t-1}, x{t}, x{t+1}
  3. ...
  4. RANK_3 = rank of x{t} among  x{t}, t{t+1}, x{t+2}, x{t+3};



Is that what you mean?

Trusted Advisor
Posts: 1,337

Re: Ranking

Also, what rank do you want to assign when there are ties in the x values?

Occasional Contributor
Posts: 8

Re: Ranking

Yes, this is what I want. Sorry for bad "writing" about it Smiley Happy 

Sorry for asking rude, but can you help me about it?Smiley Happy

Super User
Posts: 13,542

Re: Ranking

Posted in reply to TorTheHammer

I think it may help to provide a little more raw input data and how you want the result at the end of the process. You say " for each year" but your example only has one year of ranks (apparently).


Also please explain where you got your "want" values from. If 3 is supposed to be Rank, how did you get 3? The value of 9 for VAR_x is the smallest shown so I would expect it to be either 1 (if you want to rank or increasing values of var_x) or not ranked if looking at decreaseing values. And what is 4?

How do you get 2 values of 1? Are you perhaps missing a bunch of different variables to hold multiple periods of consideration but didn't show them?

Trusted Advisor
Posts: 1,337

Re: Ranking

[ Edited ]
Posted in reply to TorTheHammer



The DATA WANT step below does what I think you want.  If there are ties among the X's being ranked, then it produces the average of the lowest and highest rank assigned to the X value:


It works by:

  1. Reading in each record for a gien FIRMID and storing X in an array, indexed on year, with a lowest bound 1960 and highest bound of 2010.  This assumes that the lowest observable year is 1963 and the highest observed year is 2010.
  2. When the end of a firmid is encountered, then, in a loop, reread each year, and capture the rank of its X value versus (1) years YEAR-3 through YEAR for RANK_), (2) years YEAR-2 through YEAR+1 for RANK_1, etc.


data have;
  input FirmID YEAR  X;
  firmid=2000; if mod(_n_,2)=0 then x=x-2; output;
1000    1965     9
1000    1966    10
1000    1967    11
1000    1968    12
1000    1969    16           
1000    1970    19
1000    1971    20
1000    1972    21  
proc sort; by firmid year;run;

data want (drop=_:);
  array _x{1960:2010} _temporary_;
  set have (keep=firmid year x );
  by firmid;
  if first.firmid then call missing(of _x{*});
  _x{year}=x;   /* Fill the _X array */

  label rank_0 = "Rank of current X for Current Year-3 through Current Year"
        rank_1 = "Rank of current X for Current Year-2 through Current Year+1"
        rank_2 = "Rank of current X for Current Year-1 through Current Year+2"
        rank_3 = "Rank of current X for Current Year   through Current Year+3" ;
  array rnk{0:3} rank_0-rank_3;

  if last.firmid then do until (last.firmid);
    set have ;
    by firmid;
    do _r=0 to 3;
      do                     _rank_beg=1         by 1 until (largest(_rank_beg  ,_x{year+_r-3},_x{year+_r-2},_x{year+_r-1},_x{year+_r-0})=x);end;
      if _rank_beg<4 then do _rank_end=_rank_beg to 3 while (largest(_rank_end+1,_x{year+_r-3},_x{year+_r-2},_x{year+_r-1},_x{year+_r-0})=x);end;
      else _rank_end=4;





Editted addition.  To deal with possible ties for X, you see two consecutive loops

    DO _RANK_BEG= 1 by 1 until (.....);

    IF _RANK_BEG<4 then DO _RANK_END=_RANK_BEG to 3 WHILE (.....);


If you expect no ties in X, you could replace them with a single loop: 

  DO RNK{_r}=1 by 1 until (largest(rnk{_r},_x{year+_r-3},_x{year+_r-2},_x{year+_r-1},_x{year+_r-0})=x);




Ask a Question
Discussion stats
  • 5 replies
  • 3 in conversation