Solved: Re: Variable wise nth highest using SAS

vijaysas777zx · Posted 10-07-2017 06:32 AM

I have a data like below and I want to calculate the nth highest number for the below variables.

USA AUS ENG

1000 2000 500

3000 1000 200

1500 1500 1000

2200 1100 1500

700 150 900

3000 1000 500

I tried ranks but I couldn't get the output the way required.

Patrick · Posted 10-07-2017 09:06 AM

@vijaysas777zx

Something like below should work

data have;
  input USA AUS ENG;
  datalines;
1000 2000 500
3000 1000 200
1500 1500 1000
2200 1100 1500
700 150 900
3000 1000 500
;
run;

proc rank data=have ties=dense out=ranked;
  var USA AUS ENG;
  ranks _rUSA _rAUS _rENG;
run;

data want(drop=_:);
  set ranked(where=(_rUSA=2));
  set ranked(where=(_rAUS=2));
  set ranked(where=(_rENG=2));
  output;
  stop;
run;

If you have a lot of countries then you could also transpose your source data to a long structure with only two columns (country, value) and then create the output using Proc Rank with By Group processing.

View solution in original post

PeterClemmensen · Posted 10-07-2017 06:34 AM

Do you want the n'th highest value for each variable or just for one?

The DATA to DATA Step Macro
Blog: SASnrd

vijaysas777zx · Posted 10-07-2017 06:42 AM

Each variable.

PeterClemmensen · Posted 10-07-2017 06:44 AM

And how do you require the output? 🙂

The DATA to DATA Step Macro
Blog: SASnrd

vijaysas777zx · Posted 10-07-2017 06:51 AM

lets say, I want the second highest of all the variables, The output should be below

USA AUS ENG
2200 1500 1000

Patrick · Posted 10-07-2017 09:06 AM

@vijaysas777zx

Something like below should work

data have;
  input USA AUS ENG;
  datalines;
1000 2000 500
3000 1000 200
1500 1500 1000
2200 1100 1500
700 150 900
3000 1000 500
;
run;

proc rank data=have ties=dense out=ranked;
  var USA AUS ENG;
  ranks _rUSA _rAUS _rENG;
run;

data want(drop=_:);
  set ranked(where=(_rUSA=2));
  set ranked(where=(_rAUS=2));
  set ranked(where=(_rENG=2));
  output;
  stop;
run;

If you have a lot of countries then you could also transpose your source data to a long structure with only two columns (country, value) and then create the output using Proc Rank with By Group processing.

Ksharp · Posted 10-08-2017 02:36 AM

data have;
  input USA AUS ENG;
  datalines;
1000 2000 500
3000 1000 200
1500 1500 1000
2200 1100 1500
700 150 900
3000 1000 500
;
run;
data want;
 set have end=last;
array u{999999} _temporary_;
array a{999999} _temporary_;
array e{999999} _temporary_;

if usa not in u then u{_n_}=usa;
if aus not in a then a{_n_}=aus;
if eng not in e then e{_n_}=eng;

if last then do;
 usa=largest(2,of u{*});
 aus=largest(2,of a{*});
 eng=largest(2,of e{*});
 output;
end;
run;

Patrick · Posted 10-08-2017 04:07 AM

@Ksharp

What about ties?

data test;
  array test {3} 8 (9,9,1);
  want=largest(2,of test[*]);
run;

Ksharp · Posted 10-09-2017 08:05 AM

@Patrick I have already considered about it. Check the code:

if usa not in u then u{_n_}=usa;

mkeintz · Posted 10-11-2017 02:18 PM

A little late in the game, but here is a single-pass solution, which gets the 4th highest values for HIGH LOW and CLOSE for IBM stocks in sashelp.stocks:

%let varlist=low high close;
%let NV=%sysfunc(countw(&varlist,%str( )));
%let ord=4;

%macro ordnals_list(size=);
  %local size;
  %do I=1 %to &size; ,ordnals{_v,&I} %end;
%mend;

options mprint;
data want (DROP=_v);
  array ordnals {&nv,&ord} _temporary_;
  array vars {&NV} &varlist;

  set sashelp.stocks (keep=stock low high close where=(stock='IBM'))  END=EOD;
  do _V=1 to dim(vars);
    if _n_<=&ord then ordnals{_V,_n_}=vars{_V};
    else if vars{_V}>ordnals{_V,1} then call sortn(vars{_V} %ordnals_list(size=&ord));
  end;

  if EOD;
  do _V=1 to dim(vars);
    vars{_V}=ordnals{_V,1};
  END;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Classroom Training Available!