it looks to me like the PROC MEANS feature IDGROUP should be able to easily deliver the "top-two" most conveniently. However, leaving that sophistication to those specialists, here is a bit of basic sql;
First load your data[pre]data your.data;
input iD Loss Variable1 $ ; list;cards;
1 1000 A
2 2000 A
3 300 B
4 40000 C
5 2300 E
;[/pre]then identify the most frequent cases (I want to put the top two variable1 values into macro variables, so am using SQL[pre]proc sql noprint ;
select variable1 into :v1-:v9999
from ( select variable1, count(*) as cases from your.data
group by variable1
)
order by cases descending
;
quit ;[/pre]* then a character informat which allows use of _SAME_, leaving top-two unchanged;[pre]proc format ;
invalue $top_two "&v1", "&v2" = _same_
other = 'OTHER' ;
run;[/pre]* then use this informat to derive the new column in a data step view;[pre]
data v /view=v ;
set your.data ;
simpler = input( variable1, $top_two. ) ;
run ;[/pre] * now run a print to show the results;[pre]option nocenter ls=64 ;
proc print ;
title 'informat applied';
run;
informat applied
Obs iD Loss Variable1 simpler
1 1 1000 A A
2 2 2000 A A
3 3 300 B OTHER
4 4 40000 C C
5 5 2300 E OTHER[/pre]*Of course, it might be simpler just to create a top_two format for use in analysing the data - without creating the view or re-writing the data!
Adapt the proc format only slightly ;[pre]proc format ;
value $top_two "&v1" = "&v1"
"&v2" = "&v2" /* unable to use _SAME_ in value statement*/
other = 'OTHER' ;
run;[/pre]* now using the top_two format in proc means and print to show results;[pre]proc means data= your.data nway ;
format variable1 $top_two. ;
class variable1 ;
var loss ;
output sum= out= summary ;
run;
proc print ;
title 'format summary demo';
run;
format summary demo
Obs Variable1 _TYPE_ _FREQ_ Loss
1 A 1 2 3000
2 OTHER 1 2 2600
3 C 1 1 40000[/pre] Think I prefer the format route.
peterC