Re: how to define the variable in general using base sas ?

lucky66 · Posted 01-07-2011 03:29 PM

define new variable B requiring that the top 2 frequent values for variable1 to what they used be, the others are defined as OTHER.( Suppose A,B are the two, then variableB=A if variable B=A) Since as more observations will be added soon, we do not know which two values for variable1 will be the most frequent; therefore, we need a general defined method with base SAS.
ID Loss Variable1
1 1000 A
2 2000 A
3 300 B
4 40000 C
5 2300 E
....
Anyone can help me? Thanks.

art297 · Posted 01-07-2011 07:56 PM

While I haven't done a lot with views (shame on me), I would think that you could submit a small proc sql program that created a view containing the specifics that you outlined. That way, any time you called the file, it would have the most recent top two values.

Art

arpit · Posted 01-08-2011 10:20 PM

Dear Art ,

Can explain the query.

Regards,
Arpit

lucky66 · Posted 01-09-2011 03:13 PM

I also want to know. Thanks.

Peter_C · Posted 01-10-2011 07:19 AM

it looks to me like the PROC MEANS feature IDGROUP should be able to easily deliver the "top-two" most conveniently. However, leaving that sophistication to those specialists, here is a bit of basic sql;
First load your data[pre]data your.data;
input iD Loss Variable1 $ ; list;cards;
1 1000 A
2 2000 A
3 300 B
4 40000 C
5 2300 E
;[/pre]then identify the most frequent cases (I want to put the top two variable1 values into macro variables, so am using SQL[pre]proc sql noprint ;
select variable1 into :v1-:v9999
from ( select variable1, count(*) as cases from your.data
group by variable1
)
order by cases descending
;
quit ;[/pre]* then a character informat which allows use of _SAME_, leaving top-two unchanged;[pre]proc format ;
invalue $top_two "&v1", "&v2" = _same_
other = 'OTHER' ;
run;[/pre]* then use this informat to derive the new column in a data step view;[pre]
data v /view=v ;
set your.data ;
simpler = input( variable1, $top_two. ) ;
run ;[/pre] * now run a print to show the results;[pre]option nocenter ls=64 ;
proc print ;
title 'informat applied';
run;
informat applied

Obs iD Loss Variable1 simpler

1 1 1000 A A
2 2 2000 A A
3 3 300 B OTHER
4 4 40000 C C
5 5 2300 E OTHER[/pre]*Of course, it might be simpler just to create a top_two format for use in analysing the data - without creating the view or re-writing the data!
Adapt the proc format only slightly ;[pre]proc format ;
value $top_two "&v1" = "&v1"
"&v2" = "&v2" /* unable to use _SAME_ in value statement*/
other = 'OTHER' ;
run;[/pre]* now using the top_two format in proc means and print to show results;[pre]proc means data= your.data nway ;
format variable1 $top_two. ;
class variable1 ;
var loss ;
output sum= out= summary ;
run;
proc print ;
title 'format summary demo';
run;
format summary demo

Obs Variable1 _TYPE_ _FREQ_ Loss

1 A 1 2 3000
2 OTHER 1 2 2600
3 C 1 1 40000[/pre] Think I prefer the format route.
peterC

lucky66 · Posted 01-24-2011 07:59 PM

Peter, thank you so much.

how to define the variable in general using base sas ?