BookmarkSubscribeRSS Feed
lucky66
Calcite | Level 5
define new variable B requiring that the top 2 frequent values for variable1 to what they used be, the others are defined as OTHER.( Suppose A,B are the two, then variableB=A if variable B=A) Since as more observations will be added soon, we do not know which two values for variable1 will be the most frequent; therefore, we need a general defined method with base SAS.
ID Loss Variable1
1 1000 A
2 2000 A
3 300 B
4 40000 C
5 2300 E
....
Anyone can help me? Thanks.
5 REPLIES 5
art297
Opal | Level 21
While I haven't done a lot with views (shame on me), I would think that you could submit a small proc sql program that created a view containing the specifics that you outlined. That way, any time you called the file, it would have the most recent top two values.

Art
arpit
Calcite | Level 5
Dear Art ,

Can explain the query.

Regards,
Arpit
lucky66
Calcite | Level 5
I also want to know. Thanks.
Peter_C
Rhodochrosite | Level 12
it looks to me like the PROC MEANS feature IDGROUP should be able to easily deliver the "top-two" most conveniently. However, leaving that sophistication to those specialists, here is a bit of basic sql;
First load your data[pre]data your.data;
input iD Loss Variable1 $ ; list;cards;
1 1000 A
2 2000 A
3 300 B
4 40000 C
5 2300 E
;[/pre]then identify the most frequent cases (I want to put the top two variable1 values into macro variables, so am using SQL[pre]proc sql noprint ;
select variable1 into :v1-:v9999
from ( select variable1, count(*) as cases from your.data
group by variable1
)
order by cases descending
;
quit ;[/pre]* then a character informat which allows use of _SAME_, leaving top-two unchanged;[pre]proc format ;
invalue $top_two "&v1", "&v2" = _same_
other = 'OTHER' ;
run;[/pre]* then use this informat to derive the new column in a data step view;[pre]
data v /view=v ;
set your.data ;
simpler = input( variable1, $top_two. ) ;
run ;[/pre] * now run a print to show the results;[pre]option nocenter ls=64 ;
proc print ;
title 'informat applied';
run;
informat applied

Obs iD Loss Variable1 simpler

1 1 1000 A A
2 2 2000 A A
3 3 300 B OTHER
4 4 40000 C C
5 5 2300 E OTHER[/pre]*Of course, it might be simpler just to create a top_two format for use in analysing the data - without creating the view or re-writing the data!
Adapt the proc format only slightly ;[pre]proc format ;
value $top_two "&v1" = "&v1"
"&v2" = "&v2" /* unable to use _SAME_ in value statement*/
other = 'OTHER' ;
run;[/pre]* now using the top_two format in proc means and print to show results;[pre]proc means data= your.data nway ;
format variable1 $top_two. ;
class variable1 ;
var loss ;
output sum= out= summary ;
run;
proc print ;
title 'format summary demo';
run;
format summary demo

Obs Variable1 _TYPE_ _FREQ_ Loss

1 A 1 2 3000
2 OTHER 1 2 2600
3 C 1 1 40000[/pre] Think I prefer the format route.
peterC
lucky66
Calcite | Level 5
Peter, thank you so much.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 823 views
  • 0 likes
  • 4 in conversation