Solved: Re: Getting the average amount of transaction for specifics event

Pontch · Posted 03-30-2012 04:02 PM

Hi guys,

I got a table with a lot of events.

table sample would look like

ItemID Event ID

ABC 100

ABC 200

ABC 300

CBA 100

CBA 200

CBA 300

For the same Item i can have a lot of different events associated, for one item i can have multiple time the same events.

What i would like to achieve is to get the average count of event in the whole population of items.

e.g a table that would look like

Event ID AverageAmountofTx

100 1

200 1.5

300 1

What would be the most efficient way to carry out that testing? Is there a direct way to do this using proc means?

Probably an easy way to do that but it's friday ,-) My table is huge so i would like to avoid creating an intermediary table with:

ItemIT EVENT ID Count

ABC 100 1

ABC 200 2

ABC 300 1

etc

Ksharp · Posted 03-30-2012 11:42 PM

How about:

data have;
input Item_ID $ Event_ID;
datalines;
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
CBA 300
;
run;
proc sql ;
create table want as
 select event_id,count(event_id)/count(distinct item_id) as avg
  from have
   group by event_id;
quit;

Ksharp

View solution in original post

akberali67 · Posted 03-30-2012 04:11 PM

PROC SUMMARY DATA = HAVE NWAY MISSING;

CLASS EVENTID ITEMID;

OUTPUT OUT = HAVE1;

RUN;

_FREQ_ WILL BE YOUR COUNT AND YOU SHOULD ALSO HAVE OTHER STATISTICS DISPLAYED

Hima · Posted 03-30-2012 04:33 PM

data have;
input Item_ID $ Event_ID;
datalines;
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
CBA 300
;
run;

proc sort data = have;
by Item_ID;
run;

proc freq data = have;
tables Item_ID*Event_ID / list missing norow nocol nocum nopercent;
run;

Output:

                               Item_ID    Event_ID    Frequency
                                ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
                                ABC             100           1
                                ABC             200           2
                                ABC             300           1
                                CBA             100           1
                                CBA             200           1
                                CBA             300           1

Pontch · Posted 03-30-2012 04:41 PM

Thanks guys for the quick answers, my question was more to know if you can avoid creating that table:

Item_ID Event_ID Frequency

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

ABC 100 1

ABC 200 2

ABC 300 1

CBA 100 1

CBA 200 1

CBA 300 1

I would like to create the following table directly without using an intermediary table (my source table is huge)

Event_ID AverageCount

100 1

200 1.5

300 1

akberali67 · Posted 03-30-2012 04:46 PM

didnt you get those required statiistics using my code for Proc summary??

Hima · Posted 03-30-2012 07:01 PM

data have;

input Item_ID $ Event_ID;

datalines;

ABC 100

ABC 200

ABC 300

CBA 100

CBA 200

CBA 300

;

run;

proc sql;

select Event_id,avg(cnt) as aver from

(select count(item_id) as cnt, event_id,item_id from have group by event_id, Item_id)

group by event_id;

quit;

Tested n working.

Ksharp · Posted 03-30-2012 11:42 PM

How about:

data have;
input Item_ID $ Event_ID;
datalines;
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
CBA 300
;
run;
proc sql ;
create table want as
 select event_id,count(event_id)/count(distinct item_id) as avg
  from have
   group by event_id;
quit;

Ksharp

Pontch · Posted 03-31-2012 04:29 PM

Thanks guys, will give it a try on Monday, the table will have over 1 billion records pretty soon, so will see how each query perform.

I was also thinking maybe i should just do the average on a given period of time and annualize it so that creating an intermediary table would be ok...

Will still give a try to all queries above will keep you guys updated.

PGStats · Posted 03-31-2012 05:01 PM

It depends if you want the average number of each event type in items in which they appear (which Ksharps' query above will calculate) or the average over all items. Compare :

data have;

input Item_ID $ Event_ID;

datalines;

ABC 100

ABC 200

ABC 300

CBA 100

CBA 200

;

proc sql ;

create table want1 as

select event_id,count(event_id)/count(distinct item_id) as avg

from have

group by event_id;

create table want2 as

select event_id,count(event_id)/(select count(distinct item_id) from have) as avg

from have

group by event_id;

select want1.*, want2.avg as avg2

from want1, want2 where want1.event_ID=want2.event_ID;

quit;

PG

Pontch · Posted 04-02-2012 03:23 PM

Awesome guys, great to have so many different answers!

Thanks for the additional information PG, i am using Ksharp query but yours just gave me more depth for my analysis

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!