Hi guys,
I got a table with a lot of events.
table sample would look like
ItemID Event ID
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
CBA 300
For the same Item i can have a lot of different events associated, for one item i can have multiple time the same events.
What i would like to achieve is to get the average count of event in the whole population of items.
e.g a table that would look like
Event ID AverageAmountofTx
100 1
200 1.5
300 1
What would be the most efficient way to carry out that testing? Is there a direct way to do this using proc means?
Probably an easy way to do that but it's friday ,-) My table is huge so i would like to avoid creating an intermediary table with:
ItemIT EVENT ID Count
ABC 100 1
ABC 200 2
ABC 300 1
etc
How about:
data have; input Item_ID $ Event_ID; datalines; ABC 100 ABC 200 ABC 200 ABC 300 CBA 100 CBA 200 CBA 300 ; run; proc sql ; create table want as select event_id,count(event_id)/count(distinct item_id) as avg from have group by event_id; quit;
Ksharp
PROC SUMMARY DATA = HAVE NWAY MISSING;
CLASS EVENTID ITEMID;
OUTPUT OUT = HAVE1;
RUN;
_FREQ_ WILL BE YOUR COUNT AND YOU SHOULD ALSO HAVE OTHER STATISTICS DISPLAYED
data have;
input Item_ID $ Event_ID;
datalines;
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
CBA 300
;
run;
proc sort data = have;
by Item_ID;
run;
proc freq data = have;
tables Item_ID*Event_ID / list missing norow nocol nocum nopercent;
run;
Output:
Item_ID Event_ID Frequency
Thanks guys for the quick answers, my question was more to know if you can avoid creating that table:
Item_ID Event_ID Frequency
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
ABC 100 1
ABC 200 2
ABC 300 1
CBA 100 1
CBA 200 1
CBA 300 1
I would like to create the following table directly without using an intermediary table (my source table is huge)
Event_ID AverageCount
100 1
200 1.5
300 1
didnt you get those required statiistics using my code for Proc summary??
data have;
input Item_ID $ Event_ID;
datalines;
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
CBA 300
;
run;
proc sql;
select Event_id,avg(cnt) as aver from
(select count(item_id) as cnt, event_id,item_id from have group by event_id, Item_id)
group by event_id;
quit;
Tested n working.
How about:
data have; input Item_ID $ Event_ID; datalines; ABC 100 ABC 200 ABC 200 ABC 300 CBA 100 CBA 200 CBA 300 ; run; proc sql ; create table want as select event_id,count(event_id)/count(distinct item_id) as avg from have group by event_id; quit;
Ksharp
Thanks guys, will give it a try on Monday, the table will have over 1 billion records pretty soon, so will see how each query perform.
I was also thinking maybe i should just do the average on a given period of time and annualize it so that creating an intermediary table would be ok...
Will still give a try to all queries above will keep you guys updated.
It depends if you want the average number of each event type in items in which they appear (which Ksharps' query above will calculate) or the average over all items. Compare :
data have;
input Item_ID $ Event_ID;
datalines;
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
;
proc sql ;
create table want1 as
select event_id,count(event_id)/count(distinct item_id) as avg
from have
group by event_id;
create table want2 as
select event_id,count(event_id)/(select count(distinct item_id) from have) as avg
from have
group by event_id;
select want1.*, want2.avg as avg2
from want1, want2 where want1.event_ID=want2.event_ID;
quit;
PG
Awesome guys, great to have so many different answers!
Thanks for the additional information PG, i am using Ksharp query but yours just gave me more depth for my analysis
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.