BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pontch
Fluorite | Level 6

Hi guys,

I got a table with a lot of events.

table sample would look like

ItemID Event ID

ABC 100

ABC 200

ABC 200

ABC 300

CBA 100

CBA 200

CBA 300

For the same Item i can have a lot of  different events associated, for one item i can have multiple time the same events.

What i would like to achieve is to get the average count of event in the whole population of items.

e.g a table that would look like

Event ID AverageAmountofTx

100     1

200    1.5

300    1

What would be the most efficient way to carry out that testing? Is there a direct way to do this using proc means?

Probably an easy way to do that but it's friday ,-) My table is huge so i would like to avoid creating an intermediary table with:

ItemIT EVENT ID Count

ABC 100 1

ABC 200 2

ABC 300 1

etc

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

How about:

data have;
input Item_ID $ Event_ID;
datalines;
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
CBA 300
;
run;
proc sql ;
create table want as
 select event_id,count(event_id)/count(distinct item_id) as avg
  from have
   group by event_id;
quit;

Ksharp

View solution in original post

9 REPLIES 9
akberali67
Calcite | Level 5

PROC SUMMARY DATA = HAVE NWAY MISSING;

CLASS EVENTID ITEMID;

OUTPUT OUT = HAVE1;

RUN;

_FREQ_ WILL BE YOUR COUNT AND YOU SHOULD ALSO HAVE OTHER STATISTICS DISPLAYED

Hima
Obsidian | Level 7

data have;
input Item_ID $ Event_ID;
datalines;
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
CBA 300
;
run;

proc sort data = have;
by Item_ID;
run;


proc freq data = have;
tables Item_ID*Event_ID / list missing norow nocol nocum nopercent;
run;

Output:

         

                               Item_ID    Event_ID    Frequency
                                ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
                                ABC             100           1
                                ABC             200           2
                                ABC             300           1
                                CBA             100           1
                                CBA             200           1
                                CBA             300           1


Pontch
Fluorite | Level 6

Thanks guys for the quick answers,  my question was more to know if you can avoid creating that table:

Item_ID    Event_ID    Frequency

                                ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

                                ABC             100           1

                                ABC             200           2

                                ABC             300           1

                                CBA             100           1

                                CBA             200           1

                                CBA             300           1

I would like to create the following table directly without using an intermediary table (my source table is huge)

Event_ID AverageCount

100                1

200              1.5

300                1

akberali67
Calcite | Level 5

didnt you get those required statiistics using my code for Proc summary??

Hima
Obsidian | Level 7

data have;

input Item_ID $ Event_ID;

datalines;

ABC 100

ABC 200

ABC 200

ABC 300

CBA 100

CBA 200

CBA 300

;

run;

proc sql;

select Event_id,avg(cnt) as aver from

(select count(item_id) as cnt, event_id,item_id from have group by event_id, Item_id)

group by event_id;

quit;

Tested n working.

Ksharp
Super User

How about:

data have;
input Item_ID $ Event_ID;
datalines;
ABC 100
ABC 200
ABC 200
ABC 300
CBA 100
CBA 200
CBA 300
;
run;
proc sql ;
create table want as
 select event_id,count(event_id)/count(distinct item_id) as avg
  from have
   group by event_id;
quit;

Ksharp

Pontch
Fluorite | Level 6

Thanks guys, will give it a try on Monday, the table will have over 1 billion records pretty soon, so will see how each query perform.

I was also thinking maybe i should just do the average on a given period of time and annualize it so that creating an intermediary table would be ok...

Will still give a try to all queries above will keep you guys updated.

PGStats
Opal | Level 21

It depends if you want the average number of each event type in items in which they appear (which Ksharps' query above will calculate) or the average over all items. Compare :

data have;

input Item_ID $ Event_ID;

datalines;

ABC 100

ABC 200

ABC 200

ABC 300

CBA 100

CBA 200

;

proc sql ;

create table want1 as

select event_id,count(event_id)/count(distinct item_id) as avg

  from have

   group by event_id;

create table want2 as

select event_id,count(event_id)/(select count(distinct item_id) from have) as avg

  from have

   group by event_id;

select want1.*, want2.avg as avg2

from want1, want2 where want1.event_ID=want2.event_ID;

quit;

 

PG

PG
Pontch
Fluorite | Level 6

Awesome guys, great to have so many different answers!

Thanks for the additional information PG,  i am using Ksharp query but yours just gave me more depth for my analysis

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1255 views
  • 7 likes
  • 5 in conversation