BookmarkSubscribeRSS Feed
Raholco
Calcite | Level 5

The SAS code I am using is the following:

 

PROC SUMMARY DATA=INTPS missing;

CLASS HH;

VAR HH;

OUTPUT OUT=TPS

MEAN=AVGTPS

MAX=PEAKTPS

/NOINHERIT;

 

 

9 REPLIES 9
Kurt_Bremser
Super User

Please supply an example for your data in usable form (data step with datalines, posted into a code box opened with the "little running man").

Since you only use one variable (which is suspicious), we need to know what is contained in HH and its attributes.

Raholco
Calcite | Level 5

HH is hours - the input dataset contains 1 entry per transaction processed - MM is minutes, SS is seconds. This input dataset contains over 100 million records. What I want to calculate is transaction arrival rate by hour.

Raholco
Calcite | Level 5
In addition to arrival rate by hour, the report must include average transactions per second and peak transaction per second.
Kurt_Bremser
Super User

Define "arrival rate".

Transactions per second can be calculated like this:

data intps;
input hh mm ss;
datalines;
3 59 59.807316959
3 59 59.813843966
3 59 59.823140979
4 0 0.123
4 0 0.345
;

data transactions;
set intps end = done;
retain hour min sec count;
sec = floor(ss);
if hour ne hh or min ne mm or sec ne floor(ss)
then do;
  if count > 0 then output;
  count = 0;
  hour = hh;
  min = mm;
  sec = floor(ss);
end;
count + 1;
if done then output;
keep hour min sec count;
run;

proc summary data=transactions;
by hour;
var count;
output out=want (drop=_type_ _freq_) mean()=avgtps max()=peaktps;
run;

Note that I use the rather complicated data step to avoid using SQL or an extra step that calculates the second without the fractions. You could replace it by

proc sql;
create table transactions as
  select
    hh,
    mm,
    floor(ss) as ss,
    count(*) as count
  from intps
  group by hh, mm, calculated ss
;
quit;

and then use hh as BY variable in the SUMMARY. Which method you use should be determined by test runs with your production data.

You may also contemplate some method to insert "0" observations for missing seconds, so that you always have 3600 observations per hour.

Raholco
Calcite | Level 5

I ran the code, and AVGTPS and PEAKTPS don't look right - They show the same values that I got when I ran my own version:

 

Hour               Trancount          Average TPS         Peak TPS

3                            188                    3                          3

4                    2191715                    4                          4

5                    2433846                    5                          5

....

21                  3741828                  21                        21

 

Doing the math would show that for hour 21, the average TPS would be 1039

(3,741,828 / 3600 (60 sec * 60 minutes)).

Raholco
Calcite | Level 5

Sample data:

3 59 59.807316959

3 59 59.813843966

3 59 59.823140979

...

23 59 59.904180944

 

My sample dataset 9865385 records - a full dataset will contain over 100 million records.

Raholco
Calcite | Level 5

3 59 59.807316959
3 59 59.813843966
3 59 59.823140979
4 0 0.123
4 0 0.345

The first column is the hour, the second column is the minute, the last column is the second with the subseconds following the period. The dataset has over 100 million observations.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 797 views
  • 0 likes
  • 2 in conversation