Solved: Re: sum of sales within 5 minutes!

vxhong17 · Posted 06-03-2013 05:17 PM

Hi All,

Two products A and B are sold in the US and Canada. I want to sum of sales with the same price within 5 minutes for each market whenever there are sales with the same price in each market. I assign the time of the first sale as the time for output. For example, product A. the price is 5 (6) from 10:05 to 10:09 in US (Canada) market. Thus, I will sum of all sales in this period for each market. Sale for the US is 7 + 7 + 6 + 4 +7 (31) and Sale for Canada is 8 + 5 (13). The first sale in the US (Canada) is a10:05 (10:06). Thus, I have the first two rows in the output. I do similar whenever there are sales within 5 minutes in each market.

Could you show me how to deal with this matter?

Thank you so much,

Input:

time	Product	sale	Price	Market
10:05	A	7	5	US
10:06	A	8	6	Canada
10:06	A	7	5	US
10:07	A	6	5	US
10:08	A	5	6	Canada
10:08	A	4	5	US
10:09	A	7	5	US
10:12	A	12	5	US
10:12	A	8	5	Canada
10:13	A	10	5	Canada
10:13	A	9	5	US
10:13	A	10	6	Canada
10:14	A	11	6	US
10:15	A	12	6	US
10:16	A	13	6	US
10:01	B	20	10	US
10:02	B	15	10	Canada
10:02	B	22	10	US
10:03	B	24	10	US
10:03	B	27	10	Canada
10:04	B	28	10	US
10:05	B	22	10	US
10:07	B	21	10	US
10:08	B	19	10	Canada
10:08	B	20	10	US
10:09	B	23	11	US
10:10	B	24	11	Canada
10:12	B	5	11	Canada

Output

time	Product	sale	price	market
10:05	A	31	5	US	=7+7+6=4+7
10:06	A	13	6	Canada	=8+5
10:12	A	21	5	US	=12+9
10:12	A	18	5	Canada	=8+10
10:13	A	10	6	Canada
10:14	A	36	6	US	=11+12+13
10:01	B	116	10	US
10:02	B	42	10	Canada
10:07	B	41	10	US
10:08	B	19	10	Canada
10:09	B	23	11	US
10:10	B	29	11	Canada

Haikuo · Posted 06-04-2013 10:15 PM

A bit late for the party, was actually working hard.The moment I saw the title, the idea of using Hash() starts to tickle me none stop. Here is my approach, hopefully more fitting into the original request.

data have;

input time time5. Product $ sale Price Market$;

format time time5.;

cards;

10:05 A 7 5 US

10:06 A 8 6 Canada

10:06 A 7 5 US

10:07 A 6 5 US

10:08 A 5 6 Canada

10:08 A 4 5 US

10:09 A 7 5 US

10:12 A 12 5 US

10:12 A 8 5 Canada

10:13 A 10 5 Canada

10:13 A 9 5 US

10:13 A 10 6 Canada

10:14 A 11 6 US

10:15 A 12 6 US

10:16 A 13 6 US

10:01 B 20 10 US

10:02 B 15 10 Canada

10:02 B 22 10 US

10:03 B 24 10 US

10:03 B 27 10 Canada

10:04 B 28 10 US

10:05 B 22 10 US

10:07 B 21 10 US

10:08 B 19 10 Canada

10:08 B 20 10 US

10:09 B 23 11 US

10:10 B 24 11 Canada

10:12 B 5 11 Canada

;

data want;

if _n_=1 then do;

if 0 then set have(rename=(time=_t sale=_s));

declare hash h(dataset:'have(rename=(time=_t sale=_s))', multidata:'y', ordered:'y');

h.definekey('product', 'price', 'market');
h.definedata(all:'y');
h.definedone();

call missing(_t,_s);

declare hash h1(dataset:'have', multidata:'y', ordered:'y');

h1.definekey('product', 'price', 'market','time','sale');

h1.definedata(all:'y');

h1.definedone();

end;

set have;

if h1.find() = 0 ;

sale_sum=0;

do rc=h.find() by 0 while (rc=0);

if 0 = <_t - time <=300 then do; sale_sum + _s; rc=h1.remove(key:product, key:price, key:market,key:_t,key:_s);end;

rc=h.find_next();

end;

drop rc _:;

run;

Haikuo

View solution in original post

PGStats · Posted 06-03-2013 09:14 PM

This is not exactly what you asked for. I think you would get a fairer comparison if you used the same time periods for both markets. Here is how to get total sales and mean price for each 5 minute period when sales occured on both markets :

data have;
input time :time5. Product $ sale Price Market$;
format time time5.;
datalines;
10:05 A 7 5 US
10:06 A 8 6 Canada
10:06 A 7 5 US
10:07 A 6 5 US
10:08 A 5 6 Canada
10:08 A 4 5 US
10:09 A 7 5 US
10:12 A 12 5 US
10:12 A 8 5 Canada
10:13 A 10 5 Canada
10:13 A 9 5 US
10:13 A 10 6 Canada
10:14 A 11 6 US
10:15 A 12 6 US
10:16 A 13 6 US
10:01 B 20 10 US
10:02 B 15 10 Canada
10:02 B 22 10 US
10:03 B 24 10 US
10:03 B 27 10 Canada
10:04 B 28 10 US
10:05 B 22 10 US
10:07 B 21 10 US
10:08 B 19 10 Canada
10:08 B 20 10 US
10:09 B 23 11 US
10:10 B 24 11 Canada
10:12 B 5 11 Canada
;

proc sql;
create table CanOrUS5min as
select
     product,
     intnx("MINUTE5",time,0) as time5 format=time5.,
     market,
     sum(sale) as totalSale,
     mean(price) as meanPrice
from have
group by product, calculated time5, market;

create table CanAndUS5min as
select
     product,
     time5,
     market,
     totalSale,
     meanPrice
from CanOrUS5min
group by product, time5
having count(market)=2
order by product, time5, market;

drop table CanOrUS5min;

select * from CanAndUS5min;
quit;

hth

PG

CharlotteCain · Posted 06-04-2013 02:29 AM

Hi PG sir,

Subject: Out of context to this discussion /*forgive me*/

Your solutions at the supersonic speed at which you seem to write codes really amazes me. It also inspires me to become as great as you. I have noticed on these forums you have been applauded many many times along the likes of Art, haikuo, tomkari et al so yet another praise from is nothing perhaps new to you.

However for me, I am not from a conventional academic background nor super smart like you. By random chance, I got exposed to SAS and i am slowly learning and progressing. Would you suggest me how to pick up speed, particularly like familiarizing with those tons of functions and call routines like when and where to use/apply them.

Right now, I am extensively researching and learning on sugi and support.sas. and few other books. However it doesn't seem adequate. Can you guide me a few tips in speedy learning. I may not be smart nor academic, but i dont wanna give up.

Your response would be most appreciated. Thanks

Charlotte from England

PGStats · Posted 06-04-2013 11:01 AM

Thank you Charlotte for the good words.

The good news is that you already got as far as you did. I met many who tried to learn SAS by themselves and just don't get it; they never progress. You are among the special few with the motivation and talent to succeed.

The bad news is that learning SAS by yourself is a long process. I have been at it, on and off, since 1989! So you understand speedy learning is not my specialty. I never took a course; I progressed, albeit slowly, by solving real problems (mine or other's) in the course of my work. I learned a lot lately from the discussion forum, it made me try many new things like proc fcmp, hash objects and DOW loops. For me, it is a form of continuing education.

I hereby call on all SAS teachers (, , , and others?) for help on self learning material.

Best of luck.

PG de Montréal

PG

CharlotteCain · Posted 06-04-2013 11:34 AM

Merci beaucoup PG Sir,

Noticed Montreal Je fais l'amour à Montréal et au Canada. Although English girl through and through, love the french too. I don't mind exchanging Traditional English Fish & Chips for a nice French wine. I guess the same duplicates applies in Quebec and Anglophone Canada.

Passez une bonne journée,

Charlotte de Liverpool, England

Haikuo · Posted 06-04-2013 10:15 PM

A bit late for the party, was actually working hard.The moment I saw the title, the idea of using Hash() starts to tickle me none stop. Here is my approach, hopefully more fitting into the original request.

data have;

input time time5. Product $ sale Price Market$;

format time time5.;

cards;

10:05 A 7 5 US

10:06 A 8 6 Canada

10:06 A 7 5 US

10:07 A 6 5 US

10:08 A 5 6 Canada

10:08 A 4 5 US

10:09 A 7 5 US

10:12 A 12 5 US

10:12 A 8 5 Canada

10:13 A 10 5 Canada

10:13 A 9 5 US

10:13 A 10 6 Canada

10:14 A 11 6 US

10:15 A 12 6 US

10:16 A 13 6 US

10:01 B 20 10 US

10:02 B 15 10 Canada

10:02 B 22 10 US

10:03 B 24 10 US

10:03 B 27 10 Canada

10:04 B 28 10 US

10:05 B 22 10 US

10:07 B 21 10 US

10:08 B 19 10 Canada

10:08 B 20 10 US

10:09 B 23 11 US

10:10 B 24 11 Canada

10:12 B 5 11 Canada

;

data want;

if _n_=1 then do;

if 0 then set have(rename=(time=_t sale=_s));

declare hash h(dataset:'have(rename=(time=_t sale=_s))', multidata:'y', ordered:'y');

h.definekey('product', 'price', 'market');
h.definedata(all:'y');
h.definedone();

call missing(_t,_s);

declare hash h1(dataset:'have', multidata:'y', ordered:'y');

h1.definekey('product', 'price', 'market','time','sale');

h1.definedata(all:'y');

h1.definedone();

end;

set have;

if h1.find() = 0 ;

sale_sum=0;

do rc=h.find() by 0 while (rc=0);

if 0 = <_t - time <=300 then do; sale_sum + _s; rc=h1.remove(key:product, key:price, key:market,key:_t,key:_s);end;

rc=h.find_next();

end;

drop rc _:;

run;

Haikuo

vxhong17 · Posted 06-04-2013 10:48 PM

Hi PGStats,

Thank you so much for your suggestion,

Hi Haikuo,

I was still waiting for the help. Thank you so much for your precious help,

vxhong17 · Posted 06-05-2013 03:45 PM

Hi Haikuo,

Thank you so much for your sample code. It worked perfectly for a small sample. However, when I worked with a very large sample, SAS reported insufficient memory.

Could you have any suggestion?

Thank you so much,

Haikuo · Posted 06-05-2013 05:10 PM

Ouch! I hear people say, Hash does not have limit, but RAM is the limit. So besides reaching for your pocket book and buying more RAMs, how does your real data look like, does it have only 5 variables or more? How many obs you have? How much RAM you have? Do you have a unique index variable to identify obs?

Haikuo

vxhong17 · Posted 06-10-2013 03:07 AM

Hi Haikuo,

My RAM is 8GB. I have 7 variables. I work on many files. Each is from 2GB to 4GB. I do not have a unique ID and have to use several ones.

Thank you so much,

AhmedAl_Attar · Posted 06-10-2013 08:00 AM

Hi vxhong17,

I would recommend doing the following:

1. Find out how much memory is allocate to your SAS session ----> proc options group=memory; run;

2. Specify higher internal hash table size, by specifying higher hashexp: value. Check the online help below

hashexp: n

The hash object's internal table size, where the size of the hash table is 2ⁿ.

The value of HASHEXP is used as a power-of-two exponent to create the hash table size. For example, a value of 4 for HASHEXP equates to a hash table size of 2⁴, or 16. The maximum value for HASHEXP is 20.

The hash table size is not equal to the number of items that can be stored. Imagine the hash table as an array of 'buckets.' A hash table size of 16 would have 16 'buckets.' Each bucket can hold an infinite number of items. The efficiency of the hash table lies in the ability of the hashing function to map items to and retrieve items from the buckets.

You should specify the hash table size relative to the amount of data in the hash object in order to maximize the efficiency of the hash object lookup routines. Try different HASHEXP values until you get the best result. For example, if the hash object contains one million items, a hash table size of 16 (HASHEXP = 4) would work, but not very efficiently. A hash table size of 512 or 1024 (HASHEXP = 9 or 10) would result in the best performance.

Default:

8, which equates to a hash table size of 2⁸ or 256

Haikuo · Posted 06-10-2013 03:57 PM

OP's issue is the tables are too large to fit into the RAM, and Hash objects will chew up whatever available in the RAM, Hashexp setting has NO effect on that. After loading data into RAM successfully done, then Hashexp will determine how the mapping works, therefore affects the efficiency.

Haikuo

vxhong17 · Posted 06-11-2013 05:04 PM

Hi AhmedAl-Attar,

Thank you for your suggestion.

Hi Haikuo,

I just do one by one.

It works better now. Thank you so much.

Haikuo · Posted 06-10-2013 03:32 PM

One question and one suggestion:

Question: What you do you mean by "work on many files"? Are you doing do it in parallel (multiple SAS session at the same time) or you just do it one file at a time? I hope it is the latter, otherwise, you need to opt to the latter.

Suggestion:

You probably have to do it more than one pass, in the first pass, an unique identifier is created:

data have1;

set have;

_index=_n_;

run;

After you have done that, here is the modified code, it will save roughly half of the RAM needed than previous version, try it out and let us know how it works.

data have;

input time time5. Product $ sale Price Market$;

format time time5.;

cards;

10:05 A 7 5 US

10:06 A 8 6 Canada

10:06 A 7 5 US

10:07 A 6 5 US

10:08 A 5 6 Canada

10:08 A 4 5 US

10:09 A 7 5 US

10:12 A 12 5 US

10:12 A 8 5 Canada

10:13 A 10 5 Canada

10:13 A 9 5 US

10:13 A 10 6 Canada

10:14 A 11 6 US

10:15 A 12 6 US

10:16 A 13 6 US

10:01 B 20 10 US

10:02 B 15 10 Canada

10:02 B 22 10 US

10:03 B 24 10 US

10:03 B 27 10 Canada

10:04 B 28 10 US

10:05 B 22 10 US

10:07 B 21 10 US

10:08 B 19 10 Canada

10:08 B 20 10 US

10:09 B 23 11 US

10:10 B 24 11 Canada

10:12 B 5 11 Canada

;

data have1;

set have;

_index=_n_;

run;

data want;

if _n_=1 then do;

declare hash h(dataset:'have1(keep=_index time Product sale Price Market rename=(time=_t sale=_s))', multidata:'y', ordered:'y');

h.definekey('product', 'price', 'market');
h.definedata(all:'y');
h.definedone();

call missing(_t,_s);

declare hash h1(dataset:'have1(keep=_index)');

h1.definekey('_index');

h1.definedata(data:'_index');

h1.definedone();

end;

set have1;

if h1.find() = 0 ;

sale_sum=0;

do rc=h.find() by 0 while (rc=0);

if 0 = <_t - time <=300 then do; sale_sum + _s; rc=h1.remove();end;

rc=h.find_next();

end;

drop rc _:;

run;

Good Luck,

Haikuo

Registration is open

SAS Training: Just a Click Away