Hi All,
Two products A and B are sold in the US and Canada. I want to sum of sales with the same price within 5 minutes for each market whenever there are sales with the same price in each market. I assign the time of the first sale as the time for output. For example, product A. the price is 5 (6) from 10:05 to 10:09 in US (Canada) market. Thus, I will sum of all sales in this period for each market. Sale for the US is 7 + 7 + 6 + 4 +7 (31) and Sale for Canada is 8 + 5 (13). The first sale in the US (Canada) is a10:05 (10:06). Thus, I have the first two rows in the output. I do similar whenever there are sales within 5 minutes in each market.
Could you show me how to deal with this matter?
Thank you so much,
Input:
time | Product | sale | Price | Market |
10:05 | A | 7 | 5 | US |
10:06 | A | 8 | 6 | Canada |
10:06 | A | 7 | 5 | US |
10:07 | A | 6 | 5 | US |
10:08 | A | 5 | 6 | Canada |
10:08 | A | 4 | 5 | US |
10:09 | A | 7 | 5 | US |
10:12 | A | 12 | 5 | US |
10:12 | A | 8 | 5 | Canada |
10:13 | A | 10 | 5 | Canada |
10:13 | A | 9 | 5 | US |
10:13 | A | 10 | 6 | Canada |
10:14 | A | 11 | 6 | US |
10:15 | A | 12 | 6 | US |
10:16 | A | 13 | 6 | US |
10:01 | B | 20 | 10 | US |
10:02 | B | 15 | 10 | Canada |
10:02 | B | 22 | 10 | US |
10:03 | B | 24 | 10 | US |
10:03 | B | 27 | 10 | Canada |
10:04 | B | 28 | 10 | US |
10:05 | B | 22 | 10 | US |
10:07 | B | 21 | 10 | US |
10:08 | B | 19 | 10 | Canada |
10:08 | B | 20 | 10 | US |
10:09 | B | 23 | 11 | US |
10:10 | B | 24 | 11 | Canada |
10:12 | B | 5 | 11 | Canada |
Output
time | Product | sale | price | market | |
10:05 | A | 31 | 5 | US | =7+7+6=4+7 |
10:06 | A | 13 | 6 | Canada | =8+5 |
10:12 | A | 21 | 5 | US | =12+9 |
10:12 | A | 18 | 5 | Canada | =8+10 |
10:13 | A | 10 | 6 | Canada | |
10:14 | A | 36 | 6 | US | =11+12+13 |
10:01 | B | 116 | 10 | US | |
10:02 | B | 42 | 10 | Canada | |
10:07 | B | 41 | 10 | US | |
10:08 | B | 19 | 10 | Canada | |
10:09 | B | 23 | 11 | US | |
10:10 | B | 29 | 11 | Canada |
A bit late for the party, was actually working hard.The moment I saw the title, the idea of using Hash() starts to tickle me none stop. Here is my approach, hopefully more fitting into the original request.
data have;
input time time5. Product $ sale Price Market$;
format time time5.;
cards;
10:05 A 7 5 US
10:06 A 8 6 Canada
10:06 A 7 5 US
10:07 A 6 5 US
10:08 A 5 6 Canada
10:08 A 4 5 US
10:09 A 7 5 US
10:12 A 12 5 US
10:12 A 8 5 Canada
10:13 A 10 5 Canada
10:13 A 9 5 US
10:13 A 10 6 Canada
10:14 A 11 6 US
10:15 A 12 6 US
10:16 A 13 6 US
10:01 B 20 10 US
10:02 B 15 10 Canada
10:02 B 22 10 US
10:03 B 24 10 US
10:03 B 27 10 Canada
10:04 B 28 10 US
10:05 B 22 10 US
10:07 B 21 10 US
10:08 B 19 10 Canada
10:08 B 20 10 US
10:09 B 23 11 US
10:10 B 24 11 Canada
10:12 B 5 11 Canada
;
data want;
if _n_=1 then do;
if 0 then set have(rename=(time=_t sale=_s));
declare hash h(dataset:'have(rename=(time=_t sale=_s))', multidata:'y', ordered:'y');
call missing(_t,_s);
declare hash h1(dataset:'have', multidata:'y', ordered:'y');
h1.definekey('product', 'price', 'market','time','sale');
h1.definedata(all:'y');
h1.definedone();
end;
set have;
if h1.find() = 0 ;
sale_sum=0;
do rc=h.find() by 0 while (rc=0);
if 0 = <_t - time <=300 then do; sale_sum + _s; rc=h1.remove(key:product, key:price, key:market,key:_t,key:_s);end;
rc=h.find_next();
end;
drop rc _:;
run;
Haikuo
This is not exactly what you asked for. I think you would get a fairer comparison if you used the same time periods for both markets. Here is how to get total sales and mean price for each 5 minute period when sales occured on both markets :
data have;
input time :time5. Product $ sale Price Market$;
format time time5.;
datalines;
10:05 A 7 5 US
10:06 A 8 6 Canada
10:06 A 7 5 US
10:07 A 6 5 US
10:08 A 5 6 Canada
10:08 A 4 5 US
10:09 A 7 5 US
10:12 A 12 5 US
10:12 A 8 5 Canada
10:13 A 10 5 Canada
10:13 A 9 5 US
10:13 A 10 6 Canada
10:14 A 11 6 US
10:15 A 12 6 US
10:16 A 13 6 US
10:01 B 20 10 US
10:02 B 15 10 Canada
10:02 B 22 10 US
10:03 B 24 10 US
10:03 B 27 10 Canada
10:04 B 28 10 US
10:05 B 22 10 US
10:07 B 21 10 US
10:08 B 19 10 Canada
10:08 B 20 10 US
10:09 B 23 11 US
10:10 B 24 11 Canada
10:12 B 5 11 Canada
;
proc sql;
create table CanOrUS5min as
select
product,
intnx("MINUTE5",time,0) as time5 format=time5.,
market,
sum(sale) as totalSale,
mean(price) as meanPrice
from have
group by product, calculated time5, market;
create table CanAndUS5min as
select
product,
time5,
market,
totalSale,
meanPrice
from CanOrUS5min
group by product, time5
having count(market)=2
order by product, time5, market;
drop table CanOrUS5min;
select * from CanAndUS5min;
quit;
hth
PG
Hi PG sir,
Subject: Out of context to this discussion /*forgive me*/
Your solutions at the supersonic speed at which you seem to write codes really amazes me. It also inspires me to become as great as you. I have noticed on these forums you have been applauded many many times along the likes of Art, haikuo, tomkari et al so yet another praise from is nothing perhaps new to you.
However for me, I am not from a conventional academic background nor super smart like you. By random chance, I got exposed to SAS and i am slowly learning and progressing. Would you suggest me how to pick up speed, particularly like familiarizing with those tons of functions and call routines like when and where to use/apply them.
Right now, I am extensively researching and learning on sugi and support.sas. and few other books. However it doesn't seem adequate. Can you guide me a few tips in speedy learning. I may not be smart nor academic, but i dont wanna give up.
Your response would be most appreciated. Thanks
Charlotte from England
Thank you Charlotte for the good words.
The good news is that you already got as far as you did. I met many who tried to learn SAS by themselves and just don't get it; they never progress. You are among the special few with the motivation and talent to succeed.
The bad news is that learning SAS by yourself is a long process. I have been at it, on and off, since 1989! So you understand speedy learning is not my specialty. I never took a course; I progressed, albeit slowly, by solving real problems (mine or other's) in the course of my work. I learned a lot lately from the discussion forum, it made me try many new things like proc fcmp, hash objects and DOW loops. For me, it is a form of continuing education.
I hereby call on all SAS teachers (, , , and others?) for help on self learning material.
Best of luck.
PG de Montréal
Merci beaucoup PG Sir,
Noticed Montreal Je fais l'amour à Montréal et au Canada. Although English girl through and through, love the french too. I don't mind exchanging Traditional English Fish & Chips for a nice French wine. I guess the same duplicates applies in Quebec and Anglophone Canada.
Passez une bonne journée,
Charlotte de Liverpool, England
A bit late for the party, was actually working hard.The moment I saw the title, the idea of using Hash() starts to tickle me none stop. Here is my approach, hopefully more fitting into the original request.
data have;
input time time5. Product $ sale Price Market$;
format time time5.;
cards;
10:05 A 7 5 US
10:06 A 8 6 Canada
10:06 A 7 5 US
10:07 A 6 5 US
10:08 A 5 6 Canada
10:08 A 4 5 US
10:09 A 7 5 US
10:12 A 12 5 US
10:12 A 8 5 Canada
10:13 A 10 5 Canada
10:13 A 9 5 US
10:13 A 10 6 Canada
10:14 A 11 6 US
10:15 A 12 6 US
10:16 A 13 6 US
10:01 B 20 10 US
10:02 B 15 10 Canada
10:02 B 22 10 US
10:03 B 24 10 US
10:03 B 27 10 Canada
10:04 B 28 10 US
10:05 B 22 10 US
10:07 B 21 10 US
10:08 B 19 10 Canada
10:08 B 20 10 US
10:09 B 23 11 US
10:10 B 24 11 Canada
10:12 B 5 11 Canada
;
data want;
if _n_=1 then do;
if 0 then set have(rename=(time=_t sale=_s));
declare hash h(dataset:'have(rename=(time=_t sale=_s))', multidata:'y', ordered:'y');
call missing(_t,_s);
declare hash h1(dataset:'have', multidata:'y', ordered:'y');
h1.definekey('product', 'price', 'market','time','sale');
h1.definedata(all:'y');
h1.definedone();
end;
set have;
if h1.find() = 0 ;
sale_sum=0;
do rc=h.find() by 0 while (rc=0);
if 0 = <_t - time <=300 then do; sale_sum + _s; rc=h1.remove(key:product, key:price, key:market,key:_t,key:_s);end;
rc=h.find_next();
end;
drop rc _:;
run;
Haikuo
Hi PGStats,
Thank you so much for your suggestion,
Hi Haikuo,
I was still waiting for the help. Thank you so much for your precious help,
Hi Haikuo,
Thank you so much for your sample code. It worked perfectly for a small sample. However, when I worked with a very large sample, SAS reported insufficient memory.
Could you have any suggestion?
Thank you so much,
Ouch! I hear people say, Hash does not have limit, but RAM is the limit. So besides reaching for your pocket book and buying more RAMs, how does your real data look like, does it have only 5 variables or more? How many obs you have? How much RAM you have? Do you have a unique index variable to identify obs?
Haikuo
Hi Haikuo,
My RAM is 8GB. I have 7 variables. I work on many files. Each is from 2GB to 4GB. I do not have a unique ID and have to use several ones.
Thank you so much,
Hi vxhong17,
I would recommend doing the following:
1. Find out how much memory is allocate to your SAS session ----> proc options group=memory; run;
2. Specify higher internal hash table size, by specifying higher hashexp: value. Check the online help below
The hash object's internal table size, where the size of the hash table is 2n.
The value of HASHEXP is used as a power-of-two exponent to create the hash table size. For example, a value of 4 for HASHEXP equates to a hash table size of 24, or 16. The maximum value for HASHEXP is 20.
The hash table size is not equal to the number of items that can be stored. Imagine the hash table as an array of 'buckets.' A hash table size of 16 would have 16 'buckets.' Each bucket can hold an infinite number of items. The efficiency of the hash table lies in the ability of the hashing function to map items to and retrieve items from the buckets.
You should specify the hash table size relative to the amount of data in the hash object in order to maximize the efficiency of the hash object lookup routines. Try different HASHEXP values until you get the best result. For example, if the hash object contains one million items, a hash table size of 16 (HASHEXP = 4) would work, but not very efficiently. A hash table size of 512 or 1024 (HASHEXP = 9 or 10) would result in the best performance.
Default: | 8, which equates to a hash table size of 28 or 256 |
OP's issue is the tables are too large to fit into the RAM, and Hash objects will chew up whatever available in the RAM, Hashexp setting has NO effect on that. After loading data into RAM successfully done, then Hashexp will determine how the mapping works, therefore affects the efficiency.
Haikuo
Hi AhmedAl-Attar,
Thank you for your suggestion.
Hi Haikuo,
I just do one by one.
It works better now. Thank you so much.
One question and one suggestion:
Question: What you do you mean by "work on many files"? Are you doing do it in parallel (multiple SAS session at the same time) or you just do it one file at a time? I hope it is the latter, otherwise, you need to opt to the latter.
Suggestion:
You probably have to do it more than one pass, in the first pass, an unique identifier is created:
data have1;
set have;
_index=_n_;
run;
After you have done that, here is the modified code, it will save roughly half of the RAM needed than previous version, try it out and let us know how it works.
data have;
input time time5. Product $ sale Price Market$;
format time time5.;
cards;
10:05 A 7 5 US
10:06 A 8 6 Canada
10:06 A 7 5 US
10:07 A 6 5 US
10:08 A 5 6 Canada
10:08 A 4 5 US
10:09 A 7 5 US
10:12 A 12 5 US
10:12 A 8 5 Canada
10:13 A 10 5 Canada
10:13 A 9 5 US
10:13 A 10 6 Canada
10:14 A 11 6 US
10:15 A 12 6 US
10:16 A 13 6 US
10:01 B 20 10 US
10:02 B 15 10 Canada
10:02 B 22 10 US
10:03 B 24 10 US
10:03 B 27 10 Canada
10:04 B 28 10 US
10:05 B 22 10 US
10:07 B 21 10 US
10:08 B 19 10 Canada
10:08 B 20 10 US
10:09 B 23 11 US
10:10 B 24 11 Canada
10:12 B 5 11 Canada
;
data have1;
set have;
_index=_n_;
run;
data want;
if _n_=1 then do;
declare hash h(dataset:'have1(keep=_index time Product sale Price Market rename=(time=_t sale=_s))', multidata:'y', ordered:'y');
call missing(_t,_s);
declare hash h1(dataset:'have1(keep=_index)');
h1.definekey('_index');
h1.definedata(data:'_index');
h1.definedone();
end;
set have1;
if h1.find() = 0 ;
sale_sum=0;
do rc=h.find() by 0 while (rc=0);
if 0 = <_t - time <=300 then do; sale_sum + _s; rc=h1.remove();end;
rc=h.find_next();
end;
drop rc _:;
run;
Good Luck,
Haikuo
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.