BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vxhong17
Calcite | Level 5

Hi All,

Two products A and B are sold in the US and Canada. I want to sum of sales with the same price within 5 minutes for each market whenever there are sales with the same price in each market. I assign the time of the first sale as the time for output. For example, product A. the price is 5 (6) from 10:05 to 10:09 in US (Canada) market. Thus, I will sum of all sales in this period for each market. Sale for the US  is 7 + 7 + 6 + 4 +7 (31) and Sale for Canada is 8 + 5 (13). The first sale in the US (Canada) is a10:05 (10:06). Thus, I have the first two rows in the output. I do similar whenever there are sales within 5 minutes in each market.

Could you show me how to deal with this matter?

Thank you so much,

Input:

timeProduct salePriceMarket
10:05A75US
10:06A86Canada
10:06A75US
10:07A65US
10:08A56Canada
10:08A45US
10:09A75US
10:12A125US
10:12A85Canada
10:13A105Canada
10:13A95US
10:13A106Canada
10:14A116US
10:15A126US
10:16A136US
10:01B2010US
10:02B1510Canada
10:02B2210US
10:03B2410US
10:03B2710Canada
10:04B2810US
10:05B2210US
10:07B2110US
10:08B1910Canada
10:08B2010US
10:09B2311US
10:10B2411Canada
10:12B511Canada

Output

time Productsaleprice market
10:05A315US=7+7+6=4+7
10:06A136Canada=8+5
10:12A215US=12+9
10:12A185Canada=8+10
10:13A106Canada
10:14A366US=11+12+13
10:01B11610US
10:02B4210Canada
10:07B4110US
10:08B1910Canada
10:09B2311US
10:10B2911Canada
1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

A bit late for the party, was actually working hard.The moment I saw the title, the idea of using Hash() starts to tickle me none stop. Here is my approach, hopefully more fitting into the original request.

data have;

input time time5. Product $ sale Price Market$;

format time time5.;

cards;

10:05 A 7 5 US

10:06 A 8 6 Canada

10:06 A 7 5 US

10:07 A 6 5 US

10:08 A 5 6 Canada

10:08 A 4 5 US

10:09 A 7 5 US

10:12 A 12 5 US

10:12 A 8 5 Canada

10:13 A 10 5 Canada

10:13 A 9 5 US

10:13 A 10 6 Canada

10:14 A 11 6 US

10:15 A 12 6 US

10:16 A 13 6 US

10:01 B 20 10 US

10:02 B 15 10 Canada

10:02 B 22 10 US

10:03 B 24 10 US

10:03 B 27 10 Canada

10:04 B 28 10 US

10:05 B 22 10 US

10:07 B 21 10 US

10:08 B 19 10 Canada

10:08 B 20 10 US

10:09 B 23 11 US

10:10 B 24 11 Canada

10:12 B 5 11 Canada

;

data want;

  if _n_=1 then do;

    if 0 then set have(rename=(time=_t sale=_s));

    declare hash h(dataset:'have(rename=(time=_t sale=_s))', multidata:'y', ordered:'y');

  1. h.definekey('product', 'price', 'market');
  2. h.definedata(all:'y');
  3. h.definedone();

call missing(_t,_s);

declare hash h1(dataset:'have', multidata:'y', ordered:'y');

h1.definekey('product', 'price', 'market','time','sale');

h1.definedata(all:'y');

h1.definedone();

end;

set have;

  if h1.find() = 0 ;

    sale_sum=0;

     do rc=h.find() by 0 while (rc=0);

          if 0 = <_t - time <=300 then do; sale_sum + _s; rc=h1.remove(key:product, key:price, key:market,key:_t,key:_s);end;

          rc=h.find_next();

     end;

drop rc _:;    

run;


Haikuo

View solution in original post

13 REPLIES 13
PGStats
Opal | Level 21

This is not exactly what you asked for. I think you would get a fairer comparison if you used the same time periods for both markets. Here is how to get total sales and mean price for each 5 minute period when sales occured on both markets :

data have;
input time :time5. Product $ sale Price Market$;
format time time5.;
datalines;
10:05 A 7 5 US
10:06 A 8 6 Canada
10:06 A 7 5 US
10:07 A 6 5 US
10:08 A 5 6 Canada
10:08 A 4 5 US
10:09 A 7 5 US
10:12 A 12 5 US
10:12 A 8 5 Canada
10:13 A 10 5 Canada
10:13 A 9 5 US
10:13 A 10 6 Canada
10:14 A 11 6 US
10:15 A 12 6 US
10:16 A 13 6 US
10:01 B 20 10 US
10:02 B 15 10 Canada
10:02 B 22 10 US
10:03 B 24 10 US
10:03 B 27 10 Canada
10:04 B 28 10 US
10:05 B 22 10 US
10:07 B 21 10 US
10:08 B 19 10 Canada
10:08 B 20 10 US
10:09 B 23 11 US
10:10 B 24 11 Canada
10:12 B 5 11 Canada
;

proc sql;
create table CanOrUS5min as
select
     product,
     intnx("MINUTE5",time,0) as time5 format=time5.,
     market,
     sum(sale) as totalSale,
     mean(price) as meanPrice
from have
group by product, calculated time5, market;


create table CanAndUS5min as
select
     product,
     time5,
     market,
     totalSale,
     meanPrice
from CanOrUS5min
group by product, time5
having count(market)=2
order by product, time5, market;


drop table CanOrUS5min;

select * from CanAndUS5min;
quit;

hth

PG

PG
CharlotteCain
Quartz | Level 8

Hi PG sir,

Subject: Out of context to this discussion /*forgive me*/

Your solutions at the supersonic speed at which you seem to write codes really amazes me. It also inspires me to become as great as you. I have noticed on these forums you have been applauded many many times along the likes of Art, haikuo, tomkari et al so yet another praise from is nothing perhaps new to you.

However for me, I am not from a conventional academic background nor super smart like you. By random chance, I got exposed to SAS and i am slowly learning and progressing. Would you suggest me how to pick up speed, particularly like familiarizing with those tons of functions and call routines like when and where to use/apply them.

Right now, I am extensively researching and learning on sugi and support.sas. and few other books. However it doesn't seem adequate. Can you guide me a few tips in speedy learning. I may not be smart nor academic, but i dont wanna give up.

Your response would be most appreciated. Thanks

Charlotte from England

PGStats
Opal | Level 21

Thank you Charlotte for the good words.

The good news is that you already got as far as you did. I met many who tried to learn SAS by themselves and just don't get it; they never progress. You are among the special few with the motivation and talent to succeed.

The bad news is that learning SAS by yourself is a long process. I have been at it, on and off, since 1989! So you understand speedy learning is not my specialty. I never took a course; I progressed, albeit slowly, by solving real problems (mine or other's) in the course of my work. I learned a lot lately from the discussion forum, it made me try many new things like proc fcmp, hash objects and DOW loops. For me, it is a form of continuing education.

I hereby call on all SAS teachers (, , , and others?) for help on self learning material.

Best of luck.

PG de Montréal

PG
CharlotteCain
Quartz | Level 8

Merci beaucoup PG Sir,

Noticed Montreal Smiley Happy Je fais l'amour à Montréal et au Canada. Although English girl through and through, love the french too. I don't mind exchanging Traditional English Fish & Chips for a nice French wine. I guess the same duplicates applies in Quebec and Anglophone Canada.

Passez une bonne journée,

Charlotte de Liverpool, England

Haikuo
Onyx | Level 15

A bit late for the party, was actually working hard.The moment I saw the title, the idea of using Hash() starts to tickle me none stop. Here is my approach, hopefully more fitting into the original request.

data have;

input time time5. Product $ sale Price Market$;

format time time5.;

cards;

10:05 A 7 5 US

10:06 A 8 6 Canada

10:06 A 7 5 US

10:07 A 6 5 US

10:08 A 5 6 Canada

10:08 A 4 5 US

10:09 A 7 5 US

10:12 A 12 5 US

10:12 A 8 5 Canada

10:13 A 10 5 Canada

10:13 A 9 5 US

10:13 A 10 6 Canada

10:14 A 11 6 US

10:15 A 12 6 US

10:16 A 13 6 US

10:01 B 20 10 US

10:02 B 15 10 Canada

10:02 B 22 10 US

10:03 B 24 10 US

10:03 B 27 10 Canada

10:04 B 28 10 US

10:05 B 22 10 US

10:07 B 21 10 US

10:08 B 19 10 Canada

10:08 B 20 10 US

10:09 B 23 11 US

10:10 B 24 11 Canada

10:12 B 5 11 Canada

;

data want;

  if _n_=1 then do;

    if 0 then set have(rename=(time=_t sale=_s));

    declare hash h(dataset:'have(rename=(time=_t sale=_s))', multidata:'y', ordered:'y');

  1. h.definekey('product', 'price', 'market');
  2. h.definedata(all:'y');
  3. h.definedone();

call missing(_t,_s);

declare hash h1(dataset:'have', multidata:'y', ordered:'y');

h1.definekey('product', 'price', 'market','time','sale');

h1.definedata(all:'y');

h1.definedone();

end;

set have;

  if h1.find() = 0 ;

    sale_sum=0;

     do rc=h.find() by 0 while (rc=0);

          if 0 = <_t - time <=300 then do; sale_sum + _s; rc=h1.remove(key:product, key:price, key:market,key:_t,key:_s);end;

          rc=h.find_next();

     end;

drop rc _:;    

run;


Haikuo

vxhong17
Calcite | Level 5

Hi PGStats,

Thank you so much for your suggestion,

Hi Haikuo,

I was still waiting for the help. Thank you so much for your precious help,

vxhong17
Calcite | Level 5

Hi Haikuo,

Thank you so much for your sample code. It worked perfectly for a small sample. However, when I worked with a very large sample, SAS reported insufficient memory.

Could you have any suggestion?

Thank you so much,

Haikuo
Onyx | Level 15

Ouch! I hear people say, Hash does not have limit, but RAM is the limit. So besides reaching for your pocket book and buying more RAMs, how does your real data look like, does it have only 5 variables or more? How many obs you have? How much RAM you have? Do you have a unique index variable to identify obs?

Haikuo

vxhong17
Calcite | Level 5

Hi Haikuo,

My RAM is 8GB. I have 7 variables. I work on many files. Each is from 2GB to 4GB. I do not have a unique ID and have to use several ones.

Thank you so much,

AhmedAl_Attar
Rhodochrosite | Level 12

Hi vxhong17,

I would recommend doing the following:

1. Find out how much memory is allocate to your SAS session ----> proc options group=memory; run;

2. Specify higher internal hash table size, by specifying higher hashexp: value. Check the online help below

hashexp: n

The hash object's internal table size, where the size of the hash table is 2n.

The value of HASHEXP is used as a power-of-two exponent to create the hash table size. For example, a value of 4 for HASHEXP equates to a hash table size of 24, or 16. The maximum value for HASHEXP is 20.

The hash table size is not equal to the number of items that can be stored. Imagine the hash table as an array of 'buckets.' A hash table size of 16 would have 16 'buckets.' Each bucket can hold an infinite number of items. The efficiency of the hash table lies in the ability of the hashing function to map items to and retrieve items from the buckets.

You should specify the hash table size relative to the amount of data in the hash object in order to maximize the efficiency of the hash object lookup routines. Try different HASHEXP values until you get the best result. For example, if the hash object contains one million items, a hash table size of 16 (HASHEXP = 4) would work, but not very efficiently. A hash table size of 512 or 1024 (HASHEXP = 9 or 10) would result in the best performance.

Default:8, which equates to a hash table size of 28 or 256
Haikuo
Onyx | Level 15

OP's issue is the tables are too large to fit into the RAM, and Hash objects will chew up whatever available in the RAM, Hashexp setting has NO effect on that. After loading data into RAM successfully done, then Hashexp will determine how the mapping works, therefore affects the efficiency.

Haikuo

vxhong17
Calcite | Level 5

Hi AhmedAl-Attar,

Thank you for your suggestion.

Hi Haikuo,

I just do one by one.

It works better now. Thank you so much.

Haikuo
Onyx | Level 15

One question and one suggestion:

Question: What you do you mean by "work on many files"? Are you doing do it in parallel (multiple SAS session at the same time) or you just do it one file at a time? I hope it is the latter, otherwise, you need to opt to the latter.

Suggestion:

You probably have to do it more than one pass, in the first pass, an unique identifier is created:

data have1;

  set have;

_index=_n_;

run;

After you have done that, here is the modified code, it will save roughly  half of the RAM needed than previous version, try it out and let us know how it works.

data have;

input time time5. Product $ sale Price Market$;

format time time5.;

cards;

10:05 A 7 5 US

10:06 A 8 6 Canada

10:06 A 7 5 US

10:07 A 6 5 US

10:08 A 5 6 Canada

10:08 A 4 5 US

10:09 A 7 5 US

10:12 A 12 5 US

10:12 A 8 5 Canada

10:13 A 10 5 Canada

10:13 A 9 5 US

10:13 A 10 6 Canada

10:14 A 11 6 US

10:15 A 12 6 US

10:16 A 13 6 US

10:01 B 20 10 US

10:02 B 15 10 Canada

10:02 B 22 10 US

10:03 B 24 10 US

10:03 B 27 10 Canada

10:04 B 28 10 US

10:05 B 22 10 US

10:07 B 21 10 US

10:08 B 19 10 Canada

10:08 B 20 10 US

10:09 B 23 11 US

10:10 B 24 11 Canada

10:12 B 5 11 Canada

;

data have1;

set have;

_index=_n_;

run;

data want;

  if _n_=1 then do;

    declare hash h(dataset:'have1(keep=_index time Product sale Price Market rename=(time=_t sale=_s))', multidata:'y', ordered:'y');

  1. h.definekey('product', 'price', 'market');
  2. h.definedata(all:'y');
  3. h.definedone();

call missing(_t,_s);

declare hash h1(dataset:'have1(keep=_index)');

h1.definekey('_index');

h1.definedata(data:'_index');

h1.definedone();

end;

set have1;

  if h1.find() = 0 ;

    sale_sum=0;

     do rc=h.find() by 0 while (rc=0);

          if 0 = <_t - time <=300 then do; sale_sum + _s; rc=h1.remove();end;

          rc=h.find_next();

     end;

drop rc _:;    

run;


Good Luck,

Haikuo

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1265 views
  • 5 likes
  • 5 in conversation