BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mrzlatan91
Obsidian | Level 7

Hey guys, I currently have problems with my macro in terms of performance issues.

I have the following 2 Tables: 

 

data a;
infile datalines dlm='/';
input (Date cusip fundid value) ($);
datalines;

Mar2012/ 123 / A / 5$
Mar2012/ 123/ B / 6$
Apr2012 / 124 / B / 6$
May2012/ 125/ C/ 2$
Jun2012/ 123/ D/ 4$

;

 

data b;
infile datalines dlm='/';
input (Date cusip value mod) ($);
datalines;

Mar2012/ 123 / 4$ / 6$
May2012/ 125/ 3$ / 7$;

 

 

I just simply want to make a inner join of the both datasets. 

Additionally, we have the following characteristics: 

Table a is a lookup-table which means that every observation of table b exists at least once in table a. But table a 

can have observations which table b does not have.

Furthermore, there is a one-to-many-relationship: Table a can have many representations with the same cusip/date combination and table b only have unique observations.

 

So, my wanted dataset looks like this (classValue is the column mod of table b, but renamed). I just want to output all date/cusip observations of table a that match to table b and additionally add the value mod of table b (and rename it as classValue).

 

data want;
infile datalines dlm='/';
input (date cusip fundid value classValue) ($);
datalines;
Mar2012/ 123 / A / 5$ / 6$
Mar2012/ 123/ B / 6$ / 6$
May2012/ 125/ C/ 2$ / 7$

;

 

 

 

 

My current solution looks like this (both datasets are already sorted by date and cusip)

 

data want;
merge a (in=a) b(in=b keep=date cusip_hld mod rename=(mod=classValue));
by date cusip;
if a and b;
run;

 

Unfortunately, the performance is quite bad. 

I guess, since table a is a lookup-table, a hash solution could probably improve the performance a lot.

Could someone kindly provide me a hash solution for my problem?

 

Thanks in advance 🙂

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data a;
infile datalines dlm='/';
input (Date cusip fundid value) ($);
datalines;
Mar2012/ 123 / A / 5$
Mar2012/ 123/ B / 6$
Apr2012 / 124 / B / 6$
May2012/ 125/ C/ 2$
Jun2012/ 123/ D/ 4$
;

 

data b;
infile datalines dlm='/';
input (Date cusip value mod) ($);
datalines;
Mar2012/ 123 / 4$ / 6$
May2012/ 125/ 3$ / 7$
;

data want;
if _n_=1 then do;
if 0 then set b(drop=value);
 dcl hash H (dataset:'b(drop=value)',multidata:'y') ;
   h.definekey  ("Date","cusip") ;
   h.definedata (all:'y') ;
   h.definedone () ;
end;
set a;
if h.find()=0;
run;

View solution in original post

12 REPLIES 12
novinosrin
Tourmaline | Level 20

Data sets are not clear to enable easy copy paste. Can you paste verified your data values as plain text

novinosrin
Tourmaline | Level 20
data a;
infile datalines dlm='/';
input (Date cusip fundid value) ($);
datalines;
Mar2012/ 123 / A / 5$
Mar2012/ 123/ B / 6$
Apr2012 / 124 / B / 6$
May2012/ 125/ C/ 2$
Jun2012/ 123/ D/ 4$
;

 

data b;
infile datalines dlm='/';
input (Date cusip value mod) ($);
datalines;
Mar2012/ 123 / 4$ / 6$
May2012/ 125/ 3$ / 7$
;

data want;
if _n_=1 then do;
if 0 then set b(drop=value);
 dcl hash H (dataset:'b(drop=value)',multidata:'y') ;
   h.definekey  ("Date","cusip") ;
   h.definedata (all:'y') ;
   h.definedone () ;
end;
set a;
if h.find()=0;
run;
mrzlatan91
Obsidian | Level 7

Sry mate for the bad representation of my data. Thank you sooooooooooooo much for your quick help. 

I´m so excited about how much the performance can be improved with that. 

You always help me, how can I support you for that? 

 

 

 

For completeness:

 

 

data a;
input (Date cusip fundid value) ($);
datalines;
Mar2012 123 A 5$
Mar2012 123 B 6$
Apr2012 124 B 6$
May2012 125 C 2$
Jun2012 123 D 4$
;

 

data b;
input (Date cusip value mod) ($);
datalines;
Mar2012 123 4$ 6$
May2012 125 3$ 7$
;

data want;
input (Date cusip fundid value classValue) ($);
datalines;
Mar2012 123 A 5$ 6$
Mar2012 123 B 6$ 6$
May2012 125 C 2$ 7$
;

 

novinosrin
Tourmaline | Level 20

We are here to share knowledge, so no need to thank or be formal. I always enjoyed your enthuse. The same code works for your recent data above

 

data want;
if _n_=1 then do;
if 0 then set b(drop=value);
dcl hash H (dataset:'b(drop=value)',multidata:'y') ;
h.definekey ("Date","cusip") ;
h.definedata (all:'y') ;
h.definedone () ;
end;
set a;
if h.find()=0;
run;

 

Also @mrzlatan91 Someday we shall have  paulaner dunkel dark beer at a nice pub. 🙂  Have fun. 

mrzlatan91
Obsidian | Level 7

@novinosrinYeah, this would be nice.

Tell me when you´re in Germany 🙂

SuryaKiran
Meteorite | Level 14

You may also need to consider memory size when using Hash tables. If your table is very large then you might encounter insufficient memory for hash table processing, you might need to subset the dataset in this case.

Thanks,
Suryakiran
mrzlatan91
Obsidian | Level 7

@SuryaKiran thanks for your advice. Is there a way to find out how much memory is used and available?

SuryaKiran
Meteorite | Level 14

options fullstimer; --> This will put some stats in your log. and to find your system memory available use proc options option=memory;run; or  in a macro %let memsize=%sysfunc(getoption(MEMSIZE));

Thanks,
Suryakiran
mrzlatan91
Obsidian | Level 7

Thank you,

I have tested it now. Unfortunately, the Hash solution is slower than the merge solution.

I´m testing it on a working station and dont have any memory problems (max. used memory is about 20000.00k, OS memory is about 400000.00k). I dont really understand why the hash solution is slower.

 

Is there another possibility to improve the performance (e.g. using more memory) ?

I have already loaded one of the datasets in the memory (via sasfile).

novinosrin
Tourmaline | Level 20

Can indexed merge help?

 

data a;
infile datalines dlm='/';
input (Date cusip fundid value) ($);
datalines;
Mar2012/ 123 / A / 5$
Mar2012/ 123/ B / 6$
Apr2012 / 124 / B / 6$
May2012/ 125/ C/ 2$
Jun2012/ 123/ D/ 4$
;

 

data b;
infile datalines dlm='/';
input (Date cusip value mod) ($);
datalines;
Mar2012/ 123 / 4$ / 6$
May2012/ 125/ 3$ / 7$
;


proc datasets library=work nolist;
 modify b;
 index create Date cusip / unique;
 modify a;
 index create keyname=(Date cusip);
quit;

data want;
 merge  a (in=a) b(in=b keep=date cusip mod rename=(mod=classValue));
 by Date cusip;
 if a and b;
 run;


mrzlatan91
Obsidian | Level 7

@novinosrinsry for the late answer.

I get an error message that double values for date / cusip are not valid in table b. But Its not possible that I have double values in it.

I have also tested 2 different SQL statements with select date, cusip vS. select distinct date, cusip and get the same amount of observations.

 

 

Andygray
Quartz | Level 8

@mrzlatan91 if the thread has been answered, you can mark the question as answered 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1569 views
  • 3 likes
  • 4 in conversation