I'd like to see a simpler way of creating and using hash tables...something consistent with the syntax and structure of either data statements or proc sql structure.
My suggestion...something lie this:
libname foobar memory; (kind of like memlib, but cross-platform)
and then it could be generated like this:
create foobar as select id,desc from sometable key=id;
data foobar; set sometable; keep id,desc; key=id; run;
from that point on, foobar could be used just like any other dataset:
proc sql; select store,storetable.id,foobar.desc from storetable left join foobar on storetable.id=foobar.id;
That seems much easier to use and teach than hash and it's yet-another-syntax.
While this idea is being evaluated, I wanted to reference a couple of other capabilities that readers might find interesting.
If you want to load and read a file in memory (for faster retrieval of values), check out the SASFILE statement. It has limited support for updating the file in memory. I've seen it used for SAS/SHARE-style processing where a single large data set that changes infrequently is used for many read requests.
And while I agree that the hash object syntax can be a bit confusing because of its departure from standard DATA-step-like syntax, it's similar to hash or dictionary-based collections that I've used in other programming frameworks (Java, .NET, etc.)
I think maybe a necessary step before this (or as part of this) would be for SAS to provide a way to create hash tables that persist across data steps. That would be very useful.
While the syntax of hash tables did feel "foreign" to me when I started learning it, it's no more foreign than other SAS languages were at some point (or still are): SQL, IML, DS2, macro, CASL, the list grows and grows. Learning the syntax provided a little peak into more object oriented languages, and I learned that I liked some of the features that (such as the ability to decide whether to "catch" a failing return code or let it generate an error in the log).
It's like teaching Dothraki to Klingons who know Elvish...the syntax is so different maybe it's just better to focus on the syntax they already know. 🙂
Thank you, @ChrisHemedinger, for the SASFILE info. I think that can be used instead of HASH for lookup/cross-reference/cross-walk tables. Thanks!
@ChrisHemedinger, I did test out SASFILE and found it to actually be slower (15.41 seconds vs. 14.36 seconds) in a SQL statement that used the table 4 times:
sasfile SDS.CMC_EXCD_EXPL_CD load;
FROM SDS.SV14_CLAIM_DETAILS_FINAL t1 LEFT JOIN SDS.CMC_EXCD_EXPL_CD t2 ON (t1.CLCL_EOB_EXCD_ID = t2.EXCD_ID) LEFT JOIN SDS.CMC_EXCD_EXPL_CD t5 ON (t1.CDML_EOB_EXCD = t5.EXCD_ID) LEFT JOIN SDS.FD09_DENIALREASONS t8 ON (t1.CLCL_ID = t8.CLCL_ID) AND (t1.CDML_SEQ_NO = t8.CDML_SEQ_NO) LEFT JOIN SDS.CMC_EXCD_EXPL_CD t6 ON (t8.EXCD_ID = t6.EXCD_ID) LEFT JOIN SDS.CMC_EXCD_EXPL_CD t7 ON (t1.CDML_DISALL_EXCD = t7.EXCD_ID) LEFT JOIN SDS.FD11_DIAG_AND_PROC t3 ON (t1.CLCL_ID = t3.CLCL_ID)
sasfile SDS.CMC_EXCD_EXPL_CD close;
Are there minimum thresholds for performance? That table has 2500 recs and the resulting dataset is 208,000 recs.
@tomrvincent - re: SASFILE, YMMV. Check out some papers on the topic (like this one) and see how other folks use it.
I think hash object is the undisputed champion for in-memory manipulation of SAS data in SAS 9.x. But for adjacent use cases, I see customers use a mix of hardware (SSD, etc), data prep (indexes), special engines (SPDS), and more.
But that's not really what your idea was about, right? It was more about making hash objects more approachable, syntax-wise.
I am not exactly sure which part you are objecting to about the hash references. Is it the "dot" notation for objects?
That syntax is actually similar to older SAS code, sometimes referred to as SCL from the SAS/AF and SAS/FSP products.
And SCL dates back to SAS6.12 ( or 6.08 at least that was my introduction). So hash is basically an extension though a limited on of an earlier object oriented programming using SAS.
I do sympathize with anyone attempting to teach programming to Excel users who have not ever used another programming language.
With SAS primarily being procedural (and I think PL1 may be more appropriate than COBOL as a reference) the exceptions like hash are a learning curve. Generally for look ups involving fewer than a thousand or so values I tend to either generate formats/informats if the values aren't volatile (site locations for instance).
@ChrisHemedingerthanks for the link...I like nerdy stuff like that. 🙂
Yes, my idea is to improve clarity and consistency rather than to obfuscate it...like using the word 'obfuscate'.
@tomrvincent, as a word nerd, I think 'obfuscate' is fun too. And 'concatenate,' though I've never actually used it in a sentence.
@ballardw It just seems like it could have been more consistently implemented...maybe even built into SASFILE. Wouldn't that have been simpler? When I see HASH code within a program surrounded by base SAS and proc sqls, it kind of sticks out...like it just doesn't belong.
And (assuming you meant PL/I and not PL1) I'd say COBOL is more procedural, but that's a discussion for a very different decade.
If you say it with a heavy enough Southern accent, you can get
"I went to a buffet and filled my belly! Now I concatenate no more!"
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.