BookmarkSubscribeRSS Feed

I'd like to see a simpler way of creating and using hash tables...something consistent with the syntax and structure of either data statements or proc sql structure.

 

My suggestion...something lie this:

libname foobar memory;    (kind of like memlib, but cross-platform)

 

and then it could be generated like this:

create foobar as select id,desc from sometable key=id;

-or-

data foobar; set sometable; keep id,desc; key=id; run;

 

from that point on, foobar could be used just like any other dataset:

proc sql; select store,storetable.id,foobar.desc from storetable left join foobar on storetable.id=foobar.id;

 

That seems much easier to use and teach than hash and it's yet-another-syntax.

 

11 Comments
ChrisHemedinger
Community Manager

While this idea is being evaluated, I wanted to reference a couple of other capabilities that readers might find interesting.

 

If you want to load and read a file in memory (for faster retrieval of values), check out the SASFILE statement.  It has limited support for updating the file in memory.  I've seen it used for SAS/SHARE-style processing where a single large data set that changes infrequently is used for many read requests.

 

And while I agree that the hash object syntax can be a bit confusing because of its departure from standard DATA-step-like syntax, it's similar to hash or dictionary-based collections that I've used in other programming frameworks (Java, .NET, etc.)

 

If you have SAS/IML, there are new methods for creating "lists" that work for in-memory operations too.  This nested structure reminds me of JSON a bit, which Javascript programmers would find familiar.

Quentin
Super User

I think maybe a necessary step before this (or as part of this) would be for SAS to provide a way to create hash tables that persist across data steps.  That would be very useful. 

 

While the syntax of hash tables did feel "foreign" to me when I started learning it, it's no more foreign than other SAS languages were at some point (or still are): SQL, IML, DS2, macro, CASL, the list grows and grows.  Learning the syntax provided a little peak into  more object oriented languages, and I learned that I liked some of the features that (such as the ability to decide whether to "catch" a failing return code or let it generate an error in the log).

tomrvincent
Rhodochrosite | Level 12

It certainly is similar, but SAS could have done it in COBOL or FORTRAN if 'similarity' was the goal. 🙂  I guess I would have thought 'ease of use' would be the goal.  Adding yet another syntax family to the product ..I just don't see how that helps people (especially newbies) who are struggling just to learn base SAS syntax or SQL syntax.. Adding Javascript style syntax just make things worse...unless that's the SAS corporate direction: To migrate to the growing Java/.Net world...I don't know.  I'm trying to teach SAS to 20+ people who have Excel/Access(no VBA) 'knowledge' and I'm trying to focus on SAS SQL as much as possible, avoiding Base SAS when possible and certainly don't want to toss Javascript into the mix. 

 

It's like teaching Dothraki to Klingons who know Elvish...the syntax is so different maybe it's just better to focus on the syntax they already know. 🙂

 

Thank you, @ChrisHemedinger, for the SASFILE info.  I think that can be used instead of HASH for lookup/cross-reference/cross-walk tables.  Thanks!

tomrvincent
Rhodochrosite | Level 12

@ChrisHemedinger, I did test out SASFILE and found it to actually be slower (15.41 seconds vs. 14.36 seconds) in a SQL statement that used the table 4 times:

 

sasfile SDS.CMC_EXCD_EXPL_CD load;

 

...

    FROM SDS.SV14_CLAIM_DETAILS_FINAL t1
       LEFT JOIN SDS.CMC_EXCD_EXPL_CD t2 ON (t1.CLCL_EOB_EXCD_ID = t2.EXCD_ID)
       LEFT JOIN SDS.CMC_EXCD_EXPL_CD t5 ON (t1.CDML_EOB_EXCD = t5.EXCD_ID)
       LEFT JOIN SDS.FD09_DENIALREASONS t8 ON (t1.CLCL_ID = t8.CLCL_ID) AND (t1.CDML_SEQ_NO = t8.CDML_SEQ_NO)
       LEFT JOIN SDS.CMC_EXCD_EXPL_CD t6 ON (t8.EXCD_ID = t6.EXCD_ID)
       LEFT JOIN SDS.CMC_EXCD_EXPL_CD t7 ON (t1.CDML_DISALL_EXCD = t7.EXCD_ID)
       LEFT JOIN SDS.FD11_DIAG_AND_PROC t3 ON (t1.CLCL_ID = t3.CLCL_ID)

...

 

sasfile SDS.CMC_EXCD_EXPL_CD close;

 

Are there minimum thresholds for performance?  That table has 2500 recs and the resulting dataset is 208,000 recs.

ChrisHemedinger
Community Manager

@tomrvincent - re: SASFILE, YMMV.  Check out some papers on the topic (like this one) and see how other folks use it.

 

I think hash object is the undisputed champion for in-memory manipulation of SAS data in SAS 9.x.  But for adjacent use cases, I see customers use a mix of hardware (SSD, etc), data prep (indexes), special engines (SPDS), and more.

 

But that's not really what your idea was about, right? It was more about making hash objects more approachable, syntax-wise.

ballardw
Super User

I am not exactly sure which part you are objecting to about the hash references. Is it the "dot" notation for objects?

 

That syntax is actually similar to older SAS code, sometimes referred to as SCL from the SAS/AF and SAS/FSP products.

 

And SCL dates back to SAS6.12 ( or 6.08 at least that was my introduction). So hash is basically an extension though a limited on of an earlier object oriented programming using SAS.

 

I do sympathize with anyone attempting to teach programming to Excel users who have not ever used another programming language. 

With SAS primarily being procedural (and I think PL1 may be more appropriate than COBOL as a reference) the exceptions like hash are a learning curve. Generally for look ups involving fewer than a thousand or so values I tend to either generate formats/informats if the values aren't volatile (site locations for instance).

tomrvincent
Rhodochrosite | Level 12

@ChrisHemedingerthanks for the link...I like nerdy stuff like that. 🙂

 

Yes, my idea is to improve clarity and consistency rather than to obfuscate it...like using the word 'obfuscate'.

BeverlyBrown
Community Manager

@tomrvincent, as a word nerd, I think 'obfuscate' is fun too. And 'concatenate,' though I've never actually used it in a sentence.

tomrvincent
Rhodochrosite | Level 12

@ballardw It just seems like it could have been more consistently implemented...maybe even built into SASFILE.  Wouldn't that have been simpler?  When I see HASH code within a program surrounded by base SAS and proc sqls, it kind of sticks out...like it just doesn't belong.

 

As for SCL, it provides functionality in an environment that doesn't exist in base SAS:  interactive applications (which I DEARLY wish was in regular SAS).  The HASH code just just easily have had the same syntax as base SAS since it's really just providing a table in memory and not something new like 3D animation or a VR analysis tool like Tony Stark or STARLABS uses. 

 

And (assuming you meant PL/I and not PL1) I'd say COBOL is more procedural, but that's a discussion for a very different decade.

 

 

tomrvincent
Rhodochrosite | Level 12

@BeverlyBrown

 

If you say it with a heavy enough Southern accent, you can get

 

"I went to a buffet and filled my belly!  Now I concatenate no more!"