BookmarkSubscribeRSS Feed
Tom
Super User Tom
Super User

Why not just use a FORMAT instead?  That is essentially a hash table resident in memory and does not require that you attempt to convert SAS into some other programming language.

 

jimbarbour
Meteorite | Level 14

Hi, @Tom,

 

Do you mean something like the following?

proc	format;
	VALUE	$Person_ID
		'57kk4ja28',
		'52jj2ay56',
		'46dq1bb94',
		'22ay7dj19',
		'76du8hj71'	=	'INCLUDE'
		;
RUN;

DATA	Prod47.Scoreable_Consumers;
	SET	Prod00.All_Consumers;
		WHERE	PUT(Person_ID, $Person_ID.)	=	'INCLUDE';
RUN;

That would work. 

 

Typically these key lists number in the 100's of thousands.  It would be interesting to see if it were actually as fast as hash tables which are impressively fast.  

 

These list of ID's come down from a mainframe and are stored in a particular directory on a UNIX box.  It's so easy to feed the keys into a hash table direct from the raw file in a single step.  With a Proc Format, I guess I'd have to read the keys in as text, write them out as Proc Format statements, run the output as a SAS program, storing the resultant Format in a standard format lib, and then run.  A little more work, but I guess that's not so bad.

 

Jim

 

Tom
Super User Tom
Super User

No need to write source code to create a format from data. Just make a dataset in the right structure to use PROC FORMAT with the CNTLIN= option.

 

Also did you try using the SASFILE statement with a dataset with an index?  That will also let you hold the dataset in memory.The SASFILE statement will keep the dataset in memory until you close it.

data gender (index=(name)) ;
  set sashelp.class (keep=name sex);
run;

sasfile gender load ;

data want ;
  set sashelp.class (drop=sex);
  set gender key=name ;
  if _iorc_=0 ;
run;

data second_step;
 ....
run;

sasfile gender close;

proc compare data=want compare=sashelp.class; run;

 

jimbarbour
Meteorite | Level 14

@Tom, ah!  Now, that is excellent.  I haven't seen the "sasfile XX load" used before.  If a dataset persists in memory until it is closed, that might be a bit of a problem if a forgetful programmer doesn't code a close statement.  Presumably SAS has a way to unload a dataset from memory when it is no longer needed, perhaps at EOJ.  

 

It would be interesting to see if a sasfile load can persist not only across DATA step boundaries but also across separate jobs or threads.  I'll have to play with this one.  Thank you very much.

 

Jim

ChrisNZ
Tourmaline | Level 20

SASFILE loads tables in memory and works well with indexes. Have you tried?

 

[Edit: I see @Tomhas already suggested SASFILE.

To answer your subsequent question: SASFILE tables are not usable across SAS sessions. 

The only way to do this is use RAM disks. Even MEMLIB does not allow that. ]

jimbarbour
Meteorite | Level 14

@ChrisNZ,

 

I'm typically running on either UNIX (AIX) or LINUX, so I don't think I can make use of a MEMLIB... although there's something nagging at the back of my brain that one of the 9.x releases introduced... something ...that has some similarity to a MEMLIB that will work in UNIX.  I think.  Don't quote me on that.  Relying on my memory, well, you wouldn't want to do that.

 

The SASFILE, though, now that's worth exploring.  I know how to get a SAS dataset to persist in memory (either through a macro or some type of synchronous multi-threading) such that the data will be resident during the duration of other sessions, but will it be addressable?  In other words, if I load a SAS data set in Session A and cause it to persist throughout the life of Session B, will Session B be able to make use of the memory-loaded dataset or must I, like a hash table, create a separate instance?  I'll have to play with that. 

 

Most intriguing.

 

Jim

KachiM
Rhodochrosite | Level 12

Jim,

 

You said:

 

"We have a series of jobs that pull off designated records by key. The idea would be to load the hash table and then have the different jobs all use the same hash table for the key lookup. ....."

 

The problem will be easier with FCMP HASH if you can collect all  records from several JOBS in one or more text files. These text files need be changed to SAS Data Sets. At the end of all Jobs, you can run the program similar to one I  posted earlier. You can have two output Data Sets one for Matched and the other for UnMatched. Will you buy this idea ?

 

Regards.

DATASP

jimbarbour
Meteorite | Level 14

@KachiM,

 

Your idea of pulling off all the records first and then running a single SAS data set though the hash file key look up routine would probably work, but it would mean a) that we'd have to totally redesign our processes and b) that we'd have to process hundreds of millions of records per run unnecessarily   We want to pull just the records we need in the very first step.

 

I think the SASFILE LOAD idea that @Tom and @ChrisNZ have mentioned shows the most promise.  I'll look into it further as time permits.  For now, we have our hash table process working; it just has to be loaded for each DATA step.  Each load is only a minute or two, which, out of a 3 hour run, isn't all that material.  I was intrigued by Mr. Carpenter's SGF 2018 paper, but it's not absolutely critical as a practical matter that we change our processes.

 

Jim

KachiM
Rhodochrosite | Level 12

Jim,

 

If SAS FILE approach is helpful, it is most welcome. My idea is only a last resort. It is not necessary to combine all files together. Individual data sets can also be fed but all such files must be ready before HASH program is used.

 

DATASP

 

ChrisNZ
Tourmaline | Level 20

Yes, MEMLIB (and MEMCACHE) were windows-only features last time I looked, and all these memory-loading methods (hash and SASFILE included) are session-specific.

 

I suppose you know about this? http://www-01.ibm.com/support/docview.wss?uid=isg3T1010722.

That's the only way I can think of to share memory-based data across sessions.

jimbarbour
Meteorite | Level 14

@ChrisNZ,

 

Actually, no, I wasn't aware of that.  I haven't spent a lot of time looking into memory solutions (outside of SAS) because our UNIX Admin types won't let us anywhere near the disk arrays or memory configuration.  However, this memory loading is something good to have in the back of one's mind in case one were to find oneself in a smaller or less structured shop.

 

Thanks,

 

Jim

ChrisNZ
Tourmaline | Level 20

It's not for you to set up RAM disks, but the Unix admins might be able to -provided they agree the server has a bit of RAM to spare for this usage-. You can always ask and get the ball rolling.

 

jimbarbour
Meteorite | Level 14

@ChrisNZ@KachiM@Quentin, and @ballardw:

 

I went to the Western SAS conference last week (WUSS).  Art Carpenter presented a paper that solves the problem of accessing hash tables across Data step boundaries.  It's a very clever use of the RESOLVE() function inside the Data step. 

 

Please see http://wuss18.org/wp-content/uploads/accepted2018/41_Final_Paper_PDF.pdf if you're interested.

 

Jim

ChrisNZ
Tourmaline | Level 20

@jimbarbour Thank you very much indeed. So clever!

 

This technique reminds me of the contortions we had to do in order to run code inside code via proc fcmp and run_macro() before call dosubl() arrived. 

 

The boundaries have now been pushed... 

 

 

Quentin
Super User

Yes thanks @jimbarbour for passing that along.  I would never have thought to re-read the WUSS version of the paper.  Great that Art managed to crack this riddle.  

 

Agree @ChrisNZ, it does have echoes of Mike Rhoads's 'macro function sandwich' approach.  Happily, shortly after I started playing with that approach, SAS gave use DOSUBL which made life a lot easier.  My guess is it won't be too long before we are given the ability to manage persistent hash tables across data steps (M6?).  But cool to see what can be done now, even if it requires jumping through a few hoops.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 31 replies
  • 2080 views
  • 6 likes
  • 6 in conversation