BookmarkSubscribeRSS Feed
Lucas
Calcite | Level 5

Hello,

I'm loading sorted data set into hash. I need to have data in hash exactly in the same order as in data set.

My question is: Are the observations always stored in the hash in the same order as in data set?

Because data are not sorted by the hash key, I can't use tag- ordered in DECLARE statement.    

13 REPLIES 13
data_null__
Jade | Level 19

I ran this and it suggest the answer to your question is no.  Assuming I understand the question. 

data class;

   set sashelp.class;

   by name;

   run;

proc print;

   title 'sorted by name';

   run;

data _null_;

   if 0 then set class;

   declare hash h(dataset:'class');

   h.definekey('sex','age','name');

   h.definedata(all:'YES');

   h.definedone();

   h.output(dataset:'class2');

   stop;

   run;

proc print data=class2;

   title 'not now';

   run;


Makes me wonder what you are trying to do.

Lucas
Calcite | Level 5

Thanks for response. My question was not precise. I will try ask one more time Smiley Happy

I'm loading sorted data set into hash. Data set is sorted by 'created_date' column (descending).

Hash key is on column 'phone'. Vaules in 'phone' contains duplicates.

created_date   phone

10-10-2011      607

10-10-2011      122

09-10-2011      607

08-10-2011      607

07-10-2011      122

07-10-2011      607

I want to be sure that in hash always will be phone with the latest created_date.

Because data set is large I don't want to make any additional transformations before loading it into hash.

My question was not precise
Ksharp
Super User

Just as Null said.Answer is no. Hash will not keep this order for you. you need to make a index key by yourself.

define hash ha(ordered:'a');

ha.definekey('count');

...............

do until(last);

set temp;

count+1;

ha.add();

end;

Ksharp

Lucas
Calcite | Level 5

But when I make an index key on new column 'count' I can't join on column 'phone'.

As I said before I have table with columns: created_date, phone. It's a lookup table.

created_date   phone

10-10-2011      607

10-10-2011      122

09-10-2011      607

08-10-2011      607

07-10-2011      122

07-10-2011      607

...                   ...

I have also transaction table with column phone.

phone

529

122

697

....

I want to join these two tables using hash.

Is it possible to make index key on two columns created_date + phone (then could use tag 'ordered' ), but join (with find() method) only on one column (phone).

Tom
Super User Tom
Super User

Doesn't sound like you want to make a hash that has all of the values of created_date, just the latest date for each value of phone?

Phone  Created_date

607 10-10-2011

122 10-10-2011

...

I am not that good with hash tables, but it seems like during the load of the hash from the source dataset you only want to add items when the key is new.

Ksharp
Super User

you need some special methods for multidata.  HAS_NEXT() and FIND_NEXT()  to retrieve these obs which has duplicated key.

define hash ha(ordered:'d');

ha.definekey('phone','date');

Peter_C
Rhodochrosite | Level 12

part of a response from OP was

I want to be sure that in hash always will be phone with the latest created_date.

Because data set is large I don't want to make any additional transformations before loading it into hash.

This suggests only the latest "created_date" is needed.

If so,, use the loading stage to discover and update only the latest

No sorting, no indexing, just unique by phone and holding only the latest date

ScottBass
Rhodochrosite | Level 12

Hi,

A few points:

See http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a002585310.htm, in particular:

'NO' | 'N'      Data is returned in an undefined order, for the ordered tag (NO is the default).

Also, understand the difference between the ADD() and REPLACE() methods.  You might want to load the hash object with do until (eof); set whatever end=eof; h.add() construct.

You could use two hash objects and a hash iterator to load a hash object, let the first hash object do your desired processing, and the second hash object write your output dataset in desired sort order.  You'd have to test if this gives you the perfornance you want.

HTH...

Scott


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Tom
Super User Tom
Super User

There are probably ways of doing this better, but here is one way to read your example data into a hash that has unique records and use it to lookup the date for your transaction records.

data dates;

  input date ddmmyy10. number @@;

  format date ddmmyy10.;

cards;

10-10-2011 697 10-10-2011 122 09-10-2011 607 08-10-2011 607

07-10-2011 122 07-10-2011 607

run;

data trans;

  input number @@ ;

cards;

529 122 697 607 888

run;

data lookup ;

if _n_=1 then do;

  if 0 then set dates;

  declare Hash H();

  H.DefineKey ('number');

  H.DefineData ('date');

  H.DefineDone ();   /* complete hash table definition */

  do until(eof);

    set dates end=eof;

    if h.find() then h.add();

  end;

end;

set trans;

if h.find() then date=.;

put number= date=;

run;

number=529 date=.

number=122 date=10/10/2011

number=697 date=10/10/2011

number=607 date=09/10/2011

number=888 date=.

Lucas
Calcite | Level 5

Hi,

Thank you all for responses.

@Tom's solution is very usefull but i could be done with less coding.

data dates;

  input date ddmmyy10. number @@;

  format date ddmmyy10.;

cards;

10-10-2011 697 10-10-2011 122 09-10-2011 607 08-10-2011 607

07-10-2011 122 07-10-2011 607

run;

data trans;

  input number @@ ;

cards;

529 122 697 607 888

run;

data lookup ;

if _n_=1 then do;

  if 0 then set dates;

  declare Hash H(dataset: "dates");

  H.DefineKey ('number');

  H.DefineData ('date');

  H.DefineDone ();   /* complete hash table definition */

end;

set trans;

if h.find() then date=.;

put number= date=;

run;

According to the documentation (link from @Scott Bass):

If the data set contains duplicate keys, by default, the first instance is stored in the hash object; subsequent instances are ignored. To store the last instancein the hash object, use the DUPLICATE argument tag. The DUPLICATE argumenttag also writes an error to the SAS log if there is a duplicate key.

As I understand, during loading hash the source table is reading sequentially. Sorting before loading hash should be enough. I hope Smiley Happy

Peter_C
Rhodochrosite | Level 12

Lucas

for a small amount of extra code, I was intent on avoiding

. Sorting before loading hash should be enough. I hope 


peter

Tom
Super User Tom
Super User

If the data is not sorted then you can just decide whether to replace the found value with the new one.

data lookup ;

if _n_=1 then do;

  if 0 then set dates;

  declare Hash H();

  H.DefineKey ('number');

  H.DefineData ('date');

  H.DefineDone ();   /* complete hash table definition */

  do until(eof);

    set dates(rename=(date=nextdate)) end=eof;

    if h.find() then do;

      date=nextdate; h.add();

    end;

    else if nextdate > date then do;

      date=nextdate; h.replace();

    end;

  end;

end;

set trans;

if h.find() then date=.;

put number= date=;

run;

ScottBass
Rhodochrosite | Level 12

Hi,

According to the documentation (link from @Scott Bass):

If the data set contains duplicate keys, by default, the first instance is stored in the hash object; subsequent instances are ignored. To store the last instancein the hash object, use the DUPLICATE argument tag. The DUPLICATE argumenttag also writes an error to the SAS log if there is a duplicate key.

Actually, what I meant was, you need to understand the difference between:

  declare Hash H(dataset: "dates");

  H.DefineKey ('number');

  H.DefineData ('date');

  H.DefineDone ();   /* complete hash table definition */

And (untested):

declare Hash H();

H.DefineKey ('number');

H.DefineData ('date');

do until (eof);

  set dates end=eof;

  H.Add();

end;


H.DefineDone ();   /* complete hash table definition */

And:

declare Hash H();

H.DefineKey ('number');

H.DefineData ('date');

do until (eof);

  set dates end=eof;

  H.Replace();

end;


H.DefineDone ();   /* complete hash table definition */

Assuming a sorted order of your keys, I view Add() as "keep the first occurence" and Replace() as "keep the last occurence" of the key value.  Somewhat analogous to first. and last. processing (just an analogy, not identical!)

I didn't read the finer points of the original post, so this knowledge may or may not be relevant, but it's good to know regardless.

I have sometimes used two hash objects, the first to "de-dup", or otherwise get my desired data into a hash, and the second hash object (using different keys) to return the data in desired sort order.

Regards,

Scott


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 3877 views
  • 6 likes
  • 6 in conversation