BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
WayneBell
Calcite | Level 5

Greetings!

I have a Hashing Question:  I have two tables in a One-to-Many relationship.  I need to merge these.  I have set up the hash table for the 'One' data and match it to the 'Many' data to create the output.  The problem is that some of the data are One-to-None.  Is there any way to find the items in the 'One' data set that have no corresponding data in the 'Many' data set?  Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Hi Wayne,

I am very impressed by your novel approach, I usually consider myself a Hash user and I have used sum() method numerous times, but you did make my eyes wide-open.

Some comments: You have defined the same variable  "used" for both count index and summary, I find it is a bit confusing. I would separate them up, because they serve the different purpose and it will be easier for you to maintain your code. By doing do, you don't have to define "used" in table "one", and "used" can be any number.

Data One;

     Input Id $1. Data @@;

     /*   used=1;*/

     Cards;

A 1 B 2 C 3

;

Run;

Data Many;

     Input Id $1. Value @@;

     Cards;

A 10 A 20 A 30 A 40 A 50 C 60 C 70 C 80 C 90

;

Run;

Data Matched;

     If 0 Then

           Set One;

     If _N_=1 Then

           Do;

                Declare Hash One(Dataset:'One',Suminc:'Used');

                Declare Hiter _hi('One');

                One.Definekey('Id');

                One.Definedata('Id','Data');

                One.Definedone();

           End;

     Set Many End=Last;

     used=-100;

     If One.Find(Key:Id)=0 Then

           Do;

                /*        Used = 1;*/

                Output;

           End;

     If Last Then

           Do;

                Do _rc = _hi.First() By 0 While ( _rc = 0 );

                     _rc = One.Sum(Sum:total);

                     If total = 0 Then

                           Do;

                                Value = .;

                                Output;

                           End;

                     _rc = _hi.Next();

                End;

           End;

Run;

It was great to learn a new trick, you made my day!

Regards,

Haikuo

View solution in original post

8 REPLIES 8
Haikuo
Onyx | Level 15

If all you are interested are "one to none" then you want to switch the Hash table to "many" instead of "one". However, the "nones" can be both ways, such as there could be "many to none", and you also want to get these, then consider setting two Hash objects for both tables, then use Hiter to go through each one if them.

OT maybe, programming wise, it seems to me to be a suitable job for Proc SQL.

Regards,

Haikuo

WayneBell
Calcite | Level 5

Hello, Haikuo,

Taking the last comment first, yes, a Proc SQL with 'Left Outer Join' works perfectly.  The problem gets to be the time to do it.  I have been testing out Hashing and it runs in half the CPU time and only 20% of the elapsed time of the Proc SQL.

I am fairly new to Hashing in SAS and I will look into Hiter.  I have not seen anything (yet!) in the documentation to find which of the items in the Hash table have NOT been matched with an item in the Many table.  (This is definitely only a One-to-Many, with the Many possibly being Zero.)

I understand the process of having two hash tables and, for the second one, keeping only the non-matched items, but that hash table has the potential to be HUGE compared the One table.  The vast majority of items in the Many table will not match an item in the One table as the One table has been restricted in scope depending on the query.

Again, I have not gone through all of the documentation out there.  So, is there a way, using hiter, to set an indicator in the hash table whenever there is a match?  And then, query the hash table to find out what has NOT been matched?

Kind Regards,

Wayne

Haikuo
Onyx | Level 15

Wayne,

I agree that one of the caveat of Hash is the memory limitation, by which you tend to move the smaller table into Hash. Please take a look at the following  Hash implementation, you may find it in line with what you are after:

data one;

     input id @@;

     cards;

1 2 3

;

data many;

     input id @@;

     cards;

1 1 1 1 1 3 3 3 3

;

data _null_;

     if _n_=1 then

           do;

                declare hash one(dataset:'one');

                one.definekey('id');

                one.definedata('id');

                one.definedone();

           end;

     set many end=last;

     if one.find()=0 then

           rc=one.remove();

     if last then

           rc=one.output(dataset:'want');

run;

Good Luck,

Haikuo

WayneBell
Calcite | Level 5

Hello, Haikuo,

Thanks for the help!!  Your program gave me what was not matched, but I also need what is matched.  I merged your program with mine and arrived at the following:

Data One;

    Input Id $1. Data @@;

    Used = 1;

    Cards;

A 1 B 2 C 3

;

Run;                                                     

Data Many;

    Input Id $1. Value @@;

    Cards;

A 10 A 20 A 30 A 40 A 50 C 60 C 70 C 80 C 90

;

Run;

Data Matched;

    If 0 Then Set One;

    If _N_=1 Then

        Do;

            Declare Hash One(Dataset:'One',Suminc:'Used');

            Declare Hiter _hi('One');

            One.Definekey('Id');

            One.Definedata('Id','Data');

            One.Definedone();

        End;

    Set Many End=Last;

    If One.Find(Key:Id)=0 Then Do;

        Used = 1;

        Output;

    End;

    If Last Then Do;

        Do _rc = _hi.First() By 0 While ( _rc = 0 ) ;

            _rc = One.Sum(Sum:Used);

            If Used = 0 Then Do;

                Value = .;

                Output;

            End;

            _rc = _hi.Next();

        End;

    End;

Run;

The output contains both Matched and Non-Matched items from the One data set:

Id    Data    Value

                  

A       1       10

A       1       20

A       1       30

A       1       40

A       1       50

C       3       60

C       3       70

C       3       80

C       3       90

B       2        .

I don't know if this is the best code.  The SAS Documentations at http://support.sas.com/rnd/base/datastep/dot/hash-tip-sheet.pdf was a lot of my guide for the Suminc parameter.  Any suggestions would be appreciated!

Thanks for your help!  Without it, I would have had to stick to Proc SQL!

Kind Regards,

Wayne

Haikuo
Onyx | Level 15

Hi Wayne,

I am very impressed by your novel approach, I usually consider myself a Hash user and I have used sum() method numerous times, but you did make my eyes wide-open.

Some comments: You have defined the same variable  "used" for both count index and summary, I find it is a bit confusing. I would separate them up, because they serve the different purpose and it will be easier for you to maintain your code. By doing do, you don't have to define "used" in table "one", and "used" can be any number.

Data One;

     Input Id $1. Data @@;

     /*   used=1;*/

     Cards;

A 1 B 2 C 3

;

Run;

Data Many;

     Input Id $1. Value @@;

     Cards;

A 10 A 20 A 30 A 40 A 50 C 60 C 70 C 80 C 90

;

Run;

Data Matched;

     If 0 Then

           Set One;

     If _N_=1 Then

           Do;

                Declare Hash One(Dataset:'One',Suminc:'Used');

                Declare Hiter _hi('One');

                One.Definekey('Id');

                One.Definedata('Id','Data');

                One.Definedone();

           End;

     Set Many End=Last;

     used=-100;

     If One.Find(Key:Id)=0 Then

           Do;

                /*        Used = 1;*/

                Output;

           End;

     If Last Then

           Do;

                Do _rc = _hi.First() By 0 While ( _rc = 0 );

                     _rc = One.Sum(Sum:total);

                     If total = 0 Then

                           Do;

                                Value = .;

                                Output;

                           End;

                     _rc = _hi.Next();

                End;

           End;

Run;

It was great to learn a new trick, you made my day!

Regards,

Haikuo

Ksharp
Super User

So it would be . Why not reverse the role of these two tables ?

Data One;
     Input Id $1. Data @@;
     /*   used=1;*/
     Cards;
A 1 B 2 C 3
;
Run;
Data Many;
     Input Id $1. Value @@;
     Cards;
A 10 A 20 A 30 A 40 A 50 C 60 C 70 C 80 C 90
;
Run;
Data Matched(drop=rc);
     If _N_=1 Then Do;
                  If 0 Then Set many;
                Declare Hash many(Dataset:'Many',multidata:'y');
                many.Definekey('Id');
                many.Definedata('value');
                many.Definedone();
           End;
     Set one ;
 value=.;
 rc=many.find();
 if rc ne 0 then output;
 do while(rc =0);
    output;
     rc=many.find_next();
 end;
 run;

Xia Keshan

Haikuo
Onyx | Level 15

OP mentioned "many" is very HUGE, so there is a possibility that he is limited by his RAM.

Haikuo

WayneBell
Calcite | Level 5

And Xai Kashun,

Thanks to both of your for your great assistance.  This is my first attempt at using hashing in SAS and it has been great to learn something new.

I was not aware of the Mulitdata option in the Declare statement.  I agree with Xai Kashun that this is a more elegant solution to the problem and, in my tests, worked very well!

But, as Kaikou pointed out, the data may be too large.  Once I had a running program with a subset of data, I ran one last night with the full file.  I did not get even half of the 'Many' file into the hash table before I ran out of memory.  I can control how large the 'One' file is, so I will have to continue using the 'One' table for the hash.

Thank you both for the great information that you gave me.

Kind Regards,

Wayne

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 3032 views
  • 3 likes
  • 3 in conversation