Help using Base SAS procedures

A Difficult Join

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 100
Accepted Solution

A Difficult Join

Hi all--

I’ve been trying a lot of different combinations of proc sql join statements but I can’t get this to work. I have two datasets a need to join.  I need to join Dataset 2 to Dataset 1  by Agency_Name and Program_name.  The new data set needs to look a certain way.

Please see the data sets below.

Data set 1
Agency_NameProgram_Namevar1var2var3
Agency AProgram 1 112
Agency AProgram 1 111
Agency AProgram 1 112
Agency AProgram 1 111
Agency AProgram 1 112
Agency AProgram 1 112
Agency AProgram 2111
Agency AProgram 2112
Agency AProgram 2112
Agency BProgram 2112
Agency BProgram 2111
Agency BProgram 3111
Agency BProgram 3112
Agency BProgram 3112
Agency BProgram 3112
Agency BProgram 3112
Agency CProgram 3112
Agency CProgram 3111
Agency CProgram 3112
Agency CProgram 3112
Agency CProgram 3112
Agency CProgram 3112
Agency CProgram 3111
Agency DProgram 1112
Agency DProgram 1112
Agency DProgram 1112
Agency DProgram 1112
Agency DProgram 1112
Agency DProgram 2112
Agency DProgram 2112
Agency DProgram 2112
Agency DProgram 2112
Agency DProgram 2112

Data set 2
Agency_NameProgram_NameQ1Q2Q3
Agency AProgram 1 112
Agency AProgram 1 111
Agency AProgram 2112
Agency AProgram 2111
Agency AProgram 2112
Agency BProgram 2112
Agency BProgram 2111
Agency BProgram 3112
Agency BProgram 3112
Agency CProgram 3112
Agency CProgram 3111
Agency CProgram 3111
Agency CProgram 3112
Agency DProgram 1 112
Agency DProgram 2112
Agency DProgram 2112

I need to join Dataset 2 to Dataset 1  by Agency_Name and Program_name.  The final product should look like this:

NEW Data set
Agency_NameProgram_Namevar1var2var3Agency_NameProgram_NameQ1Q2Q3
Agency AProgram 1 112Agency AProgram 1 112
Agency AProgram 1 111Agency AProgram 1 112
Agency AProgram 1 112.....
Agency AProgram 1 111.....
Agency AProgram 1 112.....
Agency AProgram 1 112.....
Agency AProgram 2111Agency AProgram 2111
Agency AProgram 2112Agency AProgram 2112
Agency AProgram 2112Agency AProgram 2111
Agency BProgram 2112Agency BProgram 2112
Agency BProgram 2111Agency BProgram 2111
Agency BProgram 3111Agency BProgram 3112
Agency BProgram 3112Agency BProgram 3112
Agency BProgram 3112.....
Agency BProgram 3112.....
Agency BProgram 3112.....
Agency CProgram 3112Agency CProgram 3112
Agency CProgram 3111Agency CProgram 3111
Agency CProgram 3112Agency CProgram 3111
Agency CProgram 3112Agency CProgram 3112
Agency CProgram 3112.....
Agency CProgram 3112.....
Agency CProgram 3111.....
Agency DProgram 1112Agency DProgram 1 112
Agency DProgram 1112.....
Agency DProgram 1112.....
Agency DProgram 1112.....
Agency DProgram 1112.....
Agency DProgram 2112Agency DProgram 2112
Agency DProgram 2112Agency DProgram 2112
Agency DProgram 2112.....
Agency DProgram 2112.....
Agency DProgram 2112.....

Any assistance is greatly appreciated!  Thanks!


Accepted Solutions
Solution
‎03-26-2012 11:09 PM
Respected Advisor
Posts: 3,147

Re: A Difficult Join

SQL seems mission impossible to me, while Merge can do it simple:

data have1;

input (Agency_Name Program_Name) (:&$12.)    var1    var2    var3 ;

cards;

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    2

Agency A    Program 2    1    1    1

Agency A    Program 2    1    1    2

Agency A    Program 2    1    1    2

Agency B    Program 2    1    1    2

Agency B    Program 2    1    1    1

Agency B    Program 3    1    1    1

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

;

data have2;

input (Agency_Name    Program_Name) (:&$12.)    Q1    Q2    Q3 ;

cards;

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 2    1    1    2

Agency A    Program 2    1    1    1

Agency A    Program 2    1    1    2

Agency B    Program 2    1    1    2

Agency B    Program 2    1    1    1

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    2

Agency D    Program 1     1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

;

data want;

merge have1 have2;

by     Agency_Name    Program_Name;

Agency_Name2=ifc(q1=.,'',Agency_Name);

Program_Name2=ifc(q1=.,'',Program_Name);

output;

call missing (of q1-q3);

run;

proc print;run;

Regards,

Haikuo

View solution in original post


All Replies
Super Contributor
Posts: 1,636

A Difficult Join

Do you have to use sql join or merge is OK?

Frequent Contributor
Posts: 100

A Difficult Join

Hi, I've tried both sql join and merge and I can't figure it out.  So, yes either is fine.

Solution
‎03-26-2012 11:09 PM
Respected Advisor
Posts: 3,147

Re: A Difficult Join

SQL seems mission impossible to me, while Merge can do it simple:

data have1;

input (Agency_Name Program_Name) (:&$12.)    var1    var2    var3 ;

cards;

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    2

Agency A    Program 2    1    1    1

Agency A    Program 2    1    1    2

Agency A    Program 2    1    1    2

Agency B    Program 2    1    1    2

Agency B    Program 2    1    1    1

Agency B    Program 3    1    1    1

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

;

data have2;

input (Agency_Name    Program_Name) (:&$12.)    Q1    Q2    Q3 ;

cards;

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 2    1    1    2

Agency A    Program 2    1    1    1

Agency A    Program 2    1    1    2

Agency B    Program 2    1    1    2

Agency B    Program 2    1    1    1

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    2

Agency D    Program 1     1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

;

data want;

merge have1 have2;

by     Agency_Name    Program_Name;

Agency_Name2=ifc(q1=.,'',Agency_Name);

Program_Name2=ifc(q1=.,'',Program_Name);

output;

call missing (of q1-q3);

run;

proc print;run;

Regards,

Haikuo

Frequent Contributor
Posts: 100

A Difficult Join

Thanks Hikuo. Would you be so kind though and explain to me what you did? I'm still a SAS newbie. The staments in bold I've never used before. Thanks

data want;

merge have1 have2;

by     Agency_Name    Program_Name;

Agency_Name2=ifc(q1=.,'',Agency_Name);

Program_Name2=ifc(q1=.,'',Program_Name);

output;

call missing (of q1-q3);

run;

Frequent Contributor
Posts: 100

A Difficult Join

and this

Input (Agency_Name Program_Name) (:&$12.)    var1    var2    var3 ;

Respected Advisor
Posts: 3,147

Re: A Difficult Join

Input (Agency_Name Program_Name) (:&$12.)  is short for:

Input Agency_Name :&$12. Program_Name :&$12.

& means you are reading values containing sigle blanks,

$ means you are reading character variables,

12. is the length.

: let you stop reading if there are more than one delimeters, blanks in this case.

Good night!

Haikuo

Respected Advisor
Posts: 3,147

A Difficult Join

The values from datasets will automatically be retained by default, so all you need to do is to assign missing values to the shorter dataset after exhausting their matches, that is where call missing came to play. And because you need another two identifying variables ( Agency_Name2, Program_Name2), they are matching to q1-q3 by using ifc(), which is just a conditional function, a shorter version of if -then:

Agency_Name2=ifc(q1=.,'',Agency_Name); is equavalent to:

if q1=. then Agency_Name2='';

else Agency_Name2=Agency_Name;

Check help doc for details.

Regards,

Haikuo

Super User
Posts: 9,766

Re: A Difficult Join

HaiKuo.

I am curious that you are still student ? or a worked man ?

You have so much time to response so many questions.

In my imagination, You are from Singapore or Malaysia ?

Ksharp

Frequent Contributor
Posts: 100

A Difficult Join

Wherever Haiko is from SAS newbies must be pretty grateful.

Ksharp, you’ve answered a couple questions for me too (really well),which helped out in some tricky situations and helped me learn this stuff. Thanks!

Super User
Posts: 9,766

Re: A Difficult Join

you are welcome.

My job is very easy. So I could have lots of time  to response some questions.

Ksharp

Respected Advisor
Posts: 3,147

Re: A Difficult Join

Here is a hash version:

data want;

  if _n_=1 then do;

     set have2 point=_n_;

dcl hash h(dataset:'have2', multidata:'y');

h.definekey('Agency_Name','Program_Name');

h.definedata(all:'y');

h.definedone();

  end;

  set have1;

  _n_=h.find();

  if _n_=0 then

      do;

        Agency_Name2=Agency_Name;

        Program_Name2=Program_Name;

output;

_n_=h.removedup();

end;

else do;

       call missing (of q1-q3);

       output;

end;

run;

Regards,

Haikuo

Frequent Contributor
Posts: 100

A Difficult Join

Hi  Haikuo.

Man, thanks. You seem to me to be an absolutely genius.

I don’t know hash code, again (and only if you have time)could you please explain the above to me?  I really don’t understand most of it.  Some day I promise, I’ll answer a question ortwo for you. Thanks!

Respected Advisor
Posts: 3,147

Re: A Difficult Join

Robert,

It will be very difficult for me to explain the hash code to you in a nutshell if you have not done hash before. It has different rules and syntax, but it is absolutely NOT hard to learn, it has a lot less hiden tricks than data step has, most of them are pretty straitforward ( well, except mind twisting HOH technique).

Start from the recommendations of this thread, someone else has already asked questions for you:

https://communities.sas.com/message/46591#46591

I will start from introductory hash sugi paper, then move on to more practical application papers. If you have questions, you could always go to online help doc, also the best place you already know: here.

Good luck,

Haikuo

Respected Advisor
Posts: 4,745

Re: A Difficult Join

The only way to do this kind of merge with SQL is to add some observation IDs. Then it becomes simple :

data have1;
input (Agency_Name Program_Name) (:&$12.)    var1    var2    var3 ;
if Program_Name ne lag(Program_Name) or Agency_Name ne lag(Agency_Name) then id = 0;
id + 1;
cards;
Agency A    Program 1     1    1    2
Agency A    Program 1     1    1    1
Agency A    Program 1     1    1    2
Agency A    Program 1     1    1    1
Agency A    Program 1     1    1    2
Agency A    Program 1     1    1    2
Agency A    Program 2    1    1    1
Agency A    Program 2    1    1    2
Agency A    Program 2    1    1    2
Agency B    Program 2    1    1    2
Agency B    Program 2    1    1    1
Agency B    Program 3    1    1    1
Agency B    Program 3    1    1    2
Agency B    Program 3    1    1    2
Agency B    Program 3    1    1    2
Agency B    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    1
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    1
Agency D    Program 1    1    1    2
Agency D    Program 1    1    1    2
Agency D    Program 1    1    1    2
Agency D    Program 1    1    1    2
Agency D    Program 1    1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
;

data have2;
input (Agency_Name    Program_Name) (:&$12.)    Q1    Q2    Q3 ;
if Program_Name ne lag(Program_Name) or Agency_Name ne lag(Agency_Name) then id = 0;
id + 1;
cards;
Agency A    Program 1     1    1    2
Agency A    Program 1     1    1    1
Agency A    Program 2    1    1    2
Agency A    Program 2    1    1    1
Agency A    Program 2    1    1    2
Agency B    Program 2    1    1    2
Agency B    Program 2    1    1    1
Agency B    Program 3    1    1    2
Agency B    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    1
Agency C    Program 3    1    1    1
Agency C    Program 3    1    1    2
Agency D    Program 1     1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
;

proc sql;
create table want as
select h1.Agency_Name, h1.Program_Name, h1.var1, h1.var2, h1.var3,
h2.Agency_Name as Agency_Name2, h2.Program_Name as Program_Name2, h2.q1, h2.q2, h2.q3
from have1 as h1 natural left join have2 as h2;

PG

PG
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 500 views
  • 0 likes
  • 5 in conversation