BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RobertNYC
Obsidian | Level 7

Hi all--

I’ve been trying a lot of different combinations of proc sql join statements but I can’t get this to work. I have two datasets a need to join.  I need to join Dataset 2 to Dataset 1  by Agency_Name and Program_name.  The new data set needs to look a certain way.

Please see the data sets below.

Data set 1
Agency_NameProgram_Namevar1var2var3
Agency AProgram 1 112
Agency AProgram 1 111
Agency AProgram 1 112
Agency AProgram 1 111
Agency AProgram 1 112
Agency AProgram 1 112
Agency AProgram 2111
Agency AProgram 2112
Agency AProgram 2112
Agency BProgram 2112
Agency BProgram 2111
Agency BProgram 3111
Agency BProgram 3112
Agency BProgram 3112
Agency BProgram 3112
Agency BProgram 3112
Agency CProgram 3112
Agency CProgram 3111
Agency CProgram 3112
Agency CProgram 3112
Agency CProgram 3112
Agency CProgram 3112
Agency CProgram 3111
Agency DProgram 1112
Agency DProgram 1112
Agency DProgram 1112
Agency DProgram 1112
Agency DProgram 1112
Agency DProgram 2112
Agency DProgram 2112
Agency DProgram 2112
Agency DProgram 2112
Agency DProgram 2112

Data set 2
Agency_NameProgram_NameQ1Q2Q3
Agency AProgram 1 112
Agency AProgram 1 111
Agency AProgram 2112
Agency AProgram 2111
Agency AProgram 2112
Agency BProgram 2112
Agency BProgram 2111
Agency BProgram 3112
Agency BProgram 3112
Agency CProgram 3112
Agency CProgram 3111
Agency CProgram 3111
Agency CProgram 3112
Agency DProgram 1 112
Agency DProgram 2112
Agency DProgram 2112

I need to join Dataset 2 to Dataset 1  by Agency_Name and Program_name.  The final product should look like this:

NEW Data set
Agency_NameProgram_Namevar1var2var3Agency_NameProgram_NameQ1Q2Q3
Agency AProgram 1 112Agency AProgram 1 112
Agency AProgram 1 111Agency AProgram 1 112
Agency AProgram 1 112.....
Agency AProgram 1 111.....
Agency AProgram 1 112.....
Agency AProgram 1 112.....
Agency AProgram 2111Agency AProgram 2111
Agency AProgram 2112Agency AProgram 2112
Agency AProgram 2112Agency AProgram 2111
Agency BProgram 2112Agency BProgram 2112
Agency BProgram 2111Agency BProgram 2111
Agency BProgram 3111Agency BProgram 3112
Agency BProgram 3112Agency BProgram 3112
Agency BProgram 3112.....
Agency BProgram 3112.....
Agency BProgram 3112.....
Agency CProgram 3112Agency CProgram 3112
Agency CProgram 3111Agency CProgram 3111
Agency CProgram 3112Agency CProgram 3111
Agency CProgram 3112Agency CProgram 3112
Agency CProgram 3112.....
Agency CProgram 3112.....
Agency CProgram 3111.....
Agency DProgram 1112Agency DProgram 1 112
Agency DProgram 1112.....
Agency DProgram 1112.....
Agency DProgram 1112.....
Agency DProgram 1112.....
Agency DProgram 2112Agency DProgram 2112
Agency DProgram 2112Agency DProgram 2112
Agency DProgram 2112.....
Agency DProgram 2112.....
Agency DProgram 2112.....

Any assistance is greatly appreciated!  Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

SQL seems mission impossible to me, while Merge can do it simple:

data have1;

input (Agency_Name Program_Name) (:&$12.)    var1    var2    var3 ;

cards;

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    2

Agency A    Program 2    1    1    1

Agency A    Program 2    1    1    2

Agency A    Program 2    1    1    2

Agency B    Program 2    1    1    2

Agency B    Program 2    1    1    1

Agency B    Program 3    1    1    1

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

;

data have2;

input (Agency_Name    Program_Name) (:&$12.)    Q1    Q2    Q3 ;

cards;

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 2    1    1    2

Agency A    Program 2    1    1    1

Agency A    Program 2    1    1    2

Agency B    Program 2    1    1    2

Agency B    Program 2    1    1    1

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    2

Agency D    Program 1     1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

;

data want;

merge have1 have2;

by     Agency_Name    Program_Name;

Agency_Name2=ifc(q1=.,'',Agency_Name);

Program_Name2=ifc(q1=.,'',Program_Name);

output;

call missing (of q1-q3);

run;

proc print;run;

Regards,

Haikuo

View solution in original post

14 REPLIES 14
Linlin
Lapis Lazuli | Level 10

Do you have to use sql join or merge is OK?

RobertNYC
Obsidian | Level 7

Hi, I've tried both sql join and merge and I can't figure it out.  So, yes either is fine.

Haikuo
Onyx | Level 15

SQL seems mission impossible to me, while Merge can do it simple:

data have1;

input (Agency_Name Program_Name) (:&$12.)    var1    var2    var3 ;

cards;

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    2

Agency A    Program 2    1    1    1

Agency A    Program 2    1    1    2

Agency A    Program 2    1    1    2

Agency B    Program 2    1    1    2

Agency B    Program 2    1    1    1

Agency B    Program 3    1    1    1

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 1    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

;

data have2;

input (Agency_Name    Program_Name) (:&$12.)    Q1    Q2    Q3 ;

cards;

Agency A    Program 1     1    1    2

Agency A    Program 1     1    1    1

Agency A    Program 2    1    1    2

Agency A    Program 2    1    1    1

Agency A    Program 2    1    1    2

Agency B    Program 2    1    1    2

Agency B    Program 2    1    1    1

Agency B    Program 3    1    1    2

Agency B    Program 3    1    1    2

Agency C    Program 3    1    1    2

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    1

Agency C    Program 3    1    1    2

Agency D    Program 1     1    1    2

Agency D    Program 2    1    1    2

Agency D    Program 2    1    1    2

;

data want;

merge have1 have2;

by     Agency_Name    Program_Name;

Agency_Name2=ifc(q1=.,'',Agency_Name);

Program_Name2=ifc(q1=.,'',Program_Name);

output;

call missing (of q1-q3);

run;

proc print;run;

Regards,

Haikuo

RobertNYC
Obsidian | Level 7

Thanks Hikuo. Would you be so kind though and explain to me what you did? I'm still a SAS newbie. The staments in bold I've never used before. Thanks

data want;

merge have1 have2;

by     Agency_Name    Program_Name;

Agency_Name2=ifc(q1=.,'',Agency_Name);

Program_Name2=ifc(q1=.,'',Program_Name);

output;

call missing (of q1-q3);

run;

RobertNYC
Obsidian | Level 7

and this

Input (Agency_Name Program_Name) (:&$12.)    var1    var2    var3 ;

Haikuo
Onyx | Level 15

Input (Agency_Name Program_Name) (:&$12.)  is short for:

Input Agency_Name :&$12. Program_Name :&$12.

& means you are reading values containing sigle blanks,

$ means you are reading character variables,

12. is the length.

: let you stop reading if there are more than one delimeters, blanks in this case.

Good night!

Haikuo

Haikuo
Onyx | Level 15

The values from datasets will automatically be retained by default, so all you need to do is to assign missing values to the shorter dataset after exhausting their matches, that is where call missing came to play. And because you need another two identifying variables ( Agency_Name2, Program_Name2), they are matching to q1-q3 by using ifc(), which is just a conditional function, a shorter version of if -then:

Agency_Name2=ifc(q1=.,'',Agency_Name); is equavalent to:

if q1=. then Agency_Name2='';

else Agency_Name2=Agency_Name;

Check help doc for details.

Regards,

Haikuo

Ksharp
Super User

HaiKuo.

I am curious that you are still student ? or a worked man ?

You have so much time to response so many questions.

In my imagination, You are from Singapore or Malaysia ?

Ksharp

RobertNYC
Obsidian | Level 7

Wherever Haiko is from SAS newbies must be pretty grateful.

Ksharp, you’ve answered a couple questions for me too (really well),which helped out in some tricky situations and helped me learn this stuff. Thanks!

Ksharp
Super User

you are welcome.

My job is very easy. So I could have lots of time  to response some questions.

Ksharp

Haikuo
Onyx | Level 15

Here is a hash version:

data want;

  if _n_=1 then do;

     set have2 point=_n_;

dcl hash h(dataset:'have2', multidata:'y');

h.definekey('Agency_Name','Program_Name');

h.definedata(all:'y');

h.definedone();

  end;

  set have1;

  _n_=h.find();

  if _n_=0 then

      do;

        Agency_Name2=Agency_Name;

        Program_Name2=Program_Name;

output;

_n_=h.removedup();

end;

else do;

       call missing (of q1-q3);

       output;

end;

run;

Regards,

Haikuo

RobertNYC
Obsidian | Level 7

Hi  Haikuo.

Man, thanks. You seem to me to be an absolutely genius.

I don’t know hash code, again (and only if you have time)could you please explain the above to me?  I really don’t understand most of it.  Some day I promise, I’ll answer a question ortwo for you. Thanks!

Haikuo
Onyx | Level 15

Robert,

It will be very difficult for me to explain the hash code to you in a nutshell if you have not done hash before. It has different rules and syntax, but it is absolutely NOT hard to learn, it has a lot less hiden tricks than data step has, most of them are pretty straitforward ( well, except mind twisting HOH technique).

Start from the recommendations of this thread, someone else has already asked questions for you:

https://communities.sas.com/message/46591#46591

I will start from introductory hash sugi paper, then move on to more practical application papers. If you have questions, you could always go to online help doc, also the best place you already know: here.

Good luck,

Haikuo

PGStats
Opal | Level 21

The only way to do this kind of merge with SQL is to add some observation IDs. Then it becomes simple :

data have1;
input (Agency_Name Program_Name) (:&$12.)    var1    var2    var3 ;
if Program_Name ne lag(Program_Name) or Agency_Name ne lag(Agency_Name) then id = 0;
id + 1;
cards;
Agency A    Program 1     1    1    2
Agency A    Program 1     1    1    1
Agency A    Program 1     1    1    2
Agency A    Program 1     1    1    1
Agency A    Program 1     1    1    2
Agency A    Program 1     1    1    2
Agency A    Program 2    1    1    1
Agency A    Program 2    1    1    2
Agency A    Program 2    1    1    2
Agency B    Program 2    1    1    2
Agency B    Program 2    1    1    1
Agency B    Program 3    1    1    1
Agency B    Program 3    1    1    2
Agency B    Program 3    1    1    2
Agency B    Program 3    1    1    2
Agency B    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    1
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    1
Agency D    Program 1    1    1    2
Agency D    Program 1    1    1    2
Agency D    Program 1    1    1    2
Agency D    Program 1    1    1    2
Agency D    Program 1    1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
;

data have2;
input (Agency_Name    Program_Name) (:&$12.)    Q1    Q2    Q3 ;
if Program_Name ne lag(Program_Name) or Agency_Name ne lag(Agency_Name) then id = 0;
id + 1;
cards;
Agency A    Program 1     1    1    2
Agency A    Program 1     1    1    1
Agency A    Program 2    1    1    2
Agency A    Program 2    1    1    1
Agency A    Program 2    1    1    2
Agency B    Program 2    1    1    2
Agency B    Program 2    1    1    1
Agency B    Program 3    1    1    2
Agency B    Program 3    1    1    2
Agency C    Program 3    1    1    2
Agency C    Program 3    1    1    1
Agency C    Program 3    1    1    1
Agency C    Program 3    1    1    2
Agency D    Program 1     1    1    2
Agency D    Program 2    1    1    2
Agency D    Program 2    1    1    2
;

proc sql;
create table want as
select h1.Agency_Name, h1.Program_Name, h1.var1, h1.var2, h1.var3,
h2.Agency_Name as Agency_Name2, h2.Program_Name as Program_Name2, h2.q1, h2.q2, h2.q3
from have1 as h1 natural left join have2 as h2;

PG

PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 2061 views
  • 0 likes
  • 5 in conversation