Hi all--
I’ve been trying a lot of different combinations of proc sql join statements but I can’t get this to work. I have two datasets a need to join. I need to join Dataset 2 to Dataset 1 by Agency_Name and Program_name. The new data set needs to look a certain way.
Please see the data sets below.
Data set 1 | ||||
Agency_Name | Program_Name | var1 | var2 | var3 |
Agency A | Program 1 | 1 | 1 | 2 |
Agency A | Program 1 | 1 | 1 | 1 |
Agency A | Program 1 | 1 | 1 | 2 |
Agency A | Program 1 | 1 | 1 | 1 |
Agency A | Program 1 | 1 | 1 | 2 |
Agency A | Program 1 | 1 | 1 | 2 |
Agency A | Program 2 | 1 | 1 | 1 |
Agency A | Program 2 | 1 | 1 | 2 |
Agency A | Program 2 | 1 | 1 | 2 |
Agency B | Program 2 | 1 | 1 | 2 |
Agency B | Program 2 | 1 | 1 | 1 |
Agency B | Program 3 | 1 | 1 | 1 |
Agency B | Program 3 | 1 | 1 | 2 |
Agency B | Program 3 | 1 | 1 | 2 |
Agency B | Program 3 | 1 | 1 | 2 |
Agency B | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 1 |
Agency C | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 1 |
Agency D | Program 1 | 1 | 1 | 2 |
Agency D | Program 1 | 1 | 1 | 2 |
Agency D | Program 1 | 1 | 1 | 2 |
Agency D | Program 1 | 1 | 1 | 2 |
Agency D | Program 1 | 1 | 1 | 2 |
Agency D | Program 2 | 1 | 1 | 2 |
Agency D | Program 2 | 1 | 1 | 2 |
Agency D | Program 2 | 1 | 1 | 2 |
Agency D | Program 2 | 1 | 1 | 2 |
Agency D | Program 2 | 1 | 1 | 2 |
Data set 2 | ||||
Agency_Name | Program_Name | Q1 | Q2 | Q3 |
Agency A | Program 1 | 1 | 1 | 2 |
Agency A | Program 1 | 1 | 1 | 1 |
Agency A | Program 2 | 1 | 1 | 2 |
Agency A | Program 2 | 1 | 1 | 1 |
Agency A | Program 2 | 1 | 1 | 2 |
Agency B | Program 2 | 1 | 1 | 2 |
Agency B | Program 2 | 1 | 1 | 1 |
Agency B | Program 3 | 1 | 1 | 2 |
Agency B | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 1 |
Agency C | Program 3 | 1 | 1 | 1 |
Agency C | Program 3 | 1 | 1 | 2 |
Agency D | Program 1 | 1 | 1 | 2 |
Agency D | Program 2 | 1 | 1 | 2 |
Agency D | Program 2 | 1 | 1 | 2 |
I need to join Dataset 2 to Dataset 1 by Agency_Name and Program_name. The final product should look like this:
NEW Data set | |||||||||
Agency_Name | Program_Name | var1 | var2 | var3 | Agency_Name | Program_Name | Q1 | Q2 | Q3 |
Agency A | Program 1 | 1 | 1 | 2 | Agency A | Program 1 | 1 | 1 | 2 |
Agency A | Program 1 | 1 | 1 | 1 | Agency A | Program 1 | 1 | 1 | 2 |
Agency A | Program 1 | 1 | 1 | 2 | . | . | . | . | . |
Agency A | Program 1 | 1 | 1 | 1 | . | . | . | . | . |
Agency A | Program 1 | 1 | 1 | 2 | . | . | . | . | . |
Agency A | Program 1 | 1 | 1 | 2 | . | . | . | . | . |
Agency A | Program 2 | 1 | 1 | 1 | Agency A | Program 2 | 1 | 1 | 1 |
Agency A | Program 2 | 1 | 1 | 2 | Agency A | Program 2 | 1 | 1 | 2 |
Agency A | Program 2 | 1 | 1 | 2 | Agency A | Program 2 | 1 | 1 | 1 |
Agency B | Program 2 | 1 | 1 | 2 | Agency B | Program 2 | 1 | 1 | 2 |
Agency B | Program 2 | 1 | 1 | 1 | Agency B | Program 2 | 1 | 1 | 1 |
Agency B | Program 3 | 1 | 1 | 1 | Agency B | Program 3 | 1 | 1 | 2 |
Agency B | Program 3 | 1 | 1 | 2 | Agency B | Program 3 | 1 | 1 | 2 |
Agency B | Program 3 | 1 | 1 | 2 | . | . | . | . | . |
Agency B | Program 3 | 1 | 1 | 2 | . | . | . | . | . |
Agency B | Program 3 | 1 | 1 | 2 | . | . | . | . | . |
Agency C | Program 3 | 1 | 1 | 2 | Agency C | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 1 | Agency C | Program 3 | 1 | 1 | 1 |
Agency C | Program 3 | 1 | 1 | 2 | Agency C | Program 3 | 1 | 1 | 1 |
Agency C | Program 3 | 1 | 1 | 2 | Agency C | Program 3 | 1 | 1 | 2 |
Agency C | Program 3 | 1 | 1 | 2 | . | . | . | . | . |
Agency C | Program 3 | 1 | 1 | 2 | . | . | . | . | . |
Agency C | Program 3 | 1 | 1 | 1 | . | . | . | . | . |
Agency D | Program 1 | 1 | 1 | 2 | Agency D | Program 1 | 1 | 1 | 2 |
Agency D | Program 1 | 1 | 1 | 2 | . | . | . | . | . |
Agency D | Program 1 | 1 | 1 | 2 | . | . | . | . | . |
Agency D | Program 1 | 1 | 1 | 2 | . | . | . | . | . |
Agency D | Program 1 | 1 | 1 | 2 | . | . | . | . | . |
Agency D | Program 2 | 1 | 1 | 2 | Agency D | Program 2 | 1 | 1 | 2 |
Agency D | Program 2 | 1 | 1 | 2 | Agency D | Program 2 | 1 | 1 | 2 |
Agency D | Program 2 | 1 | 1 | 2 | . | . | . | . | . |
Agency D | Program 2 | 1 | 1 | 2 | . | . | . | . | . |
Agency D | Program 2 | 1 | 1 | 2 | . | . | . | . | . |
Any assistance is greatly appreciated! Thanks!
SQL seems mission impossible to me, while Merge can do it simple:
data have1;
input (Agency_Name Program_Name) (:&$12.) var1 var2 var3 ;
cards;
Agency A Program 1 1 1 2
Agency A Program 1 1 1 1
Agency A Program 1 1 1 2
Agency A Program 1 1 1 1
Agency A Program 1 1 1 2
Agency A Program 1 1 1 2
Agency A Program 2 1 1 1
Agency A Program 2 1 1 2
Agency A Program 2 1 1 2
Agency B Program 2 1 1 2
Agency B Program 2 1 1 1
Agency B Program 3 1 1 1
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 1
Agency C Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 1
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
;
data have2;
input (Agency_Name Program_Name) (:&$12.) Q1 Q2 Q3 ;
cards;
Agency A Program 1 1 1 2
Agency A Program 1 1 1 1
Agency A Program 2 1 1 2
Agency A Program 2 1 1 1
Agency A Program 2 1 1 2
Agency B Program 2 1 1 2
Agency B Program 2 1 1 1
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 1
Agency C Program 3 1 1 1
Agency C Program 3 1 1 2
Agency D Program 1 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
;
data want;
merge have1 have2;
by Agency_Name Program_Name;
Agency_Name2=ifc(q1=.,'',Agency_Name);
Program_Name2=ifc(q1=.,'',Program_Name);
output;
call missing (of q1-q3);
run;
proc print;run;
Regards,
Haikuo
Do you have to use sql join or merge is OK?
Hi, I've tried both sql join and merge and I can't figure it out. So, yes either is fine.
SQL seems mission impossible to me, while Merge can do it simple:
data have1;
input (Agency_Name Program_Name) (:&$12.) var1 var2 var3 ;
cards;
Agency A Program 1 1 1 2
Agency A Program 1 1 1 1
Agency A Program 1 1 1 2
Agency A Program 1 1 1 1
Agency A Program 1 1 1 2
Agency A Program 1 1 1 2
Agency A Program 2 1 1 1
Agency A Program 2 1 1 2
Agency A Program 2 1 1 2
Agency B Program 2 1 1 2
Agency B Program 2 1 1 1
Agency B Program 3 1 1 1
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 1
Agency C Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 1
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
;
data have2;
input (Agency_Name Program_Name) (:&$12.) Q1 Q2 Q3 ;
cards;
Agency A Program 1 1 1 2
Agency A Program 1 1 1 1
Agency A Program 2 1 1 2
Agency A Program 2 1 1 1
Agency A Program 2 1 1 2
Agency B Program 2 1 1 2
Agency B Program 2 1 1 1
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 1
Agency C Program 3 1 1 1
Agency C Program 3 1 1 2
Agency D Program 1 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
;
data want;
merge have1 have2;
by Agency_Name Program_Name;
Agency_Name2=ifc(q1=.,'',Agency_Name);
Program_Name2=ifc(q1=.,'',Program_Name);
output;
call missing (of q1-q3);
run;
proc print;run;
Regards,
Haikuo
Thanks Hikuo. Would you be so kind though and explain to me what you did? I'm still a SAS newbie. The staments in bold I've never used before. Thanks
data want;
merge have1 have2;
by Agency_Name Program_Name;
Agency_Name2=ifc(q1=.,'',Agency_Name);
Program_Name2=ifc(q1=.,'',Program_Name);
output;
call missing (of q1-q3);
run;
and this
Input (Agency_Name Program_Name) (:&$12.) var1 var2 var3 ;
Input (Agency_Name Program_Name) (:&$12.) is short for:
Input Agency_Name :&$12. Program_Name :&$12.
& means you are reading values containing sigle blanks,
$ means you are reading character variables,
12. is the length.
: let you stop reading if there are more than one delimeters, blanks in this case.
Good night!
Haikuo
The values from datasets will automatically be retained by default, so all you need to do is to assign missing values to the shorter dataset after exhausting their matches, that is where call missing came to play. And because you need another two identifying variables ( Agency_Name2, Program_Name2), they are matching to q1-q3 by using ifc(), which is just a conditional function, a shorter version of if -then:
Agency_Name2=ifc(q1=.,'',Agency_Name); is equavalent to:
if q1=. then Agency_Name2='';
else Agency_Name2=Agency_Name;
Check help doc for details.
Regards,
Haikuo
HaiKuo.
I am curious that you are still student ? or a worked man ?
You have so much time to response so many questions.
In my imagination, You are from Singapore or Malaysia ?
Ksharp
Wherever Haiko is from SAS newbies must be pretty grateful.
Ksharp, you’ve answered a couple questions for me too (really well),which helped out in some tricky situations and helped me learn this stuff. Thanks!
you are welcome.
My job is very easy. So I could have lots of time to response some questions.
Ksharp
Here is a hash version:
data want;
if _n_=1 then do;
set have2 point=_n_;
dcl hash h(dataset:'have2', multidata:'y');
h.definekey('Agency_Name','Program_Name');
h.definedata(all:'y');
h.definedone();
end;
set have1;
_n_=h.find();
if _n_=0 then
do;
Agency_Name2=Agency_Name;
Program_Name2=Program_Name;
output;
_n_=h.removedup();
end;
else do;
call missing (of q1-q3);
output;
end;
run;
Regards,
Haikuo
Hi Haikuo.
Man, thanks. You seem to me to be an absolutely genius.
I don’t know hash code, again (and only if you have time)could you please explain the above to me? I really don’t understand most of it. Some day I promise, I’ll answer a question ortwo for you. Thanks!
Robert,
It will be very difficult for me to explain the hash code to you in a nutshell if you have not done hash before. It has different rules and syntax, but it is absolutely NOT hard to learn, it has a lot less hiden tricks than data step has, most of them are pretty straitforward ( well, except mind twisting HOH technique).
Start from the recommendations of this thread, someone else has already asked questions for you:
https://communities.sas.com/message/46591#46591
I will start from introductory hash sugi paper, then move on to more practical application papers. If you have questions, you could always go to online help doc, also the best place you already know: here.
Good luck,
Haikuo
The only way to do this kind of merge with SQL is to add some observation IDs. Then it becomes simple :
data have1;
input (Agency_Name Program_Name) (:&$12.) var1 var2 var3 ;
if Program_Name ne lag(Program_Name) or Agency_Name ne lag(Agency_Name) then id = 0;
id + 1;
cards;
Agency A Program 1 1 1 2
Agency A Program 1 1 1 1
Agency A Program 1 1 1 2
Agency A Program 1 1 1 1
Agency A Program 1 1 1 2
Agency A Program 1 1 1 2
Agency A Program 2 1 1 1
Agency A Program 2 1 1 2
Agency A Program 2 1 1 2
Agency B Program 2 1 1 2
Agency B Program 2 1 1 1
Agency B Program 3 1 1 1
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 1
Agency C Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 1
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 1 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
;
data have2;
input (Agency_Name Program_Name) (:&$12.) Q1 Q2 Q3 ;
if Program_Name ne lag(Program_Name) or Agency_Name ne lag(Agency_Name) then id = 0;
id + 1;
cards;
Agency A Program 1 1 1 2
Agency A Program 1 1 1 1
Agency A Program 2 1 1 2
Agency A Program 2 1 1 1
Agency A Program 2 1 1 2
Agency B Program 2 1 1 2
Agency B Program 2 1 1 1
Agency B Program 3 1 1 2
Agency B Program 3 1 1 2
Agency C Program 3 1 1 2
Agency C Program 3 1 1 1
Agency C Program 3 1 1 1
Agency C Program 3 1 1 2
Agency D Program 1 1 1 2
Agency D Program 2 1 1 2
Agency D Program 2 1 1 2
;
proc sql;
create table want as
select h1.Agency_Name, h1.Program_Name, h1.var1, h1.var2, h1.var3,
h2.Agency_Name as Agency_Name2, h2.Program_Name as Program_Name2, h2.q1, h2.q2, h2.q3
from have1 as h1 natural left join have2 as h2;
PG
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.