Dear All,
I've tow dataset, dataset1 (contains ID & Name) & dataset2 (contains Id1,Id2 & Id3) ,values of dataset2 variable coming from dataset1 variables (ID). later i want to join them in a single DATA step. Can you please help me how i can perform SAS merging to get my final output metion in FinalDataset dataset. In real time datsets contains more then 10000+ records.
DATASET1:
ID NAME
1 ABC
2 CDF
3 KLM
4 JKL
DATASET2:
ID1 ID2 ID3
1 2 3
3 . 4
AFTER MERGING DATSET1 & DATASET2 THE FINAL DATASET WOULD BE:
FinalDataset:
ID1 ID2 ID3 NAME
1 2 3 ABC,CDF,KLM
3 . 4 KLM,JKL
Thank you !!
Expand the format with a value for "Other" and check for that before concatenating:
data dataset3;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;
data dataset4;
input id1 id2 id3 site;
cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;
data test (keep=id name);
set dataset3;
id=strip(id)||strip(site);
run;
data test1 (keep=id1 id2 id3);
set dataset4;
if id1 ne . and site ne . then id1=strip(id1)||strip(site);
if id2 ne . and site ne . then id2=strip(id2)||strip(site);
if id3 ne . and site ne . then id3=strip(id3)||strip(site);
run;
data cntlin2;
set test (rename=(id=start name=label)) end=eof ;
fmtname='idnamee';
type='N';
output;
if eof
then do;
start = .;
label = '***';
hlo = 'O';
output;
end;
run;
proc sort data=cntlin2 nodupkey;
by start;
run;
proc format cntlin=cntlin2;
run;
data want1;
set test1;
array ids{*} id:;
length name $20;
do i = 1 to dim(ids);
if not missing(ids{i}) and put(ids{i},idnamee.) ne '***'
then name=catx(',',name,put(ids{i},idnamee.));
end;
drop i;
run;
A very simple hash find()=0 pull
load dtset1 in hash table
loop through array id1-id3
use each of those array elements as keys & pull the names with find method
10000 can be handled with a format:
data dataset1;
input id name $;
cards;
1 ABC
2 CDF
3 KLM
4 JKL
;
run;
data dataset2;
input id1 id2 id3;
cards;
1 2 3
3 . 4
;
run;
data cntlin;
set dataset1;
rename id=start name=label;
fmtname='idname';
type='N';
run;
proc format cntlin=cntlin;
run;
data want;
set dataset2;
array ids{*} id:;
length name $20;
do i = 1 to dim(ids);
if not missing(ids{i}) then name=catx(',',name,put(ids{i},idname.));
end;
drop i;
run;
Dear sir ,
Thank you so much for letting me know this approach of merging.
I tried same approach for more then one variable , but getting unexpected output. Below is my code.
data dataset3;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;
data dataset4;
input id1 id2 id3 site;
cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;
data test (keep=id name);
set dataset3;
id=strip(id)||strip(site);
run;
data test1 (keep=id1 id2 id3);
set dataset4;
if id1 ne . and site ne . then id1=strip(id1)||strip(site);
if id2 ne . and site ne . then id2=strip(id2)||strip(site);
if id3 ne . and site ne . then id3=strip(id3)||strip(site);
run;
data cntlin2;
set test;
rename id=start name=label;
fmtname='idnamee';
type='N';
run;
proc sort data=cntlin2 nodupkey;
by start;
run;
proc format cntlin=cntlin2;
run;
data want1;
set test1;
array ids{*} id:;
length name $20;
do i = 1 to dim(ids);
if not missing(ids{i}) then name=catx(',',name,put(ids{i},idnamee.));
end;
drop i;
run;
Final Output :
1 | 122 | 222 | 322 | ABC,CDF,KLM |
2 | 334 | . | 434 | 334,JKL |
3 | 434 | . | . | JKL |
Once again Thank you !!
You have an ID in test1/id1 that has no match in test (334), so the format will display the value as-is.
Dear Sir,
Can you please elaborate , in this case how to get only matching record.
Many Thanks...
Expand the format with a value for "Other" and check for that before concatenating:
data dataset3;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;
data dataset4;
input id1 id2 id3 site;
cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;
data test (keep=id name);
set dataset3;
id=strip(id)||strip(site);
run;
data test1 (keep=id1 id2 id3);
set dataset4;
if id1 ne . and site ne . then id1=strip(id1)||strip(site);
if id2 ne . and site ne . then id2=strip(id2)||strip(site);
if id3 ne . and site ne . then id3=strip(id3)||strip(site);
run;
data cntlin2;
set test (rename=(id=start name=label)) end=eof ;
fmtname='idnamee';
type='N';
output;
if eof
then do;
start = .;
label = '***';
hlo = 'O';
output;
end;
run;
proc sort data=cntlin2 nodupkey;
by start;
run;
proc format cntlin=cntlin2;
run;
data want1;
set test1;
array ids{*} id:;
length name $20;
do i = 1 to dim(ids);
if not missing(ids{i}) and put(ids{i},idnamee.) ne '***'
then name=catx(',',name,put(ids{i},idnamee.));
end;
drop i;
run;
To illustrate what i said last night:
data dataset1;
input id name $;
cards;
1 ABC
2 CDF
3 KLM
4 JKL
;
run;
data dataset2;
input id1 id2 id3;
cards;
1 2 3
3 . 4
;
run;
data want(drop=id name);
if 0 then set dataset2 dataset1;
dcl hash h(dataset:'dataset1',ordered:'y');
h.definekey('id');
h.definedata('name');
h.definedone();
do until(lr);
set dataset2 end=lr;
array t id1-id3;
length new_name $20;
call missing(new_name);
do over t;
_iorc_=h.find(key:t);
if _iorc_=0 then new_name=catx(',',new_name,name);
end;
output;
end;
stop;
run;
Just a nit. You don't need the "ordered:'a'" parameter for this code, since the hash object in not to be output. You only need the hash for lookup - adding the ordered attribute will slow down building the hash and retrieving from it. Compare the below with vs without the ordered attribute:
data _null_;
declare hash h (ordered:'a');
h.definekey('x');
h.definedata('x','y');
h.definedone();
do x=1 to 10000000;y=-1*x;h.add();end;
/*do x=1 to 10000000;h.find();end;*/
run;
Thank you Sir @mkeintz. I want to take advantage of you to allow me a major request at a truly convenient time of yours. i know you may be busy with wharton, family and self etc. However, when and if time, heart, mind and soul permits, may i request you to pen down how you often efficiently
1. use self merge datastep
2. your classy wayof using lag on a condition
3. interleave akin to 1
If you do and when you do and supposing that you may use some well simulated use cases, kindly plug me plz.
My mom(retired forex analyst and doesn't have an account here) often reads your thread with greater attention to detail. So, If it's not a bother, I wish you may.Thank you!
PS If that happens, our library is probably the best place to post for the benefit of wider audience
@novinosrin : Hello Sir,
First of all , i'm thank full to you to share the new concept of data merge specially for me . Today i get a chance to learn hash concept and got stuck on solving the same question by adding one more matching variable (or more then one matching variable) . Here i'm not sure how i'll use new variable with array as key. Below is the sample dataset, where i added one more matching variable site and trying to getting same output as pasted below.
data dataset1;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;
data dataset2;
input id1 id2 id3 site;
cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;
Haha Plz don;t call me Sir. I haven't earned it/nor I deserve it.
To your question, give me sometime as I am away from SAS software now. Will respond soon as i can
@singhsahab Ok, adding site to the data will not produce any changes to your result because the key in the hash table is ID and data value id name. So the look up is basically from each element in your array i.e id1-id3 to id in hash table to get the name value. This is the existing logic.
You need to explain "Here i'm not sure how i'll use new variable with array as key. " in terms of what do you want to use as look up keys to fetch what as you have added site variable in both datasets.
Are you looking to use composite hash keys using id+site? Your requirement is not quite clear
@novinosrin : Yes,i,m looking to use composite hash keys using id+site to join dataset. I define site as a key but getting error. I'm not sure how i will use composite key to join.
data dataset1;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;
data dataset2;
input id1 id2 id3 site;
cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;
data want(drop=id name);
if 0 then set dataset2 dataset1;
dcl hash h(dataset:'dataset1');
h.definekey('id','site');
h.definedata('name');
h.definedone();
do until(lr);
set dataset2 end=lr;
array t id1-id3;
length new_name $20;
call missing(new_name);
do over t;
_iorc_=h.find(key:t,key:site);
if _iorc_=0 then new_name=catx(',',new_name,name);
end;
output;
end;
stop;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.