BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
singhsahab
Lapis Lazuli | Level 10

Dear All,

 

I've tow dataset, dataset1 (contains ID & Name) & dataset2 (contains Id1,Id2 & Id3) ,values of dataset2 variable coming from dataset1 variables (ID). later i want to join them in a single DATA step. Can you please help me how i can perform SAS merging to get my final output metion in FinalDataset dataset.  In real time datsets contains more then 10000+ records. 

 

DATASET1:

ID NAME

1 ABC
2 CDF
3 KLM
4 JKL


DATASET2:

ID1 ID2 ID3 
1 2 3
3 . 4


AFTER MERGING DATSET1 & DATASET2 THE FINAL DATASET WOULD BE:

FinalDataset:

ID1 ID2 ID3 NAME 
1 2 3 ABC,CDF,KLM
3 . 4 KLM,JKL

 

 

Thank you !!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Expand the format with a value for "Other" and check for that before concatenating:

data dataset3;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;

data dataset4;
input id1 id2 id3 site;
cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;

data test (keep=id name);
set dataset3;
id=strip(id)||strip(site);
run;

data test1 (keep=id1 id2 id3);
set dataset4;
if id1 ne . and site ne . then id1=strip(id1)||strip(site);
if id2 ne . and site ne . then id2=strip(id2)||strip(site);
if id3 ne . and site ne . then id3=strip(id3)||strip(site);
run;

data cntlin2;
set test (rename=(id=start name=label)) end=eof ;
fmtname='idnamee';
type='N';
output;
if eof
then do;
  start = .;
  label = '***';
  hlo = 'O';
  output;
end;
run;

proc sort data=cntlin2 nodupkey;
by start;
run;

proc format cntlin=cntlin2;
run;

data want1;
set test1;
array ids{*} id:;
length name $20;
do i = 1 to dim(ids);
if not missing(ids{i}) and put(ids{i},idnamee.) ne '***'
then name=catx(',',name,put(ids{i},idnamee.));
end;
drop i;
run;

View solution in original post

17 REPLIES 17
novinosrin
Tourmaline | Level 20

A very simple hash find()=0 pull 

load dtset1 in hash table 

loop through array id1-id3

use each of those array elements as keys & pull the names with find method 

 

 

Kurt_Bremser
Super User

10000 can be handled with a format:

data dataset1;
input id name $;
cards;
1 ABC
2 CDF
3 KLM
4 JKL
;
run;

data dataset2;
input id1 id2 id3;
cards;
1 2 3
3 . 4
;
run;

data cntlin;
set dataset1;
rename id=start name=label;
fmtname='idname';
type='N';
run;

proc format cntlin=cntlin;
run;

data want;
set dataset2;
array ids{*} id:;
length name $20;
do i = 1 to dim(ids);
  if not missing(ids{i}) then name=catx(',',name,put(ids{i},idname.));
end;
drop i;
run;
singhsahab
Lapis Lazuli | Level 10

Dear sir , 

Thank you so much for letting me know this approach of merging. 

I tried same approach for more then one variable , but getting unexpected output. Below is my code.  

 

data dataset3;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;

data dataset4;
input id1 id2 id3 site;
cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;

data test (keep=id name);
set dataset3;
id=strip(id)||strip(site);
run;

data test1 (keep=id1 id2 id3);
set dataset4;
if id1 ne . and site ne . then id1=strip(id1)||strip(site);
if id2 ne . and site ne . then id2=strip(id2)||strip(site);
if id3 ne . and site ne . then id3=strip(id3)||strip(site);
run;

data cntlin2;
set test;
rename id=start name=label;
fmtname='idnamee';
type='N';
run;

proc sort data=cntlin2 nodupkey;
by start;
run;

proc format cntlin=cntlin2;
run;

data want1;
set test1;
array ids{*} id:;
length name $20;
do i = 1 to dim(ids);
if not missing(ids{i}) then name=catx(',',name,put(ids{i},idnamee.));
end;
drop i;
run;

 

 

Final Output :

 

 OBS
 
id1
 
 id2
 
 id3
 
 name
 
 
1122222322ABC,CDF,KLM 
2334.434334,JKL 
3434..JKL

 

Once again Thank you !!

singhsahab
Lapis Lazuli | Level 10

Dear Sir,

 

Can you please elaborate , in this case how to get only matching record.

 

Many Thanks...

Kurt_Bremser
Super User

Expand the format with a value for "Other" and check for that before concatenating:

data dataset3;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;

data dataset4;
input id1 id2 id3 site;
cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;

data test (keep=id name);
set dataset3;
id=strip(id)||strip(site);
run;

data test1 (keep=id1 id2 id3);
set dataset4;
if id1 ne . and site ne . then id1=strip(id1)||strip(site);
if id2 ne . and site ne . then id2=strip(id2)||strip(site);
if id3 ne . and site ne . then id3=strip(id3)||strip(site);
run;

data cntlin2;
set test (rename=(id=start name=label)) end=eof ;
fmtname='idnamee';
type='N';
output;
if eof
then do;
  start = .;
  label = '***';
  hlo = 'O';
  output;
end;
run;

proc sort data=cntlin2 nodupkey;
by start;
run;

proc format cntlin=cntlin2;
run;

data want1;
set test1;
array ids{*} id:;
length name $20;
do i = 1 to dim(ids);
if not missing(ids{i}) and put(ids{i},idnamee.) ne '***'
then name=catx(',',name,put(ids{i},idnamee.));
end;
drop i;
run;
novinosrin
Tourmaline | Level 20

To illustrate what i said last night:

 

data dataset1;
input id name $;
cards;
1 ABC
2 CDF
3 KLM
4 JKL
;
run;

data dataset2;
input id1 id2 id3;
cards;
1 2 3
3 . 4
;
run;

data want(drop=id name);
if 0 then set dataset2  dataset1;
dcl hash h(dataset:'dataset1',ordered:'y');
h.definekey('id');
h.definedata('name');
h.definedone();
do until(lr);
set dataset2 end=lr;
array t id1-id3;
length new_name $20;
call missing(new_name);
do over t;
_iorc_=h.find(key:t);
if _iorc_=0 then new_name=catx(',',new_name,name);
end;
output;
end;
stop;
run;
mkeintz
PROC Star

@novinosrin

 

Just a nit.  You don't need the "ordered:'a'" parameter for this code, since the hash object in not to be output.  You only need the hash for lookup - adding the ordered attribute will slow down building the hash and retrieving from it.  Compare the below with vs without the ordered attribute:

 

data _null_;
  declare hash h (ordered:'a');
    h.definekey('x');
	h.definedata('x','y');
	h.definedone();

  do x=1 to 10000000;y=-1*x;h.add();end;
  /*do x=1 to 10000000;h.find();end;*/
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

Thank you Sir @mkeintz.  I want to take advantage of you to allow me a major request at a truly convenient time of yours. i know you may be busy with wharton, family and self etc. However, when and if time, heart, mind and soul permits, may i request you to pen down how you often efficiently

1. use self merge datastep

2. your classy wayof using lag on a condition

3. interleave akin to 1

 

If you do and when you do and supposing that you may use some well simulated use cases, kindly plug me plz. 

My mom(retired forex analyst and doesn't have an account here) often reads your thread with greater attention to detail. So, If it's not a bother, I wish you may.Thank you!

 

 PS If that happens, our library is probably the best place to post for the benefit of wider audience

 

singhsahab
Lapis Lazuli | Level 10

@novinosrin : Hello Sir,  

 

First of all , i'm thank full to you to share the new concept of data merge specially for me . Today i get a chance to learn hash concept and got stuck on solving the same question by adding one more matching variable (or more then one matching variable) . Here i'm not sure how i'll use new variable with array as key. Below is the sample dataset, where i added one more matching variable site and trying to getting same output as pasted below. 

 

data dataset1;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;

data dataset2;
input id1 id2 id3 site;

cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;

 

 

Capture.PNG

 

 

novinosrin
Tourmaline | Level 20

Haha Plz don;t call me Sir. I haven't earned it/nor I deserve it.

To your question, give me sometime as I am away from SAS software now. Will respond soon as i can

novinosrin
Tourmaline | Level 20

@singhsahab Ok, adding site to the data will not produce any changes to your result because the key in the hash table is ID and data value id name. So the look up is basically from each element in your array i.e id1-id3 to id in hash table to get the name value. This is the existing logic.

 

You need to explain "Here i'm not sure how i'll use new variable with array as key. " in terms of what do you want to use as look up keys to fetch what as you have added site variable in both datasets.

 

Are you looking to use composite hash keys using id+site? Your requirement is not quite clear

singhsahab
Lapis Lazuli | Level 10

@novinosrin : Yes,i,m looking to use composite hash keys using id+site to join dataset. I define site as a key but getting error.  I'm not sure how i will use composite key to join.

 

 

Capture.PNG

 

 

 

novinosrin
Tourmaline | Level 20
data dataset1;
input id site name $;
cards;
1 22 ABC
2 22 CDF
3 22 KLM
4 34 JKL
;
run;

data dataset2;
input id1 id2 id3 site;

cards;
1 2 3 22
3 . 4 34
4 . . 34
;
run;


data want(drop=id name);
if 0 then set dataset2  dataset1;
dcl hash h(dataset:'dataset1');
h.definekey('id','site');
h.definedata('name');
h.definedone();
do until(lr);
set dataset2 end=lr;
array t id1-id3;
length new_name $20;
call missing(new_name);
do over t;
_iorc_=h.find(key:t,key:site);
if _iorc_=0 then new_name=catx(',',new_name,name);
end;
output;
end;
stop;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 1814 views
  • 4 likes
  • 4 in conversation