Hello,
I am trying to use Hash Object to create a dataset Want.
I want to be able to join datasets work.Small and SAS.BIG on ID.
Getting an error at Code Line
do unitil (eof); /* Error Message : Expecting an =. */
Need help in identifying what is wrong with the code.
I am sorry but the database is quite big and I won't be able to provide details using INFILE option.
The same do until code works in first step but it doesn't work when I use dataset from SAS library in second do until step.
Is it possible to identify the error with no data information ?
Here's the code I am using:
data Want;
declare hash Sub(hashexp:7);
Sub.definekey('ID');
Sub.definedata('n');
Sub.definedone();
do until (eof_Small);
set Small end=eof_Small
n+1;
Sub.add();
end;
do unitl (eof); /*Getting an error here that : Expecting an =. */
set SAS.BIG end=eof ;
if Sub.find()=0 then do;
set Small point=n;
output;
end;
stop;
data small;
input (Source New_ID ID Product Make fou4th) (:$10.) (Begin_Date End_Date) (:yymmdd10.) In1 In2;
format Begin_Date End_Date yymmdd10.;
cards;
DB1 123 23456 Car Honda Y 20170101 20171115 6 6
DB1 124 23457 Car Toyota N 20161101 20171115 11 11
DB1 124 23458 Car Acura N 20170101 20171115 7 7
DB1 125 23459 Truck Lexus N 20170101 20171115 5 5
DB1 125 23460 Jeep Jeep N 20170101 20171115 6 6
DB1 126 23461 Car Honda N 20170401 20171115 6 6
DB1 126 23462 Car Toyota N 20170101 20171115 2 2
DB1 127 23463 Truck Acura N 20170501 20171115 12 12
DB1 127 23464 Jeep Lexus N 20170501 20171115 12 12
DB1 128 23465 Car Jeep N 20160801 20171115 1 1
;
data big;
input ID $ Quarter (BGN_DT END_DT) (:date9.) (First_Name Last_Name MEME_SFX Gender Age Category Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Cat8) (:$15.);
format BGN_DT END_DT date9.;
cards;
23456 1 1-Dec-17 30-Nov-18 Ross Geller 4 F 19 Female,19-20 A A A A A 0 0 12
23457 1 1-Dec-17 30-Nov-18 Monica Geller 1 F 49 Female,50-54 A A A A A 0 0 11
23458 0 1-Feb-18 31-Jan-19 Chandler Bing 0 M 49 Male,50-54 A A A A A 0 0 2
23459 1 1-Dec-17 30-Nov-18 Phoebe Buffay 0 F 43 Female,40-44 A A A A A 0 0 1
23460 1 1-Dec-17 30-Nov-18 Joe Tribianni 1 M 57 Male,55-59 A A A A A 0 0 12
23461 1 1-Dec-17 30-Nov-18 Pam Beesly 1 M 37 Male,35-39 A A A A A 0 0 12
23462 2 1-Dec-16 30-Nov-17 Jim Helpert 0 M 40 Male,40-44 A A A A A 0 0 12
23463 1 1-Dec-17 30-Nov-18 Michael Scott 0 F 62 Female,60-64 A A A A A 0 0 12
23464 1 1-Dec-17 30-Nov-18 Dwight Schrute 0 F 22 Female,21-24 A A A A A 0 0 1
23465 2 1-Dec-16 30-Nov-17 Creed Braxton 3 M 54 Male,55-59 A A A A A 0 0 12
;
data want ;
if _n_=1 then do;
if 0 then set small;
dcl hash H (dataset:'small',multidata:'y') ;
h.definekey ("id") ;
h.definedata ("source", "new_id", "product") ;
h.definedone () ;
end;
set big;
/*assuming you want only the matches aka inner join*/
if h.find()=0;
keep Quarter First_Name Last_Name Gender Age Source New_ID ID Product;
run;
or
you can add use keep=dataset option to process only the wanted vars in the PDV
set big(keep=Quarter First_Name Last_Name Gender age);
Hi @VarunD Can you please make it easy for us, hmm rather me to attempt to help you.
1. You could make a small mock sample data in an excel with like 10 rows
2. You could make a big mock sample data in an excel with like 15 rows
3. Explain what you want to accomplish by joining i.e your expected output
Paste your sample here. That's all you need to do here. Plz
Basically post your BIG, SMALL & RESULT samples with an explanation.
@VarunD wrote:
Hello,
I am sorry but the database is quite big and I won't be able to provide details using INFILE option.
Please find the spreadsheet with mock data.
Sorry about not posting it earlier as I was thinking this has something to do with the syntax since I am using hash object for the first time.
I simply want to match merge both the datasets on ID.
I tried using PROC SQL and left join to merge small and big but I was having performance issues.
So, I want all the data from small and only matching observations from BIG on ID.
Hope that information helps.
Thanks for looking into this.
Thanks. So do you want to fetch the records associated with matching ids in small from big, only matches? or in other words what vars do you want to pull. I can't see the expected output to make out what you want to accomplish
Sorry about that.
Yes, all observations from Small and only matches from BIG.
I want Source New_ID ID and Product from Small and
Quarter First_Name Last_Name Gender and Age from BIG.
data small;
input (Source New_ID ID Product Make fou4th) (:$10.) (Begin_Date End_Date) (:yymmdd10.) In1 In2;
format Begin_Date End_Date yymmdd10.;
cards;
DB1 123 23456 Car Honda Y 20170101 20171115 6 6
DB1 124 23457 Car Toyota N 20161101 20171115 11 11
DB1 124 23458 Car Acura N 20170101 20171115 7 7
DB1 125 23459 Truck Lexus N 20170101 20171115 5 5
DB1 125 23460 Jeep Jeep N 20170101 20171115 6 6
DB1 126 23461 Car Honda N 20170401 20171115 6 6
DB1 126 23462 Car Toyota N 20170101 20171115 2 2
DB1 127 23463 Truck Acura N 20170501 20171115 12 12
DB1 127 23464 Jeep Lexus N 20170501 20171115 12 12
DB1 128 23465 Car Jeep N 20160801 20171115 1 1
;
data big;
input ID $ Quarter (BGN_DT END_DT) (:date9.) (First_Name Last_Name MEME_SFX Gender Age Category Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Cat8) (:$15.);
format BGN_DT END_DT date9.;
cards;
23456 1 1-Dec-17 30-Nov-18 Ross Geller 4 F 19 Female,19-20 A A A A A 0 0 12
23457 1 1-Dec-17 30-Nov-18 Monica Geller 1 F 49 Female,50-54 A A A A A 0 0 11
23458 0 1-Feb-18 31-Jan-19 Chandler Bing 0 M 49 Male,50-54 A A A A A 0 0 2
23459 1 1-Dec-17 30-Nov-18 Phoebe Buffay 0 F 43 Female,40-44 A A A A A 0 0 1
23460 1 1-Dec-17 30-Nov-18 Joe Tribianni 1 M 57 Male,55-59 A A A A A 0 0 12
23461 1 1-Dec-17 30-Nov-18 Pam Beesly 1 M 37 Male,35-39 A A A A A 0 0 12
23462 2 1-Dec-16 30-Nov-17 Jim Helpert 0 M 40 Male,40-44 A A A A A 0 0 12
23463 1 1-Dec-17 30-Nov-18 Michael Scott 0 F 62 Female,60-64 A A A A A 0 0 12
23464 1 1-Dec-17 30-Nov-18 Dwight Schrute 0 F 22 Female,21-24 A A A A A 0 0 1
23465 2 1-Dec-16 30-Nov-17 Creed Braxton 3 M 54 Male,55-59 A A A A A 0 0 12
;
data want ;
if _n_=1 then do;
if 0 then set small;
dcl hash H (dataset:'small',multidata:'y') ;
h.definekey ("id") ;
h.definedata ("source", "new_id", "product") ;
h.definedone () ;
end;
set big;
/*assuming you want only the matches aka inner join*/
if h.find()=0;
keep Quarter First_Name Last_Name Gender Age Source New_ID ID Product;
run;
or
you can add use keep=dataset option to process only the wanted vars in the PDV
set big(keep=Quarter First_Name Last_Name Gender age);
Yes, I was silly to add that as your id's are unique. Good catch mate! My apologies
I copy pasted syntax from online search and forgot to remove that piece from the syntax. Sincere apologies again!
That was one of the issue with Code. But, hey In my defense it was Friday evening : )
I fixed it and got other errors that I was able to resolve but it was still slow.
I found the code novinosrin provided was better, in that it is simple for someone like me to understand and modify it for re-use.
Thanks for your help.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.