BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
VarunD
Obsidian | Level 7

Hello,

 

I am trying to use Hash Object to create a dataset Want.

I want to be able to join datasets work.Small and SAS.BIG on ID. 

 

Getting an error at Code Line

do unitil (eof);                     /* Error Message : Expecting an =. */

 

Need help in identifying what is wrong with the code.

I am sorry but the database is quite big and I won't be able to provide details using INFILE option.

 

The same do until code works in first step but it doesn't work when I use dataset from SAS library in second do until step.

Is it possible to identify the error with no data information ?

 

Here's the code I am using:

 

 

 

data Want;

declare hash Sub(hashexp:7);
Sub.definekey('ID');
Sub.definedata('n');
Sub.definedone();

do until (eof_Small);
set Small end=eof_Small
n+1;
Sub.add();
end;

 

do unitl (eof);                                                                                                     /*Getting an error here that : Expecting an =. */
set SAS.BIG end=eof ;

 

if Sub.find()=0 then do;
set Small point=n;
output;
end;

stop;

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

data small;
input (Source	New_ID	ID	Product	Make	fou4th) (:$10.)	(Begin_Date	End_Date) (:yymmdd10.)	In1	In2;
format Begin_Date	End_Date yymmdd10.;
cards;
DB1	123	23456	Car	Honda	Y	20170101	20171115	6	6
DB1	124	23457	Car	Toyota	N	20161101	20171115	11	11
DB1	124	23458	Car	Acura	N	20170101	20171115	7	7
DB1	125	23459	Truck	Lexus	N	20170101	20171115	5	5
DB1	125	23460	Jeep	Jeep	N	20170101	20171115	6	6
DB1	126	23461	Car	Honda	N	20170401	20171115	6	6
DB1	126	23462	Car	Toyota	N	20170101	20171115	2	2
DB1	127	23463	Truck	Acura	N	20170501	20171115	12	12
DB1	127	23464	Jeep	Lexus	N	20170501	20171115	12	12
DB1	128	23465	Car	Jeep	N	20160801	20171115	1	1
;

data big;
input ID $	Quarter	(BGN_DT	END_DT) (:date9.)	(First_Name	Last_Name	MEME_SFX	Gender	Age	Category 	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Cat8) (:$15.);
format BGN_DT	END_DT date9.;
cards;
23456	1	1-Dec-17	30-Nov-18	Ross	Geller	4	F	19	Female,19-20	A	A	A	A	A	0	0	12
23457	1	1-Dec-17	30-Nov-18	Monica	Geller	1	F	49	Female,50-54	A	A	A	A	A	0	0	11
23458	0	1-Feb-18	31-Jan-19	Chandler	Bing	0	M	49	Male,50-54	A	A	A	A	A	0	0	2
23459	1	1-Dec-17	30-Nov-18	Phoebe	Buffay	0	F	43	Female,40-44	A	A	A	A	A	0	0	1
23460	1	1-Dec-17	30-Nov-18	Joe	Tribianni	1	M	57	Male,55-59	A	A	A	A	A	0	0	12
23461	1	1-Dec-17	30-Nov-18	Pam	Beesly	1	M	37	Male,35-39	A	A	A	A	A	0	0	12
23462	2	1-Dec-16	30-Nov-17	Jim	Helpert	0	M	40	Male,40-44	A	A	A	A	A	0	0	12
23463	1	1-Dec-17	30-Nov-18	Michael	Scott	0	F	62	Female,60-64	A	A	A	A	A	0	0	12
23464	1	1-Dec-17	30-Nov-18	Dwight	Schrute	0	F	22	Female,21-24	A	A	A	A	A	0	0	1
23465	2	1-Dec-16	30-Nov-17	Creed	Braxton	3	M	54	Male,55-59	A	A	A	A	A	0	0	12
;

data want ;
if _n_=1 then do;
if 0 then set small;
   dcl hash H (dataset:'small',multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("source", "new_id", "product") ;
   h.definedone () ;
   end;
set big;
/*assuming you want only the matches aka inner join*/
if h.find()=0;
keep Quarter First_Name Last_Name Gender  Age Source New_ID ID  Product;
run;

or

you can add use keep=dataset option to process only the wanted vars in the PDV

 

set big(keep=Quarter First_Name Last_Name Gender age);

 

View solution in original post

11 REPLIES 11
novinosrin
Tourmaline | Level 20

 

 

 

 

 

Hi @VarunD   Can you please make it easy for us, hmm rather me to attempt to help you.

 

1. You could make a small mock sample data in an excel with like 10 rows 

2. You could make a big mock sample data in an excel with like 15 rows

3. Explain what you want to accomplish by joining i.e your expected output

 

Paste your sample here. That's all you need to do here. Plz

 

Basically post your BIG, SMALL & RESULT samples with an explanation. 

 

 


@VarunD wrote:

Hello,

 

I am sorry but the database is quite big and I won't be able to provide details using INFILE option.

 

 

VarunD
Obsidian | Level 7
Will do.. Please allow me few minutes to put it together. Thanks.
VarunD
Obsidian | Level 7

Please find the spreadsheet with mock data.

Sorry about not posting it earlier as I was thinking this has something to do with the syntax since I am using hash object for the first time.

 

I simply want to match merge both the datasets on ID.

I tried using PROC SQL and left join to merge small and big but I was having performance issues.

 

So, I want all the data from small and only matching observations from BIG on ID.

 

Hope that information helps.

 

Thanks for looking into this.

novinosrin
Tourmaline | Level 20

Thanks. So do you want to fetch the records associated with matching ids in small from big, only matches? or in other words what vars do you want to pull. I can't see the expected output to make out what you want to accomplish

VarunD
Obsidian | Level 7

Sorry about that.

Yes, all observations from Small and only matches from BIG.
I want Source New_ID ID and Product from Small and
Quarter First_Name Last_Name Gender and Age from BIG.

novinosrin
Tourmaline | Level 20

data small;
input (Source	New_ID	ID	Product	Make	fou4th) (:$10.)	(Begin_Date	End_Date) (:yymmdd10.)	In1	In2;
format Begin_Date	End_Date yymmdd10.;
cards;
DB1	123	23456	Car	Honda	Y	20170101	20171115	6	6
DB1	124	23457	Car	Toyota	N	20161101	20171115	11	11
DB1	124	23458	Car	Acura	N	20170101	20171115	7	7
DB1	125	23459	Truck	Lexus	N	20170101	20171115	5	5
DB1	125	23460	Jeep	Jeep	N	20170101	20171115	6	6
DB1	126	23461	Car	Honda	N	20170401	20171115	6	6
DB1	126	23462	Car	Toyota	N	20170101	20171115	2	2
DB1	127	23463	Truck	Acura	N	20170501	20171115	12	12
DB1	127	23464	Jeep	Lexus	N	20170501	20171115	12	12
DB1	128	23465	Car	Jeep	N	20160801	20171115	1	1
;

data big;
input ID $	Quarter	(BGN_DT	END_DT) (:date9.)	(First_Name	Last_Name	MEME_SFX	Gender	Age	Category 	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Cat8) (:$15.);
format BGN_DT	END_DT date9.;
cards;
23456	1	1-Dec-17	30-Nov-18	Ross	Geller	4	F	19	Female,19-20	A	A	A	A	A	0	0	12
23457	1	1-Dec-17	30-Nov-18	Monica	Geller	1	F	49	Female,50-54	A	A	A	A	A	0	0	11
23458	0	1-Feb-18	31-Jan-19	Chandler	Bing	0	M	49	Male,50-54	A	A	A	A	A	0	0	2
23459	1	1-Dec-17	30-Nov-18	Phoebe	Buffay	0	F	43	Female,40-44	A	A	A	A	A	0	0	1
23460	1	1-Dec-17	30-Nov-18	Joe	Tribianni	1	M	57	Male,55-59	A	A	A	A	A	0	0	12
23461	1	1-Dec-17	30-Nov-18	Pam	Beesly	1	M	37	Male,35-39	A	A	A	A	A	0	0	12
23462	2	1-Dec-16	30-Nov-17	Jim	Helpert	0	M	40	Male,40-44	A	A	A	A	A	0	0	12
23463	1	1-Dec-17	30-Nov-18	Michael	Scott	0	F	62	Female,60-64	A	A	A	A	A	0	0	12
23464	1	1-Dec-17	30-Nov-18	Dwight	Schrute	0	F	22	Female,21-24	A	A	A	A	A	0	0	1
23465	2	1-Dec-16	30-Nov-17	Creed	Braxton	3	M	54	Male,55-59	A	A	A	A	A	0	0	12
;

data want ;
if _n_=1 then do;
if 0 then set small;
   dcl hash H (dataset:'small',multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("source", "new_id", "product") ;
   h.definedone () ;
   end;
set big;
/*assuming you want only the matches aka inner join*/
if h.find()=0;
keep Quarter First_Name Last_Name Gender  Age Source New_ID ID  Product;
run;

or

you can add use keep=dataset option to process only the wanted vars in the PDV

 

set big(keep=Quarter First_Name Last_Name Gender age);

 

VarunD
Obsidian | Level 7
Thanks a lot for your help.
Just one quick clarification about the use of multidata:'y' in the code. Is that to allow duplicate values for all variables ?
novinosrin
Tourmaline | Level 20

Yes, I was silly to add that as your id's are unique. Good catch mate! My apologies

 

I copy pasted syntax from online search and forgot to remove that piece from the syntax. Sincere apologies again!

VarunD
Obsidian | Level 7
No need to apologize.
If there was Marvel superhero with SAS powers, you would be my hero in that movie.
Astounding
PROC Star
Maybe we should rule out the obvious first. Do you really have

do unitl

instead of

do until
VarunD
Obsidian | Level 7

That was one of the issue with Code.  But, hey In my defense it was Friday evening : ) 

 

I fixed it and got other errors that I was able to resolve but it was still slow.


I found the code novinosrin provided was better, in that it is simple for someone like me to understand and modify it for re-use. 

 

Thanks for your help.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1070 views
  • 0 likes
  • 3 in conversation