BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ssafmed
Obsidian | Level 7

Hello,

 

I want to merge 2 tables  using Hash Object by keyA and keyB that have the same values(but not the same name in the tables), but in my output I want just a few values from both tables, 

 

Example

table A : keyA, Cl_A_1,  Cl_A_2, Cl_A_3, Cl_A_4, CL_A5

table B : keyB, Cl_B_1,Cl_B_2,Cl_B_3,Cl_B_4,Cl_B_5,Cl_B_6

------------------------------------------------------------------

Output : KeyA, keyB, Cl_A_1,  Cl_A_2, Cl_B_1,Cl_B_2,Cl_B_3

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below how I would code for this. Make sure that you load the table with the lower volume into the hash. That's from a memory consumption and performance perspective much more important than if you've got a few more variables than you would need in the PDV. As long as you only write variables to the output table that you need (drop/keep) the difference to performance is negligible. 

data Set1;
  if _n_ = 1 then
    do;
      if 0 then set class(keep=name sex age);
      declare Hash Match1 (dataset:'class');
      Match1.DefineKey ('Name');
      Match1.DefineData ('Name','Sex','Age');
      Match1.DefineDone ();
    end;
  call missing(of _all_);

  set cities(keep=Persona department);

  if Match1.find(key:Persona) = 0;

  /* potentially more statements */
  drop persona;
run;

Also be aware that at least in your sample data column Persona has a length of 10 but Name only a length of 8. Should the string length stored in Persona exceed 8 characters then the hash lookup will return an error like: ERROR: Argument length greater than length of key variable Name

View solution in original post

14 REPLIES 14
mkeintz
PROC Star

Plesse provide sample data in the form of a working DATA step, and a working DATA step with the desired result.

 

With that clarification, we'll be able to determine a way to get from sample input to desired output, with tested code.

 

Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Easiest is to just rename the key variable to what's in your base table when loading into the hash table.

dcl hash h1 (dataset:'your_table(rename=(keyB=KeyA))');

 

"but in my output I want just a few values from both tables

You need to further specify what this is about AND you need to provide sample data and desired result. Please provide the sample data via a working SAS datastep that creates such data.

ssafmed
Obsidian | Level 7

Here is an example :

 

data class;
input Name$ Sex$ Age Height Weight;
infile datalines dlm=' ' dsd;
datalines;
Alfred M 14 69.0 112.5
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
;
 
 
data cities;
input zipcode City :$30. Department :$30. Persona:$10.;
infile datalines dlm=',' dsd;
datalines;
75008, Paris 8è ,Paris,Alice
91940, Les Ulis, Essonne,Carol
92330,Sceaux, Hauts-de-Seine,Henry
93140, Bondy , Seine-Saint-Denis,Henry
94150, Rungis, Val-de-Marne,Alice
;
 
data Set1 ;
if 0 then set class cities; 
 
 if _n_ = 1 then do;
declare Hash Match1 (dataset:'class'); 
 
 Match1.DefineKey ('Name'); 
 Match1.DefineData ('Name','Sex','Age'); 
 Match1.DefineDone (); 
 end;
 
 set cities;
if Match1.find(key:Persona) = 0 then output; 
 
run;
What I want is an output set that contains Name Sex Age Department! I can do it with (keep=Name Sex Age Department), but I dont want to process this variables!
Thanks
Kurt_Bremser
Super User

Add these two statements to your IF N_ = 1 block:

length name $8 sex $8 age 8;
call missing(name,sex,age);

You have to tell the data step compiler about the new variables, so it can compile them into the PDV. The second statement is just there to avoid the "uninitialized" NOTE.

ssafmed
Obsidian | Level 7

Hello,

 

In fact, in my output I just want de keep Name Sex Age and department and not (zipcode, City, Persona) 

I dont even want that the variables (zipcode, City, Persona)  went to PVD

 

Thanks

ssafmed
Obsidian | Level 7

In this case it will be process in PDV, but I want to avoid that for efficiency because my data are very big!

 

Thanks

Kurt_Bremser
Super User

When you exclude variables in the SET statement, which controls the reading of a dataset, those variables will not be part of the PDV. You can verify this with a PUT _ALL_; statement.

 

BTW since the whole dataset file will be p?hysically read anyway, the number of variables in the PDV won't have an influence on processing time, only on memory used, and with so few as you describe, the effect of dropping them will be nil.

ssafmed
Obsidian | Level 7

I put a keep just after If 0 (I dont know if it is that you mean by Set reading statment)

 

but still processing all the variables in PVD (please see log below)

Code :

if 0 then set class cities; 
keep Name Sex Age Persona Department;
 if _n_ = 1 then do;
declare Hash Match1 (dataset:'class'); 
 
 Match1.DefineKey ('Name'); 
 Match1.DefineData ('Name','Sex','Age'); 
 Match1.DefineDone (); 
 end;
 
 set cities ;
 
if Match1.find(key:Persona) = 0 then output; 
PUT _ALL_;
run;

 

log :

NOTE: There were 5 observations read from the data set WORK.CLASS.
Name=Alice Sex=F Age=13 Height=. Weight=. zipcode=75008 City=Paris 8è Department=Paris Persona=Alice _ERROR_=0 _N_=1
Name=Carol Sex=F Age=14 Height=. Weight=. zipcode=91940 City=Les Ulis Department=Essonne Persona=Carol _ERROR_=0 _N_=2
Name=Henry Sex=M Age=14 Height=. Weight=. zipcode=92330 City=Sceaux Department=Hauts-de-Seine Persona=Henry _ERROR_=0 _N_=3
Name=Henry Sex=M Age=14 Height=. Weight=. zipcode=93140 City=Bondy Department=Seine-Saint-Denis Persona=Henry _ERROR_=0 _N_=4
Name=Alice Sex=F Age=13 Height=. Weight=. zipcode=94150 City=Rungis Department=Val-de-Marne Persona=Alice _ERROR_=0 _N_=5
data Set1;
 
Thanks
ssafmed
Obsidian | Level 7

Hello,

It's what I did but still not solve the problem!

 

Thanks!

 

if 0 then set class cities; 
keep Name Sex Age Persona Department;
 if _n_ = 1 then do;
declare Hash Match1 (dataset:'class'); 
Kurt_Bremser
Super User

@ssafmed wrote:

Hello,

It's what I did but still not solve the problem!

 

Thanks!

 

if 0 then set class cities; 
keep Name Sex Age Persona Department;
 if _n_ = 1 then do;
declare Hash Match1 (dataset:'class'); 

You are still using the statement instead of the dataset option. Please read the documentation I linked for you.

Tom
Super User Tom
Super User

Let's see what your code does:

if 0 then set class cities; 

Since SAS will evaluate zero as FALSE the SET statement will never run.  But the data step compiler will see it and add all of the variables from both the CLASS and CITIES datasets.

keep Name Sex Age Persona Department;

This says to only output these 5 specific variables.  If CLASS is like SASHELP.CLASS then the first three will be defined as in CLASS and the second two will be defined as in CITIES.  If any of those five variables do not exist in either CLASS or CITIES and are not later defined explicitly or implicitly by other code in the step then SAS not create them and instead write a WARNING message to the SAS log.

 

Perhaps you meant to do something like:

if 0 then set class(keep=Name Sex Age) cities(keep=Persona Department); 

Now the data step compile step will only add those specific 5 variables.  But you will have to also check other places where you use either CLASS or CITIES in another SET or MERGE statement to make sure that does not bring in other variables.

Patrick
Opal | Level 21

Below how I would code for this. Make sure that you load the table with the lower volume into the hash. That's from a memory consumption and performance perspective much more important than if you've got a few more variables than you would need in the PDV. As long as you only write variables to the output table that you need (drop/keep) the difference to performance is negligible. 

data Set1;
  if _n_ = 1 then
    do;
      if 0 then set class(keep=name sex age);
      declare Hash Match1 (dataset:'class');
      Match1.DefineKey ('Name');
      Match1.DefineData ('Name','Sex','Age');
      Match1.DefineDone ();
    end;
  call missing(of _all_);

  set cities(keep=Persona department);

  if Match1.find(key:Persona) = 0;

  /* potentially more statements */
  drop persona;
run;

Also be aware that at least in your sample data column Persona has a length of 10 but Name only a length of 8. Should the string length stored in Persona exceed 8 characters then the hash lookup will return an error like: ERROR: Argument length greater than length of key variable Name

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2592 views
  • 6 likes
  • 5 in conversation