BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ger15xxhcker
Quartz | Level 8

Hi!

 

I have a table called ID_TABLE that contains only IDs. ID_ONE is the name of the column.There is another table called FULL_TABLE which has an ID_TWO column and two other columns called A1 and B1.

 

If the value of the ID_ONE field is equal to the ID_TWO field, the value of fields A1 and B1 in the FULL_TABLE table must be overwritten with an 'X'.

Table A has about 30,000 rows. Table B has 40,000,000 rows.

How can this be solved with the best performance? Hash joinnal? Proc sql with very slow. The board itself should be overwritten.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Hash object:

data want;
set b;
if _n_ = 1
then do;
  declare hash a (dataset:"a (rename=(id_one=id_two))");
  a.definekey("id_two");
  a.definedone();
end;
if a.check() = 0
then do;
  a1 = "X";
  b1 = "X";
end;
run;

View solution in original post

6 REPLIES 6
JOL
SAS Employee JOL
SAS Employee

You can try this:

 

/*Sort source tables first so you can do a Data Step Merge*/
proc sort data = ID_TABLE out=work.ID_TABLE;
by ID_ONE;
run;

proc sort data = FULL_TABLE out=work.FULL_TABLE;
by ID_TWO;
run;

 

/*Use the Data Step to merge and create desired output in a new table*/
data newtable;
merge work.ID_TABLE (IN=IT) work.FULL_TABLE(IN=FT RENAME=(ID_TWO=ID_ONE));
by ID_ONE;

if IT = 1 and FT=1 then
do;
A1 = 'X';
B1 = 'X';
end;
run;

Kurt_Bremser
Super User

Hash object:

data want;
set b;
if _n_ = 1
then do;
  declare hash a (dataset:"a (rename=(id_one=id_two))");
  a.definekey("id_two");
  a.definedone();
end;
if a.check() = 0
then do;
  a1 = "X";
  b1 = "X";
end;
run;
ger15xxhcker
Quartz | Level 8

Thanks! That's exactly what I was thinking! I learned a lot from it. Do you know how I could extract from this hash update how many successful updates there were in the target table?

Kurt_Bremser
Super User

Keep a running count, and store it in a macro variable for later use:

data want;
set b end=done;
if _n_ = 1
then do;
  declare hash a (dataset:"a (rename=(id_one=id_two))");
  a.definekey("id_two");
  a.definedone();
end;
if a.check() = 0
then do;
  a1 = "X";
  b1 = "X";
  count + 1;
end;
if done
then do;
  put count=;
  call symputx('count',count);
end;
drop count;
run;
ger15xxhcker
Quartz | Level 8

Thanks! 

If I want to update table' b' locally instead of 'want' table and there is an index on the 'b' table, will it be dropped or retained? So after the data there will be the ' b' table and after the set there will be also the 'b' table.

 

Kurt_Bremser
Super User

After recreating a dataset, you must also recreate the index.

Run this simple example and look at the print output:

data class;
set sashelp.class;
run;

proc sql;
create index sex on work.class(sex);
quit;

proc datasets lib=work;
quit;

data class;
set class;
x = 1;
run;

proc datasets lib=work;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1825 views
  • 2 likes
  • 3 in conversation