Solved: Re: update large table based another table key

ger15xxhcker · Posted 06-16-2022 02:11 PM

Hi!

I have a table called ID_TABLE that contains only IDs. ID_ONE is the name of the column.There is another table called FULL_TABLE which has an ID_TWO column and two other columns called A1 and B1.

If the value of the ID_ONE field is equal to the ID_TWO field, the value of fields A1 and B1 in the FULL_TABLE table must be overwritten with an 'X'.

Table A has about 30,000 rows. Table B has 40,000,000 rows.

How can this be solved with the best performance? Hash joinnal? Proc sql with very slow. The board itself should be overwritten.

Thanks

Kurt_Bremser · Posted 06-16-2022 05:01 PM

Hash object:

data want;
set b;
if _n_ = 1
then do;
  declare hash a (dataset:"a (rename=(id_one=id_two))");
  a.definekey("id_two");
  a.definedone();
end;
if a.check() = 0
then do;
  a1 = "X";
  b1 = "X";
end;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

JOL · Posted 06-16-2022 02:31 PM

You can try this:

/*Sort source tables first so you can do a Data Step Merge*/
proc sort data = ID_TABLE out=work.ID_TABLE;
by ID_ONE;
run;

proc sort data = FULL_TABLE out=work.FULL_TABLE;
by ID_TWO;
run;

/*Use the Data Step to merge and create desired output in a new table*/
data newtable;
merge work.ID_TABLE (IN=IT) work.FULL_TABLE(IN=FT RENAME=(ID_TWO=ID_ONE));
by ID_ONE;

if IT = 1 and FT=1 then
do;
A1 = 'X';
B1 = 'X';
end;
run;

Kurt_Bremser · Posted 06-16-2022 05:01 PM

Hash object:

data want;
set b;
if _n_ = 1
then do;
  declare hash a (dataset:"a (rename=(id_one=id_two))");
  a.definekey("id_two");
  a.definedone();
end;
if a.check() = 0
then do;
  a1 = "X";
  b1 = "X";
end;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ger15xxhcker · Posted 06-17-2022 02:48 PM

Thanks! That's exactly what I was thinking! I learned a lot from it. Do you know how I could extract from this hash update how many successful updates there were in the target table?

Kurt_Bremser · Posted 06-17-2022 04:21 PM

Keep a running count, and store it in a macro variable for later use:

data want;
set b end=done;
if _n_ = 1
then do;
  declare hash a (dataset:"a (rename=(id_one=id_two))");
  a.definekey("id_two");
  a.definedone();
end;
if a.check() = 0
then do;
  a1 = "X";
  b1 = "X";
  count + 1;
end;
if done
then do;
  put count=;
  call symputx('count',count);
end;
drop count;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ger15xxhcker · Posted 06-20-2022 07:31 PM

Thanks!

If I want to update table' b' locally instead of 'want' table and there is an index on the 'b' table, will it be dropped or retained? So after the data there will be the ' b' table and after the set there will be also the 'b' table.

Kurt_Bremser · Posted 06-23-2022 03:21 AM

After recreating a dataset, you must also recreate the index.

Run this simple example and look at the print output:

data class;
set sashelp.class;
run;

proc sql;
create index sex on work.class(sex);
quit;

proc datasets lib=work;
quit;

data class;
set class;
x = 1;
run;

proc datasets lib=work;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away