Hi!
I have a table called ID_TABLE that contains only IDs. ID_ONE is the name of the column.There is another table called FULL_TABLE which has an ID_TWO column and two other columns called A1 and B1.
If the value of the ID_ONE field is equal to the ID_TWO field, the value of fields A1 and B1 in the FULL_TABLE table must be overwritten with an 'X'.
Table A has about 30,000 rows. Table B has 40,000,000 rows.
How can this be solved with the best performance? Hash joinnal? Proc sql with very slow. The board itself should be overwritten.
Thanks
Hash object:
data want;
set b;
if _n_ = 1
then do;
declare hash a (dataset:"a (rename=(id_one=id_two))");
a.definekey("id_two");
a.definedone();
end;
if a.check() = 0
then do;
a1 = "X";
b1 = "X";
end;
run;
You can try this:
/*Sort source tables first so you can do a Data Step Merge*/
proc sort data = ID_TABLE out=work.ID_TABLE;
by ID_ONE;
run;
proc sort data = FULL_TABLE out=work.FULL_TABLE;
by ID_TWO;
run;
/*Use the Data Step to merge and create desired output in a new table*/
data newtable;
merge work.ID_TABLE (IN=IT) work.FULL_TABLE(IN=FT RENAME=(ID_TWO=ID_ONE));
by ID_ONE;
if IT = 1 and FT=1 then
do;
A1 = 'X';
B1 = 'X';
end;
run;
Hash object:
data want;
set b;
if _n_ = 1
then do;
declare hash a (dataset:"a (rename=(id_one=id_two))");
a.definekey("id_two");
a.definedone();
end;
if a.check() = 0
then do;
a1 = "X";
b1 = "X";
end;
run;
Thanks! That's exactly what I was thinking! I learned a lot from it. Do you know how I could extract from this hash update how many successful updates there were in the target table?
Keep a running count, and store it in a macro variable for later use:
data want;
set b end=done;
if _n_ = 1
then do;
declare hash a (dataset:"a (rename=(id_one=id_two))");
a.definekey("id_two");
a.definedone();
end;
if a.check() = 0
then do;
a1 = "X";
b1 = "X";
count + 1;
end;
if done
then do;
put count=;
call symputx('count',count);
end;
drop count;
run;
Thanks!
If I want to update table' b' locally instead of 'want' table and there is an index on the 'b' table, will it be dropped or retained? So after the data there will be the ' b' table and after the set there will be also the 'b' table.
After recreating a dataset, you must also recreate the index.
Run this simple example and look at the print output:
data class;
set sashelp.class;
run;
proc sql;
create index sex on work.class(sex);
quit;
proc datasets lib=work;
quit;
data class;
set class;
x = 1;
run;
proc datasets lib=work;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.