<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Update large tables based on a parameter table (hash join?) in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815635#M321923</link>
    <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In SAS, I should solve a problem that based on an input table (which only has IDs), I would have to update the values ​​in a table in another table that belong to that ID. Both the table containing the IDs and the table to be updated contain a large number of records. Would a hash join be perhaps the fastest runtime? What could be done to solve this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;example tables:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;/*ID TABLE (approx. There will be 100,000 records in it)*/	
data id_table;
	length id $32 ;
	infile datalines ;
	input id $ ;
	datalines;
A
B
C
D
E
F
G
H
;
run;

/*parameter table
This table shows which field needs to be updated in which table and by what procedure (to be emptied or overwritten by a specified value)*/
data param_table;
	length table $32 coloumn_name $32 data_type $32 have_to_anonim $1 todo $100 default_value $32 desc $50;
	input table $ coloumn_name $ data_type $ have_to_anonim $ todo $ default_value $ desc $;
	datalines;
BASE_01 TESZT_COLUMN1 CHAR Y TO_EMPTY EMPTY test
BASE_01 TESZT_COLUMN2 CHAR Y TO_EMPTY EMPTY test
BASE_02 TESZT_COLUMN3 NUM Y OWERWRITE 9999 test
;
run;


/*update tables example (it will contain about 10 million records)*/	
data BASE_01;
	length id $32 TESZT_COLUMN1 $32 TESZT_COLUMN2 $32;
	input id $ TESZT_COLUMN1 $ TESZT_COLUMN2 $;
	datalines;
A wwww M
B eeee F
C rrrr M
D wwww M
E tttt M
;
run;

data BASE_02;
	length id $32 TESZT_COLUMN1 $32 TESZT_COLUMN2 8.;
	input id $ TESZT_COLUMN1 $ TESZT_COLUMN2 ;
	datalines;
A wwww 9
B eeee 9
C rrrr 9
D wwww 9
E tttt 9
;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;So in summary I want a procedure / cycle at the end that updates the value of the given field in the table based on the ID field of the input table and the parameter table.&lt;/P&gt;&lt;P&gt;Which field should be emptied or overwritten in which table and what should be overwritten if it is to be overwritten in the parameter table.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It is important to save the records to be updated in a separate table before any update.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thank you in advance for your help!&lt;/P&gt;</description>
    <pubDate>Sun, 29 May 2022 22:32:20 GMT</pubDate>
    <dc:creator>ger15xxhcker</dc:creator>
    <dc:date>2022-05-29T22:32:20Z</dc:date>
    <item>
      <title>Update large tables based on a parameter table (hash join?)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815635#M321923</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In SAS, I should solve a problem that based on an input table (which only has IDs), I would have to update the values ​​in a table in another table that belong to that ID. Both the table containing the IDs and the table to be updated contain a large number of records. Would a hash join be perhaps the fastest runtime? What could be done to solve this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;example tables:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;/*ID TABLE (approx. There will be 100,000 records in it)*/	
data id_table;
	length id $32 ;
	infile datalines ;
	input id $ ;
	datalines;
A
B
C
D
E
F
G
H
;
run;

/*parameter table
This table shows which field needs to be updated in which table and by what procedure (to be emptied or overwritten by a specified value)*/
data param_table;
	length table $32 coloumn_name $32 data_type $32 have_to_anonim $1 todo $100 default_value $32 desc $50;
	input table $ coloumn_name $ data_type $ have_to_anonim $ todo $ default_value $ desc $;
	datalines;
BASE_01 TESZT_COLUMN1 CHAR Y TO_EMPTY EMPTY test
BASE_01 TESZT_COLUMN2 CHAR Y TO_EMPTY EMPTY test
BASE_02 TESZT_COLUMN3 NUM Y OWERWRITE 9999 test
;
run;


/*update tables example (it will contain about 10 million records)*/	
data BASE_01;
	length id $32 TESZT_COLUMN1 $32 TESZT_COLUMN2 $32;
	input id $ TESZT_COLUMN1 $ TESZT_COLUMN2 $;
	datalines;
A wwww M
B eeee F
C rrrr M
D wwww M
E tttt M
;
run;

data BASE_02;
	length id $32 TESZT_COLUMN1 $32 TESZT_COLUMN2 8.;
	input id $ TESZT_COLUMN1 $ TESZT_COLUMN2 ;
	datalines;
A wwww 9
B eeee 9
C rrrr 9
D wwww 9
E tttt 9
;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;So in summary I want a procedure / cycle at the end that updates the value of the given field in the table based on the ID field of the input table and the parameter table.&lt;/P&gt;&lt;P&gt;Which field should be emptied or overwritten in which table and what should be overwritten if it is to be overwritten in the parameter table.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It is important to save the records to be updated in a separate table before any update.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thank you in advance for your help!&lt;/P&gt;</description>
      <pubDate>Sun, 29 May 2022 22:32:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815635#M321923</guid>
      <dc:creator>ger15xxhcker</dc:creator>
      <dc:date>2022-05-29T22:32:20Z</dc:date>
    </item>
    <item>
      <title>Re: Update large tables based on a parameter table (hash join?)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815679#M321943</link>
      <description>&lt;P&gt;10'' records doesn't sound much. Do you have any contraints/requirments when it comes to execution time/resource consumption?&lt;/P&gt;
&lt;P&gt;I would probaby start with a simple SQL and see if it's quick enough (and SQL can also perform implict hash-joins if the conditions are right).&lt;/P&gt;</description>
      <pubDate>Mon, 30 May 2022 11:58:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815679#M321943</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2022-05-30T11:58:12Z</dc:date>
    </item>
    <item>
      <title>Re: Update large tables based on a parameter table (hash join?)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815710#M321957</link>
      <description>&lt;P&gt;Not sure I follow what you are asking but it looks to me like that middle dataset is the key one.&lt;/P&gt;
&lt;P&gt;It appears to be instructions for how to generate code.&lt;/P&gt;
&lt;P&gt;So why not just use it to generate code?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename code temp;
data _null_;
  set param_table ;
  where have_to_anonim = 'Y';
  by table;
  file code;
  if first.table then
    put 'data new.' table ';'
      / '  set old.' table ';'
    ;
  end;
  * now write out the code for converting each variable ;
  ...
  if last.table then put 'run;' ;
run;
%include code / source2 ;
   &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 30 May 2022 15:17:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815710#M321957</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-05-30T15:17:32Z</dc:date>
    </item>
    <item>
      <title>Re: Update large tables based on a parameter table (hash join?)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815715#M321958</link>
      <description>&lt;P&gt;First, thank you for providing sample data in the form of working DATA steps.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;... I would like to make a suggestion using a tested program.&amp;nbsp; But please make your sample data more consistent (e.g. there is no TESZT_COLUMN3 in BASE_02, but the PARAM_TABLE expects one). And a bit richer - have both an EMPTY and&amp;nbsp; an OVERWRITE for both numeric and character variables in the param_table.&amp;nbsp; And have a couple of untouched variables in BASE_01 or BASE_02.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I do think a hash object (2 actually) is a good way to go, but perhaps not as you expect.&lt;/P&gt;</description>
      <pubDate>Mon, 30 May 2022 15:52:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815715#M321958</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2022-05-30T15:52:07Z</dc:date>
    </item>
    <item>
      <title>Re: Update large tables based on a parameter table (hash join?)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815750#M321973</link>
      <description>&lt;P&gt;Here's a program that is not completely tested, but likely works:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let dsn=BASE_01;

%let dummyvars=;
%let numdflt=;
%let chrdflt=;
proc sql noprint;
  select column_name into :dummyvars     separated by ' ' from param_table where table="&amp;amp;DSN";
  select cats(column_name,'=',default_value)              into :numdflt  separated by ';' from param_table where table="&amp;amp;DSN" and todo ='OVERWRITE' and data_type='NUM';
  select cats(column_name,'=',quote(trim(default_value))) into :chrdflt  separated by ';' from param_table where table="&amp;amp;DSN" and todo ='OVERWRITE' and data_type='CHAR';
quit;
%put &amp;amp;=dummyvars;
%put %quote(&amp;amp;numdflt);
%put %quote(&amp;amp;chrdflt);

data dummy ;
  if 0 then set &amp;amp;dsn (keep=id &amp;amp;dummyvars rename=(id=_dummy_id));
  &amp;amp;numdflt ;
  &amp;amp;chrdflt ;
  output;
  stop;
run;

data &amp;amp;dsn;
  modify &amp;amp;dsn;
  if _n_=1 then do;
    if 0 then set dummy;
    declare hash dummy (dataset:"dummy");
      dummy.definekey('_dummy_id') ;
      dummy.definedata(all:'Y');
      dummy.definedone();
    declare hash ids(dataset:'id_table');
      ids.definekey('id');
      ids.definedone();
  end;
  if ids.find()=0 then do;
    dummy.find();
    replace;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The only line you have to change is the first, which identifies the dataset to be updated (macrovar DSN).&amp;nbsp; The program creates a single observation dataset DUMMY which has the set of variables from table DSN that appear in PARAM_TABLE.&amp;nbsp; Those variables will have either a missing value (TODO="TO_EMPTY") or the default values (TODO="OVERWRITE").&amp;nbsp; That single obs dataset becomes a single dataitem hash object H, which is retrieved only when an ID is found in the second hash object IDS (based on dataset ID_TABLE).&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note this program updates dataset DSN in place, but you don't have to do this.&amp;nbsp; You could modify the last data step above to something like&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data &amp;amp;dsn._new;
  set &amp;amp;dsn;
  if _n_=1 then do;
    ... same code ...;
  end;
  if ids.find()=0 then dummy.find();
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2022 01:24:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Update-large-tables-based-on-a-parameter-table-hash-join/m-p/815750#M321973</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2022-05-31T01:24:17Z</dc:date>
    </item>
  </channel>
</rss>

