I've inherited some code that's using a DATA step UPDATE statement, but my DATA step skills are very rusty and I want to make sure I'm understanding it correctly (I'm more of a PROC SQL guy). The code has poor performance (largely because it's trying to move a bunch of data back and forth between SAS and Oracle) so I'm trying to convert the code to explicit pass-through SQL so I can keep all of the processing local to Oracle. Here is a simplified version of the part of the code in question: PROC SQL;
CONNECT USING LIB_DB1 AS OracDB;
CREATE TABLE LIB_OUT.Ref_Table_1 AS
SELECT *
FROM CONNECTION TO OracDB
(SELECT A.txn_id
,A.Column_1
,B.Column_2
FROM SCHEMA_A.TABLE_A A
INNER JOIN SCHEMA_A.TABLE_B B
ON A.KEY_A = B.KEY_B
ORDER BY A.txn_id
);
DISCONNECT FROM OracDB;
QUIT;
PROC SQL;
CONNECT USING LIB_DB2 AS OracDB;
CREATE TABLE LIB_OUT.Ref_Table_2 AS
SELECT *
FROM CONNECTION TO OracDB
(SELECT txn_id
,Column_1
FROM SCHEMA_B.TABLE_C
WHERE COLUMN_A = &var_A
);
DISCONNECT FROM OracDB;
QUIT;
PROC SORT DATA=LIB_OUT.Ref_Table_1 NODUPKEY;
BY txn_id;
RUN;
PROC SORT DATA=LIB_OUT.Ref_Table_2;
BY txn_id;
RUN;
DATA LIB_OUT.Ref_Table_3;
UPDATE LIB_OUT.Ref_Table_2 LIB_OUT.Ref_Table_1;
BY txn_id;
RUN; My question is on the UPDATE statement on the final DATA step, if I understand it correctly after this step is run then Ref_Table_3 will have all of the data from Ref_Table_2, but will have an added column called Column_2 with missing values where there is no matching txn_id value in Ref_Table_1 and Ref_Table_2, and the value from Ref_Table_1 where there is a matching txn_id. Similarly, for Column_1, if there is no matching txn_id between Ref_Table_1 and Ref_Table_2, then whatever was originally was in Ref_Table_2 will remain, but where there is a matching txn_id Column_1 will be updated to whatever value is in Ref_Table_1 (or left the same if they match). Is my understanding correct or have I missed something?
... View more