Solved: Call execute for proc sql update

torvyle · Posted 04-26-2018 01:17 PM

Hello,

I have a very large dataset that needs to be updated using another very large reference dataset. I've tried Proc Sql Update but it is taking a very long time. I've come to know that running Proc Sql Update with Call Execute is a better alternative. I would appreciate it if someone can provide the Call Execute codes for the following Proc Sql:

proc sql;

update datawant a

set var1=(select var1 from datahave where a.var=var),

var2=(select var2 from datahave where a.var=var);

quit;

ballardw · Posted 04-26-2018 05:40 PM

Your log shows this from a data _null_ step creating the call execute:

27         call execute("update work.datawant a set proc_start_date_time1='"||proc_start_date_time1||"' where
27       ! TRANSACTION_ID="||TRANSACTION_ID||";");
28         if final then call execute('quit;');
29         run;

Pay very close attention to the bit

set proc_start_date_time1='"

As that is what generates the error shown in the log

         + update work.datawant a set proc_start_date_time1='  1776505260' where TRA

So the call execute result is telling the update to use the character value within quotes ' 1776505260' . Which is not numeric.

Remove the single quotes from the call execute.

I suspect borrowing some code from an example updating a character variable.

If you are updating a data set where there is a unique key value to match your data on then a datastep update might be faster and would allow updating multiple variables in a single step. For this to work you would need to have the data sorted by that key variable(s), use a by statement with those variables and the base data cannot have any repeats of the key variable combinations.

A very brief program would look like this assuming the data is sorted prior:

data bigdata;
  update bigdata
         transactiondata
  ;
  by keyvariable;
run;

By default the values of variables from the transactiondata set would replace the values of the same named variables on the matching records in bigdata. An option on the UPDATE statement, UPDATEMODE sets whether missing values in the transaction data would replace values in the bigdata set. UPDATEMODE=MISSINGCHECK , the default, prevents missing from replacing the values, NOMISSINGCHECK would allow replace values with missing from the transaction data.

Note that with this approach any other variables in the transaction data will be added to the data so you may need to drop some variables so only the key variable(s) and the desired variables to update are present.

View solution in original post

PaigeMiller · Posted 04-26-2018 01:53 PM

I've come to know that running Proc Sql Update with Call Execute is a better alternative.

I'm not aware of this. It seems to me that the execution time of a PROC SQL call doesn't depend on how the PROC SQL call was initiated, via plain old PROC SQL, or by CALL EXECUTE, or via a macro. Please explain.

I would appreciate it if someone can provide the Call Execute codes for the following Proc Sql:

proc sql;

update datawant a

set var1=(select var1 from datahave where a.var=var),

var2=(select var2 from datahave where a.var=var);

quit;

I see absolutely no reason and no benefit to doing this from CALL EXECUTE, which allows you to dynamically change the code based upon the contents of a data set. I see no need in your code or in your explanation where this dynamic capability is needed.

--
Paige Miller

torvyle · Posted 04-26-2018 02:10 PM

The dataset that needs updating and the reference dataset have about 4 million rows. When I ran the Proc Sql I let SAS run for about 30min before I stopped it. Although the question have the Proc Sql code for updating of 2 variables, the dataset, in fact, has 35 variables that need updating (mostly date variables).

I first came across the following webpage but I'm having trouble adapting its Call Execute codes to my needs: https://heuristically.wordpress.com/2010/08/06/make-proc-sql-update-faster-sas-call-execute-data-ste...

Thank you

PaigeMiller · Posted 04-26-2018 02:51 PM

Okay, I did not know that, and still not sure I understand it. However...

I'm having trouble adapting its Call Execute codes to my needs:

As always here in the SAS Community, show us your SASLOG and other indication of problems. We can't help you if you just say you are having trouble, you have to be specific and show us exactly what you did and what isn't working.

--
Paige Miller

torvyle · Posted 04-26-2018 04:21 PM

I would like to see if the Call Execute works with one variable (date variable) before I add another. Here's the code:

proc sql;
update work.datawant a
set proc_start_date_time1=(select proc_start_date_time1 from work.datahave where a.transaction_id=transaction_id);
quit;

data _null_;
set work.datahave end=final;
if _n_ = 1 then call execute("proc sql ;");
call execute("update work.datawant a set proc_start_date_time1='"||proc_start_date_time1||"' where transaction_id="||transaction_id||";");
if final then call execute('quit;');
run;

I've attached the log.

PaigeMiller · Posted 04-26-2018 04:32 PM

ERROR: proc_start_date_time1, a numeric column, can only be updated with a numeric expression.

Seems pretty clear to me, the variable proc_start_date_time1 is numeric, and you are trying to insert a character value. You can't do that. You can only insert numeric varlues.

--
Paige Miller

torvyle · Posted 04-26-2018 04:59 PM

Both are date variables:

Here's the summary table for both datasets:

ballardw · Posted 04-26-2018 05:40 PM

Your log shows this from a data _null_ step creating the call execute:

27         call execute("update work.datawant a set proc_start_date_time1='"||proc_start_date_time1||"' where
27       ! TRANSACTION_ID="||TRANSACTION_ID||";");
28         if final then call execute('quit;');
29         run;

Pay very close attention to the bit

set proc_start_date_time1='"

As that is what generates the error shown in the log

         + update work.datawant a set proc_start_date_time1='  1776505260' where TRA

So the call execute result is telling the update to use the character value within quotes ' 1776505260' . Which is not numeric.

Remove the single quotes from the call execute.

I suspect borrowing some code from an example updating a character variable.

If you are updating a data set where there is a unique key value to match your data on then a datastep update might be faster and would allow updating multiple variables in a single step. For this to work you would need to have the data sorted by that key variable(s), use a by statement with those variables and the base data cannot have any repeats of the key variable combinations.

A very brief program would look like this assuming the data is sorted prior:

data bigdata;
  update bigdata
         transactiondata
  ;
  by keyvariable;
run;

By default the values of variables from the transactiondata set would replace the values of the same named variables on the matching records in bigdata. An option on the UPDATE statement, UPDATEMODE sets whether missing values in the transaction data would replace values in the bigdata set. UPDATEMODE=MISSINGCHECK , the default, prevents missing from replacing the values, NOMISSINGCHECK would allow replace values with missing from the transaction data.

Note that with this approach any other variables in the transaction data will be added to the data so you may need to drop some variables so only the key variable(s) and the desired variables to update are present.

torvyle · Posted 04-27-2018 09:55 AM

Thank you very much, ballardw! Removal of the single quotes worked.

If I were to update two variables, the code is this:

data _null_;
set work.datahave end=final;
if _n_ = 1 then call execute("proc sql ;");
call execute("update work.datawant a set proc_start_date_time1="||proc_start_date_time1||" where transaction_id="||transaction_id||";");
call execute("update work.datawant a set proc_start_date_time2="||proc_start_date_time2||" where transaction_id="||transaction_id||";");
if final then call execute('quit;');
run;

When I tried to update the large dataset using Call Execute the SAS server stopped around 4min mark. I'll try to get this sorted and try your data step suggestion as well.

Call execute for proc sql update

Re: Call execute for proc sql update

Re: Call execute for proc sql update

Re: Call execute for proc sql update

Re: Call execute for proc sql update

Re: Call execute for proc sql update

Re: Call execute for proc sql update

Re: Call execute for proc sql update

Re: Call execute for proc sql update

Re: Call execute for proc sql update

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away