BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
torvyle
Calcite | Level 5

Hello,

 

I have a very large dataset that needs to be updated using another very large reference dataset. I've tried Proc Sql Update but it is taking a very long time. I've come to know that running Proc Sql Update with Call Execute is a better alternative. I would appreciate it if someone can provide the Call Execute codes for the following Proc Sql:

 

proc sql;

update datawant a

set var1=(select var1 from datahave where a.var=var),

var2=(select var2 from datahave where a.var=var);

quit;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Your log shows this from a data _null_ step creating the call execute:

27         call execute("update work.datawant a set proc_start_date_time1='"||proc_start_date_time1||"' where
27       ! TRANSACTION_ID="||TRANSACTION_ID||";");
28         if final then call execute('quit;');
29         run;

Pay very close attention to the bit

 

set proc_start_date_time1='"

As that is what generates the error shown in the log

 

         + update work.datawant a set proc_start_date_time1='  1776505260' where TRA

So the call execute result is telling the update to use the character value within quotes '  1776505260' . Which is not numeric.

 

Remove the single quotes from the call execute.

I suspect borrowing some code from an example updating a character variable.

 

If you are updating a data set where there is a unique key value to match your data on then a datastep update might be faster and would allow updating multiple variables in a single step. For this to work you would need to have the data sorted by that key variable(s), use a by statement with those variables and the base data cannot have any repeats of the key variable combinations.

 

A very brief program would look like this assuming the data is sorted prior:

data bigdata;
  update bigdata
         transactiondata
  ;
  by keyvariable;
run;

By default the values of variables from the transactiondata set would replace the values of the same named variables on the matching records in bigdata. An option on the UPDATE statement, UPDATEMODE sets whether missing values in the transaction data would replace values in the bigdata set. UPDATEMODE=MISSINGCHECK , the default, prevents missing from replacing the values, NOMISSINGCHECK would allow replace values with missing from the transaction data.

 

 

Note that with this approach any other variables in the transaction data will be added to the data so you may need to drop some variables so only the key variable(s) and the desired variables to update are present.

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

I've come to know that running Proc Sql Update with Call Execute is a better alternative.

 

I'm not aware of this. It seems to me that the execution time of a PROC SQL call doesn't depend on how the PROC SQL call was initiated, via plain old PROC SQL, or by CALL EXECUTE, or via a macro. Please explain.

 

I would appreciate it if someone can provide the Call Execute codes for the following Proc Sql:

 

proc sql;

update datawant a

set var1=(select var1 from datahave where a.var=var),

var2=(select var2 from datahave where a.var=var);

quit;

 

I see absolutely no reason and no benefit to doing this from CALL EXECUTE, which allows you to dynamically change the code based upon the contents of a data set. I see no need in your code or in your explanation where this dynamic capability is needed.

--
Paige Miller
torvyle
Calcite | Level 5

The dataset that needs updating and the reference dataset have about 4 million rows. When I ran the Proc Sql I let SAS run for about 30min before I stopped it. Although the question have the Proc Sql code for updating of 2 variables, the dataset, in fact, has 35 variables that need updating (mostly date variables). 

 

I first came across the following webpage but I'm having trouble adapting its Call Execute codes to my needs: https://heuristically.wordpress.com/2010/08/06/make-proc-sql-update-faster-sas-call-execute-data-ste...

 

Thank you

PaigeMiller
Diamond | Level 26

Okay, I did not know that, and still not sure I understand it. However...

 

 I'm having trouble adapting its Call Execute codes to my needs: 

 

As always here in the SAS Community, show us your SASLOG and other indication of problems. We can't help you if you just say you are having trouble, you have to be specific and show us exactly what you did and what isn't working.

--
Paige Miller
torvyle
Calcite | Level 5

I would like to see if the Call Execute works with one variable (date variable) before I add another. Here's the code:

 

proc sql;
update work.datawant a
set proc_start_date_time1=(select proc_start_date_time1 from work.datahave where a.transaction_id=transaction_id);
quit;

data _null_;
set work.datahave end=final;
if _n_ = 1 then call execute("proc sql ;");
call execute("update work.datawant a set proc_start_date_time1='"||proc_start_date_time1||"' where transaction_id="||transaction_id||";");
if final then call execute('quit;');
run;

 

I've attached the log. 

 

PaigeMiller
Diamond | Level 26
ERROR: proc_start_date_time1, a numeric column, can only be updated with a numeric expression.

Seems pretty clear to me, the variable proc_start_date_time1 is numeric, and you are trying to insert a character value. You can't do that. You can only insert numeric varlues.

--
Paige Miller
torvyle
Calcite | Level 5

Both are date variables:

 

Here's the summary table for both datasets:

 

 Untitled.png

ballardw
Super User

Your log shows this from a data _null_ step creating the call execute:

27         call execute("update work.datawant a set proc_start_date_time1='"||proc_start_date_time1||"' where
27       ! TRANSACTION_ID="||TRANSACTION_ID||";");
28         if final then call execute('quit;');
29         run;

Pay very close attention to the bit

 

set proc_start_date_time1='"

As that is what generates the error shown in the log

 

         + update work.datawant a set proc_start_date_time1='  1776505260' where TRA

So the call execute result is telling the update to use the character value within quotes '  1776505260' . Which is not numeric.

 

Remove the single quotes from the call execute.

I suspect borrowing some code from an example updating a character variable.

 

If you are updating a data set where there is a unique key value to match your data on then a datastep update might be faster and would allow updating multiple variables in a single step. For this to work you would need to have the data sorted by that key variable(s), use a by statement with those variables and the base data cannot have any repeats of the key variable combinations.

 

A very brief program would look like this assuming the data is sorted prior:

data bigdata;
  update bigdata
         transactiondata
  ;
  by keyvariable;
run;

By default the values of variables from the transactiondata set would replace the values of the same named variables on the matching records in bigdata. An option on the UPDATE statement, UPDATEMODE sets whether missing values in the transaction data would replace values in the bigdata set. UPDATEMODE=MISSINGCHECK , the default, prevents missing from replacing the values, NOMISSINGCHECK would allow replace values with missing from the transaction data.

 

 

Note that with this approach any other variables in the transaction data will be added to the data so you may need to drop some variables so only the key variable(s) and the desired variables to update are present.

torvyle
Calcite | Level 5

Thank you very much, ballardw! Removal of the single quotes worked.

 

If I were to update two variables, the code is this:

data _null_;
set work.datahave end=final;
if _n_ = 1 then call execute("proc sql ;");
call execute("update work.datawant a set proc_start_date_time1="||proc_start_date_time1||" where transaction_id="||transaction_id||";");
call execute("update work.datawant a set proc_start_date_time2="||proc_start_date_time2||" where transaction_id="||transaction_id||";");
if final then call execute('quit;');
run;

 

When I tried to update the large dataset using Call Execute the SAS server stopped around 4min mark. I'll try to get this sorted and try your data step suggestion as well.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 3018 views
  • 2 likes
  • 3 in conversation