Solved: Update statement taking too much time

Mushy

Hello,

I am updating huge table using update statement in proc sql.

Its taking very long time.

I need to just blank/null few char/num fields in the tables based of the dataset that provides the fields to be blanked.

Is there an efficient way to deal with this?

Thanks,

Mushy

SASJedi

This summarizes the results of the original code and responses received as of this date when executed against this table:

Data Set Name	LIBA.HAVE	Observations	500000
Member Type	DATA	Variables	50
Engine	V9	Indexes	1
Created	04/29/2024 09:02:45	Observation Length	400
Last Modified	04/29/2024 09:02:45	Deleted Observations	0
Protection		Compressed	NO
Data Set Type		Sorted	NO
Label
Data Representation	WINDOWS_64
Encoding	utf-8 Unicode (UTF-8)

Original Code:

proc sql;
	update liba.have set country1='', country3='', pop1=., pop2=., pop1_date=.;
quit;

Results:

NOTE: PROCEDURE SQL used (Total process time):
      real time           0.61 seconds
      cpu time            0.23 seconds

So, .61 seconds was the time to beat!

@Tom's solution

data liba.have(index=(ID));
  set liba.have;
  call missing(of country1 country3 pop1 pop2 pop1_date);
run;

and @Ksharp's solution

proc sql;
alter table have
drop country1, country3,pop1,pop2,pop1_date
add country1 char(20), country3 char(3),pop1 num,pop2 num,pop1_date num;
quit;

tied for first place

      real time           0.40 seconds
      cpu time            0.20 seconds

With my solution:

data liba.have ;
   modify liba.have;
   call missing(country1, country3,pop1,pop2,pop1_date);
run;

coming in third,

      real time           0.44 seconds
      cpu time            0.32 seconds

and data_null__'s solution:

data liba.have (index=(ID));
	if 0 then set liba.have; 
	set liba.have (drop=country1 country3 pop1 pop2 pop1_date);
run;

still beating the original post:

      real time           0.58 seconds
      cpu time            0.26 seconds

Of course, your mileage may vary depending on the size of your data.

Check out my Jedi SAS Tricks for SAS Users

View solution in original post

SASJedi

To enable others to help more effectively, please share your SQL code, the log, and a sample of the table data (or at least a good description of the columns you are working with and the number of rows involved). Is your data in base SAS or in some external database?

Check out my Jedi SAS Tricks for SAS Users

Mushy

@SASJedi

I am working with SAS datasets:

In general for my case do you know a better and efficient way to update columns other than proc sql update statement?

The query I am using is:

Proc sql;

update liba.geography set country1='', country3='', pop1=., pop2=., pop1_date=.;

quit;

it is a table with 10m + records and 40+ fields.

real time 11:49.60
user cpu time 11:25.46

Tom

Why do you need to update an existing dataset?

In general with SAS datasets I never use SQL UPDATE. Instead I just create a program to MAKE the new dataset I need.

Your example code:

Proc sql;
update liba.geography set country1='', country3='', pop1=., pop2=., pop1_date=.;
quit;

I would normally run as :

data liba.geography;
  set liba.geography;
  call missing(of country1 country3 pop1 pop2 pop1_date);
run;

Mushy

@Tom Indeed its faster , but we lose index after the set.

Tom

@Mushy wrote:

@Tom Indeed its faster , but we lose index after the set.

And why does the index matter?

If it is important then you can simple create the index also, easiest is to just include its definition in the DATA statement.

data new(index=(.....));
  set old;
  ...
run;

Or if speed is an issue it is frequently much faster to make the dataset first and then create the index later. You can use PROC DATASETS or PROC SQL to create an index on an existing dataset.

But I am still curious what application is that requires you to update an existing dataset.

SASJedi

You could use a DATA step with a MODIFY statement:

data liba.geography ;
	modify liba.geography; 
	call missing(country1, country3,pop1,pop2,pop1_date);
run;

That should run faster for you. If you are nulling values for variables involved in the index, your index isn't going to be worth much after the update anyway. In that case, @Tom's advice is best all-around: re-create the data set, do any additional data manipulation, and then rebuild the index in a separate step.

Check out my Jedi SAS Tricks for SAS Users

data_null__

You could skip reassignment by dropping

data class(index=(name/unique));
   if 0 then set sashelp.class;
   set sashelp.class(drop=age weight);
   run;

@Mushy wrote:

Hello,

I am updating huge table using update statement in proc sql.

Its taking very long time.

I need to just blank/null few char/num fields in the tables based of the dataset that provides the fields to be blanked.

Is there an efficient way to deal with this?

Thanks,

Mushy

Ksharp

You could drop these variables firstly, after that adding these variables in this table .

Here is an example:

data have;
 set sashelp.class;
run;


proc sql;
alter table have
drop name,age
add name char(20),age num;
quit;

SASJedi

This summarizes the results of the original code and responses received as of this date when executed against this table:

Data Set Name	LIBA.HAVE	Observations	500000
Member Type	DATA	Variables	50
Engine	V9	Indexes	1
Created	04/29/2024 09:02:45	Observation Length	400
Last Modified	04/29/2024 09:02:45	Deleted Observations	0
Protection		Compressed	NO
Data Set Type		Sorted	NO
Label
Data Representation	WINDOWS_64
Encoding	utf-8 Unicode (UTF-8)

Original Code:

proc sql;
	update liba.have set country1='', country3='', pop1=., pop2=., pop1_date=.;
quit;

Results:

NOTE: PROCEDURE SQL used (Total process time):
      real time           0.61 seconds
      cpu time            0.23 seconds

So, .61 seconds was the time to beat!

@Tom's solution

data liba.have(index=(ID));
  set liba.have;
  call missing(of country1 country3 pop1 pop2 pop1_date);
run;

and @Ksharp's solution

proc sql;
alter table have
drop country1, country3,pop1,pop2,pop1_date
add country1 char(20), country3 char(3),pop1 num,pop2 num,pop1_date num;
quit;

tied for first place

      real time           0.40 seconds
      cpu time            0.20 seconds

With my solution:

data liba.have ;
   modify liba.have;
   call missing(country1, country3,pop1,pop2,pop1_date);
run;

coming in third,

      real time           0.44 seconds
      cpu time            0.32 seconds

and data_null__'s solution:

data liba.have (index=(ID));
	if 0 then set liba.have; 
	set liba.have (drop=country1 country3 pop1 pop2 pop1_date);
run;

still beating the original post:

      real time           0.58 seconds
      cpu time            0.26 seconds

Of course, your mileage may vary depending on the size of your data.

Check out my Jedi SAS Tricks for SAS Users

Mushy

@SASJedi Thanks for the comparision and helping out with the efficient way to update. Indeed modify satatement is the better option.

Thanks

Mushy

Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Re: Update statement taking too much time

Classroom Training Available!