BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Mushy
Quartz | Level 8

Hello,

 

I am updating huge table using update statement in proc sql.

Its taking very long time.

I need to just blank/null few char/num fields in the tables based of the dataset that provides the fields to be blanked.

 Is there an efficient way to deal with this?

 

Thanks,

Mushy

1 ACCEPTED SOLUTION

Accepted Solutions
SASJedi
SAS Super FREQ

This summarizes the results of the original code and responses received as of this date when executed against this table:

Data Set Name LIBA.HAVE Observations 500000
Member Type DATA Variables 50
Engine V9 Indexes 1
Created 04/29/2024 09:02:45 Observation Length 400
Last Modified 04/29/2024 09:02:45 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    

Original Code:

proc sql;
	update liba.have set country1='', country3='', pop1=., pop2=., pop1_date=.;
quit;

Results:

NOTE: PROCEDURE SQL used (Total process time):
      real time           0.61 seconds
      cpu time            0.23 seconds

So, .61 seconds was the time to beat!

@Tom's solution

data liba.have(index=(ID));
  set liba.have;
  call missing(of country1 country3 pop1 pop2 pop1_date);
run;

and @Ksharp's solution

proc sql;
alter table have
drop country1, country3,pop1,pop2,pop1_date
add country1 char(20), country3 char(3),pop1 num,pop2 num,pop1_date num;
quit;

tied for first place 

      real time           0.40 seconds
      cpu time            0.20 seconds

With my solution:

data liba.have ;
   modify liba.have;
   call missing(country1, country3,pop1,pop2,pop1_date);
run;

coming in third,

      real time           0.44 seconds
      cpu time            0.32 seconds

and data_null__'s solution:

data liba.have (index=(ID));
	if 0 then set liba.have; 
	set liba.have (drop=country1 country3 pop1 pop2 pop1_date);
run;

still beating the original post:

      real time           0.58 seconds
      cpu time            0.26 seconds


Of course, your mileage may vary depending on the size of your data. 

 

 

Check out my Jedi SAS Tricks for SAS Users

View solution in original post

10 REPLIES 10
SASJedi
SAS Super FREQ

To enable others to help more effectively, please share your SQL code, the log, and a sample of the table data (or at least a good description of the columns you are working with and the number of rows involved).  Is your data in base SAS or in some external database?

Check out my Jedi SAS Tricks for SAS Users
Mushy
Quartz | Level 8

@SASJedi 

 

I am working with SAS datasets:

In general for my case do you know a better and efficient way to update columns other than proc sql update statement?

 

The query I am using is:

Proc sql;

update liba.geography set country1='', country3='', pop1=., pop2=., pop1_date=.;

quit;

it is a table with 10m + records and 40+ fields.

 

real time 11:49.60
user cpu time 11:25.46

 

Tom
Super User Tom
Super User

Why do you need to update an existing dataset?

 

In general with SAS datasets I never use SQL UPDATE.  Instead I just create a program to MAKE the new dataset I need.

 

Your example code:

Proc sql;
update liba.geography set country1='', country3='', pop1=., pop2=., pop1_date=.;
quit;

I would normally run as :

data liba.geography;
  set liba.geography;
  call missing(of country1 country3 pop1 pop2 pop1_date);
run;

 

 

Mushy
Quartz | Level 8

@Tom  Indeed its faster , but we lose index after the set.

Tom
Super User Tom
Super User

@Mushy wrote:

@Tom  Indeed its faster , but we lose index after the set.


And why does the index matter?

If it is important then you can simple create the index also, easiest is to just include its definition in the DATA statement.

data new(index=(.....));
  set old;
  ...
run;

Or if speed is an issue it is frequently much faster to make the dataset first and then create the index later. You can use PROC DATASETS or PROC SQL to create an index on an existing dataset.

 

But I am still curious what application is that requires you to update an existing dataset.

SASJedi
SAS Super FREQ

You could use a DATA step with a MODIFY statement:

data liba.geography ;
	modify liba.geography; 
	call missing(country1, country3,pop1,pop2,pop1_date);
run;

That should run faster for you. If you are nulling values for variables involved in the index, your index isn't going to be worth much after the update anyway. In that case, @Tom's advice is best all-around: re-create the data set, do any additional data manipulation, and then rebuild the index in a separate step. 

Check out my Jedi SAS Tricks for SAS Users
data_null__
Jade | Level 19

You could skip reassignment by dropping  

 

data class(index=(name/unique));
   if 0 then set sashelp.class;
   set sashelp.class(drop=age weight);
   run;

Capture.PNG

 

 


@Mushy wrote:

Hello,

 

I am updating huge table using update statement in proc sql.

Its taking very long time.

I need to just blank/null few char/num fields in the tables based of the dataset that provides the fields to be blanked.

 Is there an efficient way to deal with this?

 

Thanks,

Mushy


 

Ksharp
Super User

You could drop these variables firstly, after that adding these variables in this table .

Here is an example:

 

data have;
 set sashelp.class;
run;


proc sql;
alter table have
drop name,age
add name char(20),age num;
quit;
SASJedi
SAS Super FREQ

This summarizes the results of the original code and responses received as of this date when executed against this table:

Data Set Name LIBA.HAVE Observations 500000
Member Type DATA Variables 50
Engine V9 Indexes 1
Created 04/29/2024 09:02:45 Observation Length 400
Last Modified 04/29/2024 09:02:45 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    

Original Code:

proc sql;
	update liba.have set country1='', country3='', pop1=., pop2=., pop1_date=.;
quit;

Results:

NOTE: PROCEDURE SQL used (Total process time):
      real time           0.61 seconds
      cpu time            0.23 seconds

So, .61 seconds was the time to beat!

@Tom's solution

data liba.have(index=(ID));
  set liba.have;
  call missing(of country1 country3 pop1 pop2 pop1_date);
run;

and @Ksharp's solution

proc sql;
alter table have
drop country1, country3,pop1,pop2,pop1_date
add country1 char(20), country3 char(3),pop1 num,pop2 num,pop1_date num;
quit;

tied for first place 

      real time           0.40 seconds
      cpu time            0.20 seconds

With my solution:

data liba.have ;
   modify liba.have;
   call missing(country1, country3,pop1,pop2,pop1_date);
run;

coming in third,

      real time           0.44 seconds
      cpu time            0.32 seconds

and data_null__'s solution:

data liba.have (index=(ID));
	if 0 then set liba.have; 
	set liba.have (drop=country1 country3 pop1 pop2 pop1_date);
run;

still beating the original post:

      real time           0.58 seconds
      cpu time            0.26 seconds


Of course, your mileage may vary depending on the size of your data. 

 

 

Check out my Jedi SAS Tricks for SAS Users
Mushy
Quartz | Level 8

@SASJedi  Thanks for the comparision and helping out with the efficient way to update. Indeed modify satatement is the better option.

Thanks

Mushy

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 454 views
  • 20 likes
  • 5 in conversation