ssomak
Fluorite | Level 6

 Hi all,

 

Let me first describe my problem.

  • I have a set of tables (let's say 40 of them), and each table holds more than 2 million distinct customers.
  • I have a set of columns that must be updated in every table. For example, in every table I have to update the PESEL column in the following way: '111_customer_id'.
  • I have a list of customers to be updated (a lot of customers).

I wrote my program so that it creates an update statement for every customer and then runs it. So for every table I run something like this:

update customer set pesel='111_&ID.' where ID=101;
update customer set pesel='111_&ID.' where ID=102;
/* and so on */

But this solution takes a lot of time - and it is simply unacceptable.

 

Then I tried to change my program to use hash tables instead of a simple update, but I ran into problems there too: because I have to build a separate rule for every client (the client's ID appears in the PESEL rule), I have to create a temporary table and transpose it for every customer. Also, I cannot use hash tables for numeric columns (like birth_date), because after transposing every column is character (so I get an error that the types do not match).
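One way to use a hash purely as a key lookup, with no transposing at all (so mixed column types never become an issue), is sketched below. This assumes the IDs to update sit in a one-column numeric table work.ids; the table and variable names are placeholders:

data want;
    if _n_ = 1 then do;
        declare hash h(dataset: 'work.ids');
        h.defineKey('id');
        h.defineDone();
    end;
    set have;
    /* check() returns 0 when the current id is a key in the hash */
    if h.check() = 0 then pesel = catx('_', '111', id);
run;

Here the hash only answers "is this customer on the list?", while the assignment itself happens on the ordinary numeric/character variables of the data step.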

 

So my question is: is there any 'tricky' way around this problem? Maybe I should use hash tables, but in a different way? Or add some parallelism? Do you have any ideas?

 

I will be grateful for any help!

 

Cheers!

1 ACCEPTED SOLUTION

Astounding
PROC Star

I think @RW9 is moving in the right direction on this, suggesting a DATA step.  Look at your current process:

 

Locate data for ID 101.  Update it.

Locate data for ID 102.  Update it.

 

The "locate" process is consuming resources.  You would be well-advised to try a simple solution first, such as:

 

data want;
set have;
if id in (101, 102) then PESEL = catx('_', '111', ID);
run;

 

That way, the program can read the data sequentially (= faster), and limit the "locate" step to finding each observation just once.


9 REPLIES
RW9
Diamond | Level 26

The code you provided is invalid:

update customer set pesel='111_&ID.' where ID=101;

This will set pesel to the literal text 111_&ID. because you use single quotes; macro variables only resolve inside double quotes.  It also assigns exactly the same string in both lines, so regardless of the WHERE clauses the code you present is effectively:

update customer set pesel='111_&ID.' where ID in (101,102);

 

Are you doing this once or repeatedly?  I would imagine a data step may be more efficient than a series of UPDATE calls - unless this is running against a database.

A good idea is to post some simple test data in the form of a data step, along with what you want to see at the end.  2 million records is a lot, so it will take a bit of time, but it's not that vast.
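For instance, a posting of sample data in data step form might look like this (the values and column widths below are made up purely for illustration):

data have;
    input id pesel :$20. birth_date :date9.;
    format birth_date date9.;
    datalines;
101 85010112345 01JAN1985
102 90021054321 10FEB1990
103 78111098765 10NOV1978
;
run;

A small table like this, plus the expected output, lets anyone run and test a proposed solution directly.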

 

ssomak
Fluorite | Level 6

 Hi again,

 

Unfortunately I have another problem. As I wrote, I used the solution proposed by @Astounding, so I rewrite my dataset in the following way:

data &dataset.;
	set &dataset.;
	if id in (&IDs.) then do;
		&update_rules.; /* here are my update rules, for example: PESEL='11111'*/
	end;
run;

But the problem is that I rewrite the 'old' datasets, and some columns have user-defined formats. Because of this, I get the following error:

ERROR: The format XYZ was not found or could not be loaded

So here comes my question:

Do you know if there is a SAS option to leave a column as it is when its format cannot be loaded? What can you suggest in such a situation?

 

 Again, many thanks for any hints! 🙂

RW9
Diamond | Level 26

Use:

options nofmterr;

Before the code in question, and then:

options fmterr;

Afterwards.
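Combined with the rewrite step from above, the whole fix would be along these lines (same macro variables as in the earlier post):

options nofmterr;  /* tolerate user-defined formats that cannot be loaded */
data &dataset.;
    set &dataset.;
    if id in (&IDs.) then do;
        &update_rules.;
    end;
run;
options fmterr;    /* restore strict format checking */

Restoring FMTERR afterwards matters, so that genuinely missing formats elsewhere in the session still surface as errors.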

ssomak
Fluorite | Level 6

@RW9 Cool, thank you! 🙂


ssomak
Fluorite | Level 6

 Dear all,

 

Many thanks for all of your replies and hints. In the end, the best solution for me was to rewrite the dataset and apply the update rules, as @Astounding suggested. In my case it was incomparably faster than the UPDATE in PROC SQL.

 

Once again, thanks for your help!

 

 Cheers!

ballardw
Super User

With a data step approach you might need to explicitly set a longer LENGTH for the updated variables, as the new values may exceed their current length.
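A sketch of what that would look like, assuming (as an example) that the new PESEL values need up to 20 characters:

data want;
    length pesel $ 20;  /* widen before the SET statement so new values are not truncated */
    set have;
    if id in (101, 102) then pesel = catx('_', '111', id);
run;

The LENGTH statement has to come before the SET statement; otherwise the variable keeps the length stored in the incoming dataset.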

 

Of course, it would help considerably to show the actual value the macro variable ID contains, the current value of PESEL, and the desired resulting value.

 

 

ChrisNZ
Tourmaline | Level 20

Have a look at this thread.

https://communities.sas.com/t5/Base-SAS-Programming/Alternative-to-SAS-Merge/td-p/304046/highlight/t...

 

MODIFY  plus KEY= may well be what you need.
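A rough sketch of that approach, assuming CUSTOMER carries a simple index on ID and a one-column table WORK.IDS (a placeholder name) lists the customers to change:

data customer;
    set work.ids(keep=id);
    modify customer key=id;
    if _iorc_ = 0 then do;  /* key found in the master table */
        pesel = catx('_', '111', id);
        replace;
    end;
    else _error_ = 0;       /* ignore transaction IDs with no match */
run;

Unlike rewriting the whole dataset, MODIFY updates records in place, which can pay off when only a small fraction of the 2 million rows actually changes.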

 

Other methods are explored too. I'll be adding a chapter on table joins in the third edition of my book. 🙂

Patrick
Opal | Level 21

@ssomak

I believe you need to give us a bit more information before we can give you a suitable answer.

- Is this a one-off or an ongoing process?

- Is this data in SAS tables or in a database?

- What do these rules look like?

  - Please post some sample data in the form of a working data step. A table with your data to be changed and a table with the rules and customers to be changed.

 

