SAS Enterprise Guide

gnyanendra · Posted 01-11-2012 06:29 AM

Hi,

(a). I'm facing difficulties with UPDATING on duplicate records in a table. Can someone help me on syntax?

(b). Also, we have some 10-12 fact tables and equal number of dimension tables in star schema and need to process over a million of records across ~30 tables on daily basis. Can anyone suggest if EG is the best tool for performaing this kind of task?

thanks a lot!

Gnyanendra

LarryWorley · Posted 01-12-2012 09:49 PM

I have a couple of techniques which may work for you.

The first method assumes you can always update the complete record in your secondary table. So it deletes the records to be updated and then inserts the new record. In some cases this may be optimal.

The second method uses Proc SQL update functionality and will give you the flexibility to update only a subset of columns if you wish. The code that I present below is more complicated than what might be needed with other versions of SQL. Some versions will allow you to write the update referencing the source table only once, but I found that I had to reference the source table once for each column being updated plus again for the update overall. I might be missing something here but I tried several variations before getting this to work.

Note that if you can update the complete record, then you may want to benchmark both methods to see which performs best. Performance can be dependent upon the both SAS optimization and DB optimization.

Let me know if this helps.

Setup code: This section of code sets up some tables to mimic your tables.

PROC SQL;

* code to simulate primary1 and secondary1 tables

------------------------------------------------;

create table work.primary1 (id char(4) primary key

,name char(1)

,code char(4)

,status char(1)

) ;

insert into work.primary1

values('1639','F','TA1','1')

values('1065','M','ME3','1')

values('1400','M','ME1','1')

values('1561','M',null,'1')

values('1221','F','FA3','1')

;

create table work.secondary1 as

select *

from work.primary1

;

quit ;

First Method: Here I show the first method which is to delete records to be updated and then

insert the new records.

* two step process which deletes records from target table with

keys for records with updated status.

then insert changed records into table

-----------------------------------------;

proc sql ;

* simulate a change in primary table ;

update work.primary1

set code = 'abc'

,status = '0'

where id = '1561'

;

delete *

from work.secondary1

where id in (select id

from work.primary1

where status = '0'

)

;

insert into work.secondary1

select id

,name

,code

,status

from work.primary1

where status = '0'

;

quit ;

Second Method: Here, I use explicit updating.

* one step process for updating

-----------------------------;

proc sql ;

* simulate a change in primary table ;

update work.primary1

set code = 'xyz'

,status = '0'

where id = '1561'

;

* This is the real update here. **************************;

update work.secondary1 t1

set name = ( select name

from work.primary1 t2

where t2.status = '0'

)

,code = ( select code

from work.primary1 t3

where t1.id = t3.id

and t3.status = '0'

)

,status = ( select status

from work.primary1 t4

where t1.id = t4.id

and t4.status = '0'

)

where t1.id = (select id

from work.primary1

where status = '0'

) ;

QUIT;

View solution in original post

Doc_Duke · Posted 01-11-2012 09:17 AM

I don't know about (a).

For (b), Enterprise Guide generates SQL code that it passes to the SAS Server for execution. So, if your comparison for 'bestness' is regular SAS, they are equal. The SAS Server does some optimization of code for each database that it supports.

Within EGuide, you can also write code nodes with SQL pass-thru, in which you pass exactly the code that you want to the SQL database. Technically, that is the "best", but it puts the entire onus on you to write efficient code.

Doc Muhlbaier

Duke

LarryWorley · Posted 01-11-2012 10:47 AM

I am confused with regards to (a); your title says help with UPDATING unique key, but text referes to difficulties with UPDATING with duplicate records, which implies non-unique keys. Can you clarify?

I can make some guesses about issue but would be best if you can provide samples of sql and error message? Also can you provide proc contents on the table you are trying to update? Also some sample records from the UPDATE input would be useful.

Building on Doc's response to (b), I would start with EG. Then if it is not efficient enough from processing standpoint, then look at modifying the EG generated code to make more efficient. EG generates generic code based upon your directions and there is always the possibility that with knowledge of your database and the data, you can optimize better than it can in a specific solution. This step could include the pass-thru code to which Doc refers.

Keep in mind that as you perform additional optimizations, that you are generally making your code more complex and less maintainable as well as using coder time to gain processing efficiencies. The best solution balances machine effeciency against those other concerns.

My approach is to start simple -- use easiest tool (EG for example) -- and then only complicate enough to get needed efficiency.

gnyanendra · Posted 01-12-2012 02:01 AM

Hi Larry,

My mistake in notwriting it clearly.

in (a), I meant I need to UPDATE older record with new one if KEY is same.

below text should help you get the problem better.

thanks!

----------------------

Hereis the SQL code I’m using:

PROC SQL;

INSERT INTO BETA.secondary1 (a.id, name, code, status)

SELECT

t1.a.id,

t1.name,

t1.code,

t1.status

FROM BETA.primary1 t1

WHERE status = ‘1’;

QUIT;

I now want toUPDATE table BETA.secondary1 again as the status changes to ‘0’ for which I’mtrying UPDATE clause instead INSERT with different combinations of SELECT

Like

UPDATE secondary1 (a.id, name, code, status)

SELECT

t1.a.id,

t1.name,

t1.code,

t1.status

FROM BETA.primary1 t1

WHERE status = ‘0’;

This is a simple query, I have lot more and complex which Ican’t post but I think you get the idea.

Here is the error log

1 ;*';*";*/;quit;run;

2 OPTIONS PAGENO=MIN;

3 %LET _CLIENTTASKLABEL='Program';

4 %LET _CLIENTPROJECTPATH='';

5 %LET _CLIENTPROJECTNAME='';

6 %LET _SASPROGRAMFILE=;

7

8 ODS _ALL_ CLOSE;

9 OPTIONS DEV=ACTIVEX;

10 GOPTIONS XPIXELS=0 YPIXELS=0;

11 FILENAME EGSR TEMP;

12 ODS tagsets.sasreport12(ID=EGSR)FILE=EGSR STYLE=Analysis

12 !STYLESHEET=(URL="file:///C:/sas93/software/x86/SASEnterpriseGuide/4.3/Styles/Analysis.css")

12 ! NOGTITLE NOGFOOTNOTE GPATH=&sasworklocation ENCODING=UTF8options(rolap="on");

NOTE: WritingTAGSETS.SASREPORT12(EGSR) Body file: EGSR

13

14 GOPTIONS ACCESSIBLE;

15 PROC SQL;

16 UPDATE BETA.secondary1 (a.id, name,code, status)

_

79

76

ERROR 79-322: Expecting a ).

ERROR 76-322: Syntax error, statementwill be ignored.

17 SELECT

18 t1.a.id,

19 t1.name,

20 t1.code,

21 t1.status

22

23

24 FROMBETA.primary1 t1

25 WHEREstatus = ‘1’;

NOTE: PROC SQL set option NOEXECand will continue to check the syntax of statements.

26 QUIT;

NOTE: The SAS System stoppedprocessing this step because of errors.

NOTE: PROCEDURE SQL used (Totalprocess time):

real time 0.00 seconds

cpu time 0.00 seconds

27

28 GOPTIONS NOACCESSIBLE;

29 %LET _CLIENTTASKLABEL=;

30 %LET _CLIENTPROJECTPATH=;

31 %LET _CLIENTPROJECTNAME=;

32 %LET _SASPROGRAMFILE=;

33

34 ;*';*";*/;quit;run;

2 TheSAS System 06:41 Thursday,January 12, 2012

35 ODS _ALL_ CLOSE;

36

37

38 QUIT; RUN;

39

LarryWorley · Posted 01-12-2012 09:49 PM

I have a couple of techniques which may work for you.

The first method assumes you can always update the complete record in your secondary table. So it deletes the records to be updated and then inserts the new record. In some cases this may be optimal.

The second method uses Proc SQL update functionality and will give you the flexibility to update only a subset of columns if you wish. The code that I present below is more complicated than what might be needed with other versions of SQL. Some versions will allow you to write the update referencing the source table only once, but I found that I had to reference the source table once for each column being updated plus again for the update overall. I might be missing something here but I tried several variations before getting this to work.

Note that if you can update the complete record, then you may want to benchmark both methods to see which performs best. Performance can be dependent upon the both SAS optimization and DB optimization.

Let me know if this helps.

Setup code: This section of code sets up some tables to mimic your tables.

PROC SQL;

* code to simulate primary1 and secondary1 tables

------------------------------------------------;

create table work.primary1 (id char(4) primary key

,name char(1)

,code char(4)

,status char(1)

) ;

insert into work.primary1

values('1639','F','TA1','1')

values('1065','M','ME3','1')

values('1400','M','ME1','1')

values('1561','M',null,'1')

values('1221','F','FA3','1')

;

create table work.secondary1 as

select *

from work.primary1

;

quit ;

First Method: Here I show the first method which is to delete records to be updated and then

insert the new records.

* two step process which deletes records from target table with

keys for records with updated status.

then insert changed records into table

-----------------------------------------;

proc sql ;

* simulate a change in primary table ;

update work.primary1

set code = 'abc'

,status = '0'

where id = '1561'

;

delete *

from work.secondary1

where id in (select id

from work.primary1

where status = '0'

)

;

insert into work.secondary1

select id

,name

,code

,status

from work.primary1

where status = '0'

;

quit ;

Second Method: Here, I use explicit updating.

* one step process for updating

-----------------------------;

proc sql ;

* simulate a change in primary table ;

update work.primary1

set code = 'xyz'

,status = '0'

where id = '1561'

;

* This is the real update here. **************************;

update work.secondary1 t1

set name = ( select name

from work.primary1 t2

where t2.status = '0'

)

,code = ( select code

from work.primary1 t3

where t1.id = t3.id

and t3.status = '0'

)

,status = ( select status

from work.primary1 t4

where t1.id = t4.id

and t4.status = '0'

)

where t1.id = (select id

from work.primary1

where status = '0'

) ;

QUIT;

gnyanendra · Posted 01-13-2012 01:15 AM

Hi Larry,

I think First Method should work for me. I just tested it with dummy data and it worked fine, however I will do it on original database and test on all complex queries, data consistensies and all.

thanks you so much for the help.

Just to mention, we tried with secoind method but it involved writing query for each of the value column which didn't fit our requirement and was too tedious.

Anyways, I will ask any further question if doubt arises.

thanks

Gnyanendra

SAS Enterprise Guide

How to UPDATE on unique key in MySQL using SAS EG

Re: How to UPDATE on unique key in MySQL using SAS EG

How to UPDATE on unique key in MySQL using SAS EG

How to UPDATE on unique key in MySQL using SAS EG

How to UPDATE on unique key in MySQL using SAS EG

Re: How to UPDATE on unique key in MySQL using SAS EG

Re: How to UPDATE on unique key in MySQL using SAS EG

How SAS Viya Platform Updates Affect CAS Servers

Updating the SAS Deployment Operator

SAS Viya 3.5: CAS CRUD (Table Update) Capabilities

Customer Interactions - Using SAS Viya to deliver improved customer in...

updating keys SAS 9.2 for Windows

Follow Us

What is...

SAS Enterprise Guide

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...