- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
(a). I'm facing difficulties with UPDATING on duplicate records in a table. Can someone help me on syntax?
(b). Also, we have some 10-12 fact tables and equal number of dimension tables in star schema and need to process over a million of records across ~30 tables on daily basis. Can anyone suggest if EG is the best tool for performaing this kind of task?
thanks a lot!
Gnyanendra
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a couple of techniques which may work for you.
The first method assumes you can always update the complete record in your secondary table. So it deletes the records to be updated and then inserts the new record. In some cases this may be optimal.
The second method uses Proc SQL update functionality and will give you the flexibility to update only a subset of columns if you wish. The code that I present below is more complicated than what might be needed with other versions of SQL. Some versions will allow you to write the update referencing the source table only once, but I found that I had to reference the source table once for each column being updated plus again for the update overall. I might be missing something here but I tried several variations before getting this to work.
Note that if you can update the complete record, then you may want to benchmark both methods to see which performs best. Performance can be dependent upon the both SAS optimization and DB optimization.
Let me know if this helps.
Setup code: This section of code sets up some tables to mimic your tables.
PROC SQL;
* code to simulate primary1 and secondary1 tables
------------------------------------------------;
create table work.primary1 (id char(4) primary key
,name char(1)
,code char(4)
,status char(1)
) ;
insert into work.primary1
values('1639','F','TA1','1')
values('1065','M','ME3','1')
values('1400','M','ME1','1')
values('1561','M',null,'1')
values('1221','F','FA3','1')
;
create table work.secondary1 as
select *
from work.primary1
;
quit ;
First Method: Here I show the first method which is to delete records to be updated and then
insert the new records.
* two step process which deletes records from target table with
keys for records with updated status.
then insert changed records into table
-----------------------------------------;
proc sql ;
* simulate a change in primary table ;
update work.primary1
set code = 'abc'
,status = '0'
where id = '1561'
;
delete *
from work.secondary1
where id in (select id
from work.primary1
where status = '0'
)
;
insert into work.secondary1
select id
,name
,code
,status
from work.primary1
where status = '0'
;
quit ;
Second Method: Here, I use explicit updating.
* one step process for updating
-----------------------------;
proc sql ;
* simulate a change in primary table ;
update work.primary1
set code = 'xyz'
,status = '0'
where id = '1561'
;
* This is the real update here. **************************;
update work.secondary1 t1
set name = ( select name
from work.primary1 t2
where t2.status = '0'
)
,code = ( select code
from work.primary1 t3
where t1.id = t3.id
and t3.status = '0'
)
,status = ( select status
from work.primary1 t4
where t1.id = t4.id
and t4.status = '0'
)
where t1.id = (select id
from work.primary1
where status = '0'
) ;
QUIT;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't know about (a).
For (b), Enterprise Guide generates SQL code that it passes to the SAS Server for execution. So, if your comparison for 'bestness' is regular SAS, they are equal. The SAS Server does some optimization of code for each database that it supports.
Within EGuide, you can also write code nodes with SQL pass-thru, in which you pass exactly the code that you want to the SQL database. Technically, that is the "best", but it puts the entire onus on you to write efficient code.
Doc Muhlbaier
Duke
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am confused with regards to (a); your title says help with UPDATING unique key, but text referes to difficulties with UPDATING with duplicate records, which implies non-unique keys. Can you clarify?
I can make some guesses about issue but would be best if you can provide samples of sql and error message? Also can you provide proc contents on the table you are trying to update? Also some sample records from the UPDATE input would be useful.
Building on Doc's response to (b), I would start with EG. Then if it is not efficient enough from processing standpoint, then look at modifying the EG generated code to make more efficient. EG generates generic code based upon your directions and there is always the possibility that with knowledge of your database and the data, you can optimize better than it can in a specific solution. This step could include the pass-thru code to which Doc refers.
Keep in mind that as you perform additional optimizations, that you are generally making your code more complex and less maintainable as well as using coder time to gain processing efficiencies. The best solution balances machine effeciency against those other concerns.
My approach is to start simple -- use easiest tool (EG for example) -- and then only complicate enough to get needed efficiency.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Larry,
My mistake in notwriting it clearly.
in (a), I meant I need to UPDATE older record with new one if KEY is same.
below text should help you get the problem better.
thanks!
----------------------
Hereis the SQL code I’m using:
PROC SQL;
INSERT INTO BETA.secondary1 (a.id, name, code, status)
SELECT
t1.a.id,
t1.name,
t1.code,
t1.status
FROM BETA.primary1 t1
WHERE status = ‘1’;
QUIT;
I now want toUPDATE table BETA.secondary1 again as the status changes to ‘0’ for which I’mtrying UPDATE clause instead INSERT with different combinations of SELECT
Like
UPDATE secondary1 (a.id, name, code, status)
SELECT
t1.a.id,
t1.name,
t1.code,
t1.status
FROM BETA.primary1 t1
WHERE status = ‘0’;
This is a simple query, I have lot more and complex which Ican’t post but I think you get the idea.
Here is the error log
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program';
4 %LET _CLIENTPROJECTPATH='';
5 %LET _CLIENTPROJECTNAME='';
6 %LET _SASPROGRAMFILE=;
7
8 ODS _ALL_ CLOSE;
9 OPTIONS DEV=ACTIVEX;
10 GOPTIONS XPIXELS=0 YPIXELS=0;
11 FILENAME EGSR TEMP;
12 ODS tagsets.sasreport12(ID=EGSR)FILE=EGSR STYLE=Analysis
12 !STYLESHEET=(URL="file:///C:/sas93/software/x86/SASEnterpriseGuide/4.3/Styles/Analysis.css")
12 ! NOGTITLE NOGFOOTNOTE GPATH=&sasworklocation ENCODING=UTF8options(rolap="on");
NOTE: WritingTAGSETS.SASREPORT12(EGSR) Body file: EGSR
13
14 GOPTIONS ACCESSIBLE;
15 PROC SQL;
16 UPDATE BETA.secondary1 (a.id, name,code, status)
_
79
76
ERROR 79-322: Expecting a ).
ERROR 76-322: Syntax error, statementwill be ignored.
17 SELECT
18 t1.a.id,
19 t1.name,
20 t1.code,
21 t1.status
22
23
24 FROMBETA.primary1 t1
25 WHEREstatus = ‘1’;
NOTE: PROC SQL set option NOEXECand will continue to check the syntax of statements.
26 QUIT;
NOTE: The SAS System stoppedprocessing this step because of errors.
NOTE: PROCEDURE SQL used (Totalprocess time):
real time 0.00 seconds
cpu time 0.00 seconds
27
28 GOPTIONS NOACCESSIBLE;
29 %LET _CLIENTTASKLABEL=;
30 %LET _CLIENTPROJECTPATH=;
31 %LET _CLIENTPROJECTNAME=;
32 %LET _SASPROGRAMFILE=;
33
34 ;*';*";*/;quit;run;
2 TheSAS System 06:41 Thursday,January 12, 2012
35 ODS _ALL_ CLOSE;
36
37
38 QUIT; RUN;
39
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a couple of techniques which may work for you.
The first method assumes you can always update the complete record in your secondary table. So it deletes the records to be updated and then inserts the new record. In some cases this may be optimal.
The second method uses Proc SQL update functionality and will give you the flexibility to update only a subset of columns if you wish. The code that I present below is more complicated than what might be needed with other versions of SQL. Some versions will allow you to write the update referencing the source table only once, but I found that I had to reference the source table once for each column being updated plus again for the update overall. I might be missing something here but I tried several variations before getting this to work.
Note that if you can update the complete record, then you may want to benchmark both methods to see which performs best. Performance can be dependent upon the both SAS optimization and DB optimization.
Let me know if this helps.
Setup code: This section of code sets up some tables to mimic your tables.
PROC SQL;
* code to simulate primary1 and secondary1 tables
------------------------------------------------;
create table work.primary1 (id char(4) primary key
,name char(1)
,code char(4)
,status char(1)
) ;
insert into work.primary1
values('1639','F','TA1','1')
values('1065','M','ME3','1')
values('1400','M','ME1','1')
values('1561','M',null,'1')
values('1221','F','FA3','1')
;
create table work.secondary1 as
select *
from work.primary1
;
quit ;
First Method: Here I show the first method which is to delete records to be updated and then
insert the new records.
* two step process which deletes records from target table with
keys for records with updated status.
then insert changed records into table
-----------------------------------------;
proc sql ;
* simulate a change in primary table ;
update work.primary1
set code = 'abc'
,status = '0'
where id = '1561'
;
delete *
from work.secondary1
where id in (select id
from work.primary1
where status = '0'
)
;
insert into work.secondary1
select id
,name
,code
,status
from work.primary1
where status = '0'
;
quit ;
Second Method: Here, I use explicit updating.
* one step process for updating
-----------------------------;
proc sql ;
* simulate a change in primary table ;
update work.primary1
set code = 'xyz'
,status = '0'
where id = '1561'
;
* This is the real update here. **************************;
update work.secondary1 t1
set name = ( select name
from work.primary1 t2
where t2.status = '0'
)
,code = ( select code
from work.primary1 t3
where t1.id = t3.id
and t3.status = '0'
)
,status = ( select status
from work.primary1 t4
where t1.id = t4.id
and t4.status = '0'
)
where t1.id = (select id
from work.primary1
where status = '0'
) ;
QUIT;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Larry,
I think First Method should work for me. I just tested it with dummy data and it worked fine, however I will do it on original database and test on all complex queries, data consistensies and all.
thanks you so much for the help.
Just to mention, we tried with secoind method but it involved writing query for each of the value column which didn't fit our requirement and was too tedious.
Anyways, I will ask any further question if doubt arises.
thanks
Gnyanendra