BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nasser_DRMCP
Lapis Lazuli | Level 10

Hello,

I have 5 tables that contain around 5 millions rows.

My program read and extract all rows  (on tuesday, on wednerday, thursday, friday and saturday)

I would like to optimize by doing a delta loading instead of full loading. I would like to extract only new rows since the last extraction and then append this new records in a target table

What is the "best" method ?

1°) should I compare source table and target tables to identify wew records ?

2°) should I save (each day) the extraction date in a track table. and then reading this date to extract news record of the source table ?

3°) should I read the max value of column "update_date" of the source table.

thanks in advance for your help / advice

kind regards

Nass

1 ACCEPTED SOLUTION

Accepted Solutions
17 REPLIES 17
ChrisNZ
Tourmaline | Level 20

Why do you need to check the date at all? Is there a chance you might load the same date twice?

 

To read the latest date, and provided new data is always APPENDed, just read the last record.

 set TABLE nobs=NOBS point=NOBS;

 

 

Nasser_DRMCP
Lapis Lazuli | Level 10

Thanks Chris for your help.

yes, by checking the date, if I have to re launch the program (in the day) , I Always extract only the new data. the new data should be inserted only once in the target even if the program is launch twice or more.
Also, I am sorru but i don't understand this "set TABLE nobs=NOBS point=NOBS;"

Nasser_DRMCP
Lapis Lazuli | Level 10

I think I understood, you advice me to get only last record by using this (for example)

data last;

/* get number of records (N) */

if 0 then

set sashelp.class nobs=nobs end=eof;

/* use POINT= to get nth record */

set sashelp.class point=nobs;

output;

stop;

run;

SASKiwi
PROC Star

You haven't really described your loading process completely. Are you doing only row additions or are you also doing row updates and deletions? Also do you want your load process to be repeatable, that is remove all rows from a particular load then load them again?

ChrisNZ
Tourmaline | Level 20

Yes exactly. My code does the same in one line.

Nasser_DRMCP
Lapis Lazuli | Level 10
Hello Chris

I tried this code your suggested --> set TABLE nobs=NOBS point=NOBS;
but I encounter an error "Statement is not valid or it is used out of proper order."
I try this as wel (to get the last of the "class" table ) -->set sashelp.class nobs=NOBS point=NOBS; same error
Kurt_Bremser
Super User

Maxim 2 says: Read the Log.

The log is the #1 resource for detecting problems in SAS code. The log, the whole log, and nothing but the log.

So please copy/paste the whole log of the step that resulted in a problem into a window opened with this button:

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

Nasser_DRMCP
Lapis Lazuli | Level 10
1                                                          The SAS System                       Monday, October 11, 2021 11:12:00 AM

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='DTM_SIMU_VDELTA';
4          %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5          %LET _CLIENTPROJECTPATH='';
6          %LET _CLIENTPROJECTPATHHOST='';
7          %LET _CLIENTPROJECTNAME='';
8          %LET _SASPROGRAMFILE='/home/ldap/rollens/sbsas/ROLLENS/CRC/DTM_SIMU_VDELTA.sas';
9          %LET _SASPROGRAMFILEHOST='isasakj4.compagny.fr';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=PNG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         FILENAME EGSR TEMP;
15         ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16             STYLE=HTMLBlue
17             STYLESHEET=(URL="file:///D:/appli/SAS/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")
18             NOGTITLE
19             NOGFOOTNOTE
20             GPATH=&sasworklocation
21             ENCODING=UTF8
22             options(rolap="on")
23         ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26          set TABLE nobs=NOBS point=NOBS;
            ___
            180

ERROR 180-322: Statement is not valid or it is used out of proper order.

27         
28         
29         GOPTIONS NOACCESSIBLE;
30         %LET _CLIENTTASKLABEL=;
31         %LET _CLIENTPROCESSFLOWNAME=;
32         %LET _CLIENTPROJECTPATH=;
33         %LET _CLIENTPROJECTPATHHOST=;
34         %LET _CLIENTPROJECTNAME=;
35         %LET _SASPROGRAMFILE=;
36         %LET _SASPROGRAMFILEHOST=;
37         
38         ;*';*";*/;quit;run;
39         ODS _ALL_ CLOSE;
40         
41         
42         QUIT; RUN;
43         

Hello Kurt

I put the whole log

thanks

Nasser_DRMCP
Lapis Lazuli | Level 10

yes, but when I try this data step bellow. I encounter an error "out of memory" and the result is very big "t_last table" table.

data t_last ;

set sashelp.class nobs=NOBS point=NOBS;

run;

 

I know that this code bellow works well==>

data last ;

/* get number of records (N) */

if 0 then

set sashelp.class nobs=p_nobs end=eof;

/* use POINT= to get nth record */

set sashelp.class point=p_nobs;

output;

stop;

run;

 

Chris said "Yes exactly. My code does the same in one line." sorry, but I Don't know how.

Thanks

Kurt_Bremser
Super User

When you use SET in a data step without the POINT= option, the incoming data will be read sequentially, and as soon as the SET statement (which is also a READ under the hood) tries to read past the end, the data step will automatically terminate gracefully.

But with the POINT= option, there is no sequential read, and the SET can happily read the observation again and again and again (and write out an observation for each iteration) until the resulting dataset cracks storage limits.

This why you always need a means to control how many observations are processed, either by using SET in a do loop with a termination condition, or using the STOP statement once you've achieved your goal.

Since a STOP also prevents the implicit write, you need to add OUTPUT:

data t_last ;
set sashelp.class nobs=NOBS point=NOBS;
output;
stop;
run;

 

Nasser_DRMCP
Lapis Lazuli | Level 10

thanks Kurt for the explanations. I have no more errors.
but the last obs I get returns the variables "date_update" "and" date_load "with the values ". "
And I don't understanbd this result because by getting min and max of this 2 variables with proc sql, result is
"02JUN2014 / 09OCT2021 / 02JUN2014 / 08OCT2021"
so why the step with "point=..." returns "." ? and does not return a "09OCT2021" and "08OCT2021"

Kurt_Bremser
Super User

Since we still do not know what we are dealing with, you must now provide examples for the existing dataset, the "update" dataset, and how the final dataset from that should look like.

For the two source datasets, use data steps with datalines (do not skip this), so we can easily recreate them for developing/testing in or environment.

Nasser_DRMCP
Lapis Lazuli | Level 10

you are right. i give you a sample of data

data source ;
length project_name $ 9;
input num_client project_name $ project_amount date_load date_update ;
informat date_load  date_update ddmmyy10.;
format date_load date_update ddmmyy10.;
datalines;
280001 home 15000 01/04/2019 20/04/2019
280002 car 3000 01/05/2019 .
280003 personnal 2250 30/06/2019 .
280004 home 25000 31/07/2020 01/08/2020
280005 car 12000 21/01/2021 .
280006 personnal 1500 20/02/2021 .
280007 home 75500 12/04/2021 12/05/2021
280008 car 42000 27/05/2021 .
280009 personnal 2000 05/09/2021 10/10/2021
280010 home 2000 10/10/2021 .
;
run;

in the target table, amount of the client 28009 is 1000 (and not 2000 yet). and the client 20010 does not exist yet

I would like to identify client 28009 the be updated ans client 20010 to be inserted.

I Don't know the best way to do that.

there is no dimensional (star schema) with fact and dimension tables with keys)

this table is "alone"

thanks

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 2090 views
  • 0 likes
  • 5 in conversation