BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sudhan
Fluorite | Level 6
I have two tables on SAS "Table_A" and "Table_A_Archive", as part my ETL process "Table_A" is created on a daily basis and the data should be archived on "Table_A_Archive". When the data is archived on "Table_A_Archive" a a flag will be created/updated "DT_FLAG".

On the first day this is how the table would look like

 "Table_A"
 | ID  | Load_Date
 ------ -------------
 | 100 | 01JUN2020:12:13:56

 "Table_A_Archive"
  | ID  | Load_Date           | DT_FLAG
  ------ --------------------- ---------
  | 100 | 01JUN2020:12:13:56  | 1

On Day 2

 "Table_A"
 | ID  | Load_Date
 ------ ------------
 | 101 | 02JUN2020:12:13:56

 "Table_A_Archive"
 | ID  | Load_Date           | DT_FLAG
 ------ --------------------- ---------
 | 100 | 01JUN2020:12:13:56  | 2
 | 101 | 02JUN2020:12:13:56  | 1

The new data should be loaded with the DT_FLAG of 1 and old records DT_FLAG should be incremented by 1. Load_Date is the pivotal point. I have written a SAS code but it seems a bit messy, can someone please help me with a SAS Datastep

 

%macro Cntl_archive(table_name=,arch_table_name=);
 %GLOBAL WRK;

 %if %sysfunc(exist(&arch_table_name.))  %then %do;

 proc append base=&arch_table_name. data=&table_name. force;
 run;

 proc sql;
 Create table TEMP as 
 Select distinct Load_Date,Load_Date as WRK from &arch_table_name.
 order by Load_Date desc
 ;quit;

 proc rank data=TEMP descending out=TEMP;
 var WRK;
 ranks count;
 run;

 data &arch_table_name. (drop=DT_FLAG);
 set    &arch_table_name.;
 run;

 proc sql;
 Create table &arch_table_name. as
 Select T0.*,T1.count as DT_FLAG from &arch_table_name. T0
 inner join TEMP T1 on T0.Load_Date=T1.Load_Date
 ;quit
 %end;
 %else %do;

data &arch_table_name.;
set &table_name.;
DT_FLAG= 1; 
IS_ACTIVE='';
run;
%end;
%mend Cntl_archive;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Your unfortunate decision to increase the existing values of the flag requires you to always rewrite the whole dataset. You should have increased the flag for the new observations instead:

data exist;
input numdata dt_flag;
datalines;
1 1
2 1
3 2
4 2
5 3
6 3
;

data new;
input numdata;
datalines;
7
8
;

data new_app;
set new;
if _n_ = 1
then do;
  set exist (keep=dt_flag) point=nobs nobs=nobs;
  dt_flag + 1;
end;
run;

proc append base=exist data=new_app;
run;

proc print data=exist noobs;
run;

Result:

numdata	dt_flag
1	1
2	1
3	2
4	2
5	3
6	3
7	4
8	4

View solution in original post

12 REPLIES 12
Patrick
Opal | Level 21

To have the DT_FLAG always set to 1 for the most current history record will of course make it easier to then select from the history table BUT this requires that you always update all the history records for the same business key whenever you add a new version. With a growing number of history records ETL performance will degrade ...and that's why I'd consider this a sub-optimal design.

 

How often do you actually need to select the latest, 2nd latest version, ... from the history table?

 

I probably wouldn't even create a DT_FLAG/Version_Number at all. The datetime column does this already for you. So just APPEND/INSERT new history records without the need to lookup or even update anything in previously loaded history records.

 

If you really need to select "often" from the history table then consider a post process which just sorts the table by business key and datetime - a job on its own which is outside of your actual ETL and though has no impact on the critical path of the end-to-end process.

Sudhan
Fluorite | Level 6
Thanks for your response. But i need the DT_FLAG>=2, as this data will be used for a report.
Patrick
Opal | Level 21

You could also maintain a change_current_ind which is set to 1 for the most current history record and set to 0 for all the others. This way you only ever need to update the most current record in the history table (changing the value from 1 to 0) before you add a new most current record. This will keep the number of rows to update constant.

 

Alternatively: If you sort the history table as proposed then a simple data step if last/first logic will allow to exclude the most current record.

...or you can also use a SQL select * from <history table> group by <business key> having max(load_dttm) ne load_dttm to exclude the most current record from reporting - and eventually adding an index on the business key will speed-up processing.

ballardw
Super User

@Sudhan wrote:
Thanks for your response. But i need the DT_FLAG>=2, as this data will be used for a report.

Can you show exactly how the DT_Flag variable is used in your report?

 

You might be surprised what could be done with your datetime variable without changing values of a variable.

Or add the flag just before preparing the report. I find the idea of permanent data sets with values changing to be worrisome.

yabwon
Onyx | Level 15

Hi,

 

how about the following process:

data Table_A;
infile cards dlm = "|";
input ID Load_Date : datetime21.;
format Load_Date datetime.;
cards;
101 | 02JUN2020:12:13:56
;
run;

data Table_A_Archive;
infile cards dlm = "|";
input ID Load_Date : datetime21. DT_FLAG;
format Load_Date datetime.;
cards;
100 | 01JUN2020:12:13:56 | 1
;
run;
title "Before";
proc print;
run;


/*
the process:
*/

/* create viev */
data Table_A_TMP / view = Table_A_TMP;
  set Table_A;
  DT_FLAG = 0; /* make it 0 flag */
run; 

/* append the data to the history table */
proc append base = Table_A_Archive data = Table_A_TMP;
run;

/* update the flag */
data Table_A_Archive;
  modify Table_A_Archive;
  DT_FLAG = DT_FLAG + 1;
run;
title "After";
proc print;
run;

proc delete data = Table_A_TMP (memtype=view);
run;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Patrick
Opal | Level 21

@yabwon 

The OP hasn't provided a business key in the sample data - but there must be one. How would your code work if you need to "count-up" by business key and not over all rows?

yabwon
Onyx | Level 15

Hi @Patrick,

 

from what I understood from OP: >>"Table_A" is created on a daily basis and the data should be archived on "Table_A_Archive". When the data is archived on "Table_A_Archive" a a flag will be created/updated "DT_FLAG<< there is no word about a business key in the data. Just another chunk of data added at the end of the archive and flag updated.

 

But assuming that the ID is business key which informs that flag should be increased in old records when the ID is in the latest data, you could try something like this:

data Table_A;
infile cards dlm = "|";
input ID Load_Date : datetime21.;
format Load_Date datetime.;
cards;
101 | 02JUN2020:12:13:56
104 | 02JUN2020:12:13:56
;
run;

data Table_A_Archive;
infile cards dlm = "|";
input ID Load_Date : datetime21. DT_FLAG;
format Load_Date datetime.;
cards;
100 | 01JUN2020:12:13:56 | 1
101 | 01JUN2020:12:13:56 | 1
102 | 01JUN2020:12:13:56 | 1
;
run;
title "Before";
proc print;
run;


/*
the process:
*/

/* create viev */
data Table_A_TMP / view = Table_A_TMP;
  set Table_A;
  DT_FLAG = 0; /* make it 0 flag */
run; 

/* append the data to the history table */
proc append base = Table_A_Archive data = Table_A_TMP;
run;

/* update the flag */
data Table_A_Archive;

  if _N_ = 1 then
    do;
      declare hash H(dataset:"Table_A(keep = id)");
      H.defineKey("id");
      H.defineDone();
    end;

  modify Table_A_Archive;
  if H.check() = 0 then DT_FLAG = DT_FLAG + 1;
run;
title "After";
proc print;
run;

proc delete data = Table_A_TMP (memtype=view);
run;

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



smantha
Lapis Lazuli | Level 10
data data_archive;
done=0;
do until (done);
set data_archive end=done;
DT_FLAG = DT_FLAG+1;
end;
done=0;
do until (done);
set data_A end=done;
DT_FLAG = 1;
end;
stop;
run;

As the table size increases the above code becomes less and less performant. If the table is just a record keeper of load dates then it is not an issue.


Kurt_Bremser
Super User

Your unfortunate decision to increase the existing values of the flag requires you to always rewrite the whole dataset. You should have increased the flag for the new observations instead:

data exist;
input numdata dt_flag;
datalines;
1 1
2 1
3 2
4 2
5 3
6 3
;

data new;
input numdata;
datalines;
7
8
;

data new_app;
set new;
if _n_ = 1
then do;
  set exist (keep=dt_flag) point=nobs nobs=nobs;
  dt_flag + 1;
end;
run;

proc append base=exist data=new_app;
run;

proc print data=exist noobs;
run;

Result:

numdata	dt_flag
1	1
2	1
3	2
4	2
5	3
6	3
7	4
8	4
yabwon
Onyx | Level 15

Kurt, 

Two thoughts if I may:

1) Wouldn't the MODIFY statement do an update "in place" without rewriting dataset?

2) With increasing flag value for new data you will always have to find current maximum value to get the latest data instead just "where flag = 1".

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Kurt_Bremser
Super User

@yabwon wrote:

Kurt, 

Two thoughts if I may:

1) Wouldn't the MODIFY statement do an update "in place" without rewriting dataset?

2) With increasing flag value for new data you will always have to find current maximum value to get the latest data instead just "where flag = 1".

 

All the best

Bart


If you manipulate the flag variable in the existing data, you will always have to read/write the whole dataset. The increasing flag for new data means that you only have to read the last observation of the existing dataset, and increment that. So my code

  • reads the last observation of existing; will be very fast
  • reads/writes the new data to set the flag
  • reads/writes the new data during append

I don't know if you can use modify to do a pure append, where the existing data is not at least read as a whole.

s_lassen
Meteorite | Level 14

I think it can be done in one data step like this:

data Table_A_Archive;
  modify Table_A_Archive end=done;
  DT_FLAG+1;
  replace;
if done;
DT_FLAG=1; do until(done2); set Table_A end=done2; output; end; run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1724 views
  • 1 like
  • 7 in conversation