We have a process where clients update a file (say, sales details) when needed and notify us about the change. Our job is to load those updated records into a SQL DB table while maintaining history.
For the initial load we simply read the file via a DATA step INFILE statement and did a PROC APPEND to the SQL table. However, this method cannot track history in SQL when the file is updated.
I'm not aware of SCDs either, and we are not using any primary/foreign keys in SQL. I would like to understand how to tackle this situation so that history is preserved. I tried PROC SQL UPDATE and 'if first. and last.' logic, but neither helped. Appreciate your help here.
Assume I've a file like this.
Id | Name | Sales | Transaction | Example Values | Act_Ind | Valid_From | Valid_To |
---|---|---|---|---|---|---|---|
1 | Unit | True | 6 | 58001 | Y | 19000101T000000 | 99991231T235959 |
2 | Key | True | 6 | 121216 | Y | 19000101T000000 | 99991231T235959 |
3 | Value | True | 18 | 820595,2 | Y | 19000101T000000 | 99991231T235959 |
Assume I've received an update to the file like this. If you notice, the Transaction value for Key has changed from 6 to 8.
Id | Name | Sales | Transaction | Example Values | Act_Ind | Valid_From | Valid_To |
---|---|---|---|---|---|---|---|
2 | Key | True | 8 | 121216 | Y | 20200404T000000 | 99991231T235959 |
So now I want the SQL table to look like this.
Id | Name | Sales | Transaction | Example Values | Act_Ind | Valid_From | Valid_To |
---|---|---|---|---|---|---|---|
1 | Unit | True | 6 | 58001 | Y | 19000101T000000 | 99991231T235959 |
2 | Key | True | 6 | 121216 | N | 19000101T000000 | 20200403T000000 |
2 | Key | True | 8 | 121216 | Y | 20200404T000000 | 99991231T235959 |
3 | Value | True | 18 | 820595,2 | Y | 19000101T000000 | 99991231T235959 |
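What is being described here is essentially a Type 2 slowly changing dimension (SCD2): when an incoming record differs for an existing Id, the active row is flagged inactive (Act_Ind = 'N') and end-dated, and the update is appended as the new active row. As a minimal, language-agnostic sketch of that logic in plain Python (column names come from the tables above; the function name and the day-before end-dating are illustrative, chosen to match the desired output):

```python
from datetime import datetime, timedelta

def apply_scd2(history, update):
    """Apply one updated record to a list of history rows (dicts).
    Rows share the key 'Id'; the currently active row has Act_Ind 'Y'."""
    for row in history:
        if row["Id"] == update["Id"] and row["Act_Ind"] == "Y":
            # Close out the old version: flag it inactive and end its
            # validity one day before the update's Valid_From.
            vf = datetime.strptime(update["Valid_From"], "%Y%m%dT%H%M%S")
            row["Act_Ind"] = "N"
            row["Valid_To"] = (vf - timedelta(days=1)).strftime("%Y%m%dT%H%M%S")
    history.append(update)  # the update becomes the new active row
    return history

history = [
    {"Id": "2", "Name": "Key", "Transaction": "6", "Act_Ind": "Y",
     "Valid_From": "19000101T000000", "Valid_To": "99991231T235959"},
]
update = {"Id": "2", "Name": "Key", "Transaction": "8", "Act_Ind": "Y",
          "Valid_From": "20200404T000000", "Valid_To": "99991231T235959"}
apply_scd2(history, update)
# history[0] now has Act_Ind 'N' and Valid_To '20200403T000000'
```

In table terms this is one UPDATE of the currently active row plus one INSERT of the new version.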
When column attributes that affect storage (length and/or type) change, you have to rewrite the whole table. In SAS and in a DBMS. And a column will have the same length throughout the whole table, period.
I suggest you set up a change management process where changes in structure need to be approved by organization and planning, the department boss, and the IT boss. Just to make people think twice (or at least once, in many cases) before causing such a change.
What is the question?
Are you asking how to use the new record to calculate from the existing record:
2 Key char 6 121216 Y 19000101T000000 99991231T235959
to the new records:
2 Key char 6 121216 N 19000101T000000 20200403T000000
2 Key char 8 121216 Y 20200404T000000 99991231T235959
?
PS: Why doesn't the end timestamp get set to 20200403T235959?
Or how to update the table once you know what changes to make?
Looks to me like a simple way would be one of :
- delete the existing record and insert the two new records.
- change the status and end timestamp for the one existing record and insert one new record.
@David_Billa - This is a little off-topic but how are you dealing with a numeric column that could have a maximum of 18 digits when SAS numeric columns can only hold around 15 digits accurately?
Your use case example doesn't make sense to me. It is impossible to change an attribute of a column (its length) for a single row. As @Kurt_Bremser has already stated, it has to change for all rows. So I see no need to create two rows for Id 2 when no values in the row change at all between the two. It would be different if the value changed from 121216 to, say, 12121689, but if that happened you would have to modify the table schema and reload all of the data to accommodate the wider column.
OK, I seem to have misinterpreted your dataset structure.
Please have a look at this:
data have;
infile datalines dlm='09'x dsd truncover;
input ID $ Name $ Sales $ Transaction :$2. Example_Values :$20. Act_Ind $ Valid_From :b8601dt16. Valid_To :b8601dt16.;
format Valid_From Valid_To b8601dt16.;
datalines;
1 Unit True 6 58001 Y 19000101T000000 99991231T235959
2 Key True 6 121216 Y 19000101T000000 99991231T235959
3 Value True 18 820595,2 Y 19000101T000000 99991231T235959
;
data upd;
infile datalines dlm='09'x dsd truncover;
input ID $ Name $ Sales $ Transaction :$2. Example_Values :$20. Act_Ind $ Valid_From :b8601dt16. Valid_To :b8601dt16.;
format Valid_From Valid_To b8601dt16.;
datalines;
2 Key True 8 121216 Y 20200404T000000 99991231T235959
;
/* flag update observations that differ from the base table */
proc compare
base=have
compare=upd
out=comp
outdif
outnoequal
noprint
;
by id;
run;
/* keep only the genuinely changed observations from the update dataset */
data realupd;
merge
upd (in=a)
comp (in=b keep=id)
;
by id;
if b;
run;
/* interleave base and changed observations; flag the added ones */
data new;
set
have
realupd (in=n)
;
by id;
if n
then new = 1;
else new = 0;
run;
/* look ahead one observation via a self-merge with firstobs=2:
   if the next observation is a new version of the same id,
   close out the current one a day before the new valid_from */
data want;
merge
new
new (firstobs=2 keep=id new valid_from rename=(id=_id valid_from=_vfrom new=_new))
;
if id = _id and _new
then valid_to = intnx('dtday',_vfrom,-1);
drop new _:;
run;
@Kurt_Bremser Can you provide some comments to explain what those steps are doing? Can you explain why it works?
Good idea 😉
First, I run a proc compare to identify observations that are really new. Then I use the resulting dataset to extract those really new observations from the update dataset.
Then I do an interleave (a set of multiple datasets with a by), and keep a new variable that tells me if I have a new observation added.
In the final step, I do a look-ahead for the valid_from column and the new column. If I detect that the next observation is a new one for the current id, I overwrite the valid_to timestamp with a value 1 day before the valid_from of the added observation.
This step also makes sure that a new id is added without further action.
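As a cross-check of that first step, here is a rough Python analogue (the helper name is hypothetical; unlike PROC COMPARE with BY id, this sketch also passes through keys that are entirely missing from the base):

```python
def really_new(base, upd, key="Id"):
    """Keep update rows that are absent from base or that differ from the
    matching base row in any non-key column (a rough analogue of
    PROC COMPARE with the OUTNOEQUAL option)."""
    by_key = {r[key]: r for r in base}
    changed = []
    for r in upd:
        old = by_key.get(r[key])
        if old is None or any(r[c] != old.get(c) for c in r if c != key):
            changed.append(r)
    return changed

base = [{"Id": "1", "Transaction": "6"},
        {"Id": "2", "Transaction": "6"},
        {"Id": "3", "Transaction": "18"}]
upd = [{"Id": "2", "Transaction": "8"},   # changed: kept
       {"Id": "3", "Transaction": "18"},  # identical: dropped
       {"Id": "4", "Transaction": "1"}]   # brand-new key: kept
print([r["Id"] for r in really_new(base, upd)])  # prints ['2', '4']
```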
@Kurt_Bremser Thanks for your code; it works when existing data is updated, but it does not seem to work when new records are inserted. Also, I don't want to restrict the BY clause to id only, so I tried _all_ as shown below, but I get no records in the output after executing PROC COMPARE.
I feel some slight changes to the code are needed to handle all situations (new records inserted, records deleted, or existing records updated), but I'm unable to figure them out.
data have;
set lib.sql_tbl;
run;
proc sort data=have;
by _all_;
run;
data upd;
set have1;
run;
proc sort data=upd;
by _all_;
run;
proc compare
base=have
compare=upd
out=comp
outdif
outnoequal
noprint
;
by _all_;
run;
data realupd;
merge
upd (in=a)
comp (in=b drop=_type_ _obs_)
;
by _all_;
if b;
run;
Log of proc compare:
26   proc compare
27   base=have
28   compare=upd
29   out=comp
30   outdif
31   outnoequal
32   noprint
33   ;
34   by _all_;
35   run;
NOTE: Except for the 15 BY variables, the data sets WORK.HAVE and WORK.UPD have no variables in common.
      There are no matching variables to compare. Comparisons of data values not performed.
NOTE: There were 99 observations read from the data set WORK.HAVE.
NOTE: There were 99 observations read from the data set WORK.UPD.
NOTE: The data set WORK.COMP has 0 observations and 17 variables.
This is going nowhere.
Please supply example data in data steps with datalines (THIS IS AN ABSOLUTE REQUIREMENT!), and what you want to get out of it.
Right now I have no idea any longer what your data looks like.
Hint: using _all_ in a by statement makes no sense in most cases.
This is exactly what the UPDATE statement is for.
This works:
data MASTER;
input (Id Name Sales Transaction Example Act_Ind Valid_From Valid_To) (:$20.);
cards;
1 Unit True 6 58001 Y 19900101T000000 99991231T235959
2 Key True 6 121216 Y 19910101T000000 99991231T235959
3 Value True 18 820595,2 Y 19920101T000000 99991231T235959
run;
data TRANS;
input (Id Name Sales Transaction Example Act_Ind Valid_From Valid_To) (:$20.);
cards;
2 Key True 8 121216 Y 20200404T000000 99991231T235959
run;
data MASTER;
modify MASTER
TRANS(rename=(
NAME=N SALES=S TRANSACTION=T EXAMPLE=E ACT_IND=A VALID_FROM=VF VALID_TO=VT
));
by ID;
select (_IORC_);
when (%SYSRC(_SOK)) do; %* Transaction matches master;
Valid_To=VF;Act_Ind='N';
replace;
NAME=N; SALES=S; TRANSACTION=T; EXAMPLE=E; ACT_IND=A; VALID_FROM=VF; VALID_TO=VT;
output;
end;
when (%SYSRC(_DSENMR)) do; %* New Transaction ;
output;
_error_=0;
end;
when (%SYSRC(_DSEMTR)) do; %* New Transaction, not first key ;
put 'ERR' 'OR: Duplicate Values on transaction dataset';
stop;
end;
otherwise do;
put 'ERR' 'OR: Unknown IO ';
stop;
end;
end;
run;
Id | Name | Sales | Transaction | Example | Act_Ind | Valid_From | Valid_To |
---|---|---|---|---|---|---|---|
1 | Unit | True | 6 | 58001 | Y | 19900101T000000 | 99991231T235959 |
2 | Key | True | 6 | 121216 | N | 19910101T000000 | 20200404T000000 |
3 | Value | True | 18 | 820595,2 | Y | 19920101T000000 | 99991231T235959 |
2 | Key | True | 8 | 121216 | Y | 20200404T000000 | 99991231T235959 |
Note that you did not provide metadata in the form of a data step for the existing tables, so I just used what was easier.
If the table is so large that sorting it is not an option, a different method that uses an index can be coded to access observations directly.
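Outside SAS, the MODIFY/_IORC_ dispatch above can be sketched with a key index for direct access (plain Python; the helper name and dict layout are illustrative). Note it reproduces the code's choice of Valid_To = VF rather than a day-before value:

```python
def modify_with_transactions(master, trans, key="Id"):
    """Sketch of the MODIFY/_IORC_ dispatch: on a key match (_SOK) the
    master row is closed out and the transaction appended as the new
    version; on no match (_DSENMR) the transaction is appended as a
    brand-new record; a duplicate transaction key is treated as an error."""
    index = {row[key]: row for row in master}   # direct access, like an index
    seen = set()
    for t in trans:
        if t[key] in seen:
            raise ValueError("Duplicate values on transaction dataset")
        seen.add(t[key])
        old = index.get(t[key])
        if old is not None:                     # _SOK: close out the old row
            old["Act_Ind"] = "N"
            old["Valid_To"] = t["Valid_From"]   # as coded above: no day-before
        master.append(dict(t))                  # new version / new record
    return master

master = [{"Id": "2", "Transaction": "6", "Act_Ind": "Y",
           "Valid_From": "19000101T000000", "Valid_To": "99991231T235959"}]
trans = [{"Id": "2", "Transaction": "8", "Act_Ind": "Y",
          "Valid_From": "20200404T000000", "Valid_To": "99991231T235959"}]
modify_with_transactions(master, trans)
```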
A good resource: https://www.lexjansen.com/pnwsug/2006/PN01CurtisMackModify.pdf