BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

We've a process where clients will update the file (say sales details) when there is a need and they intimate us about the change. So our job is to load those updated records to SQL DB table by maintaing the history.

 

For the first time while loading the data to SQL we just did a 'proc append' to SQL table after reading the file via data step infile Statement. However this method will not work to track the history in SQL when there is a update in the file.

 

I'm not Aware of SCDs as well.  We are not using any Primary/foreign key stuff in SQL.I would like to understand how I can tackle the situation to update the history. I tried with 'Proc SQL update' and 'If first. and last.' but it didn't helped either. Appericiate your help here.

 

Assume I've a file like this.

 

Id Name Sales Transaction Example Values Act_Ind Valid_From Valid_To
1 Unit True 6 58001 Y 19000101T000000 99991231T235959
2 Key True 6 121216 Y 19000101T000000 99991231T235959
3 Value True 18 820595,2 Y 19000101T000000 99991231T235959

 

Assume I've received an update to the file like this. If you notice Transaction value has changed for Key from 6 to 8.

 

Id  Name Sales Transaction Example Values Act_Ind Valid_From Valid_To
2 Key True 8 121216 Y 20200404T000000 99991231T235959

 

So now I want data like this SQL table.

 

Id  Name Sales Transaction Example Values Act_Ind Valid_From Valid_To
1 Unit True 6 58001 Y 19000101T000000 99991231T235959
2 Key True 6 121216 N 19000101T000000 20200403T000000
2 Key True 8 121216 Y 20200404T000000 99991231T235959
3 Value True 18 820595,2 Y 19000101T000000 99991231T235959

 

 

18 REPLIES 18
Kurt_Bremser
Super User

When column attributes that affect storage (length and/or type) change, you have to rewrite the whole table. In SAS and in a DBMS. And a column will have the same length throughout the whole table, period.

I suggest you set up a change management process where changes in structure need to be approved by organization and planning, the department boss, and the IT boss. Just to make people think twice (or at least once, in many cases) before causing such a change.

David_Billa
Rhodochrosite | Level 12
Yes, we have a plan to set up a change management process in the future.

However, as a first step we have to tackle this now in SQL table by
maintaining the history. Once we were able to do it, then we will rewrite
the table where attributes has changed.

Could you please point me to any documents or to any examples which can
help me to accomplish this challenge?
Tom
Super User Tom
Super User

What is the question?

Are you asking how to use the new record to calculate from the existing record:

2	Key	char	6	121216	Y	19000101T000000	99991231T235959

to the new records:

2	Key	char	6	121216	N	19000101T000000	20200403T000000
2	Key	char	8	121216	Y	20200404T000000	99991231T235959

?  PS Why doesn't the end timestamp get set to 20200403T235959 ?

Or how to update the table once you know what changes to make?

Looks to me like a simple way would be one of :

- delete the existing record and insert the two new records.

- change the status and end timestamp for the one existing record and insert one new record.

David_Billa
Rhodochrosite | Level 12
If there is any change to any field as I mentioned in my post then I have
to change the value from Y to N in Act_Ind (active indicator)variable and
time stamp in valid_to variable should have the value of current date.

In the new record I should have the updated value for the variable which
got changed followed by act_ind variable should have Y and valid_from
should have value (previous valid_to date+1) and valid_to should have
maximum end date as I shown in the example.

I'm OK with any method (simple data step or SCD or ...) to achieve this
task.

SASKiwi
PROC Star

@David_Billa - This is a little off-topic but how are you dealing with a numeric column that could have a maximum of 18 digits when SAS numeric columns can only hold around 15 digits accurately? 

David_Billa
Rhodochrosite | Level 12
It will be reduced to 15 in the future. Any thoughts to provide me the hint
to accomplish the challenge which I'm facing now?

SASKiwi
PROC Star

Your use case example doesn't make sense to me. It is impossible to change the attribute of a column - its length- for a single row. As @Kurt_Bremser has already stated, it has to change for all rows. So I see no need to create 2 rows for column 2 when no values in the row change at all between the 2. It would be different if the value changed from 121216 to say 12121689, but if that were to happen then you would have to modify the table schema and reload all of the data to accommodate the wider column.

David_Billa
Rhodochrosite | Level 12

As the received responses are off-topic, I've updated the initial post.

@Tom @SASKiwi 

Kurt_Bremser
Super User

OK, I seem to have misinterpreted your dataset structure.

Please have a look at this:

data have;
infile datalines dlm='09'x dsd truncover;
input ID $ Name $ Sales $ Transaction :$2. Example_Values :$20. Act_Ind $ Valid_From :b8601dt16. Valid_To :b8601dt16.;
format Valid_From Valid_To b8601dt16.;
datalines;
1	Unit	True	6	58001	Y	19000101T000000	99991231T235959
2	Key	True	6	121216	Y	19000101T000000	99991231T235959
3	Value	True	18	820595,2	Y	19000101T000000	99991231T235959
;

data upd;
infile datalines dlm='09'x dsd truncover;
input ID $ Name $ Sales $ Transaction :$2. Example_Values :$20. Act_Ind $ Valid_From :b8601dt16. Valid_To :b8601dt16.;
format Valid_From Valid_To b8601dt16.;
datalines;
2	Key	True	8	121216	Y	20200404T000000	99991231T235959
;

proc compare
  base=have
  compare=upd
  out=comp
  outdif
  outnoequal
  noprint
;
by id;
run;

data realupd;
merge
  upd (in=a)
  comp (in=b keep=id)
;
by id;
if b;
run;

data new;
set
  have
  realupd (in=n)
;
by id;
if n
then new = 1;
else new = 0;
run;

data want;
merge 
  new
  new (firstobs=2 keep=id new valid_from rename=(id=_id valid_from=_vfrom new=_new))
;
if id = _id and _new
then valid_to = intnx('dtday',_vfrom,-1);
drop new _:;
run;
Tom
Super User Tom
Super User

@Kurt_Bremser Can you provide some comments to explain what those steps are doing? Can you explain why it works?

Kurt_Bremser
Super User

Good idea 😉

 

First, I run a proc compare to identify observations that are really new. Then I use the resulting dataset to extract those really new observations from the update dataset.

Then I do an interleave (a set of multiple datasets with a by), and keep a new variable that tells me if I have a new observation added.

In the final step, I do a look-ahead for the valid_from column and the new column. If I detect that the next observation is a new one for the current id, I overwrite the valid_to timestamp with a value 1 day before the valid_from of the added observation.

This step also makes sure that a new id is added without further action.

David_Billa
Rhodochrosite | Level 12

@Kurt_Bremser Thanks for your code and it is working if the existing data is updated, but it seem to be not working when the new records are inserted. Also I don't want to restrict the 'by' clause only by id and therefore I tried with _all_ as shown below but I'm getting no records in the output after executing proc compare. 

 

I've a feel there might be slight changes are required in the code to handle all situations (new records inserted,any records deleted or existing records updated) which I'm unable to figure out.

 

data have;

set lib.sql_tbl;

run;

proc sort data=have;

by _all_;

run;

 

data upd;

set have1;

run;

proc sort data=upd;

by _all_;

run;

 

proc compare

base=have

compare=upd

out=comp

outdif

outnoequal

noprint

;

by _all_;

run;

 

data realupd;

merge

upd (in=a)

comp (in=b drop=_type_ _obs_)

;

by _all_;

if b;

run;

 

Log of proc compare:

26         proc compare
27           base=have
28           compare=upd
29           out=comp
30           outdif
31           outnoequal
32           noprint
33         ;
34         by _all_;
35         run;

NOTE: Except for the 15 BY variables, the data sets WORK.HAVE and WORK.UPD have no variables in common. There are no matching 
      variables to compare. Comparisons of data values not performed.
NOTE: There were 99 observations read from the data set WORK.HAVE.
NOTE: There were 99 observations read from the data set WORK.UPD.
NOTE: The data set WORK.COMP has 0 observations and 17 variables.

 

 

Kurt_Bremser
Super User

This is going nowhere.

Please supply example data in data steps with datalines (THIS IS AN ABSOLUTE REQUIREMENT!), and what you want to get out of it.

Right now I have no idea any longer what your data looks like.

Hint: using _all_ in a by statement makes no sense in most cases.

ChrisNZ
Tourmaline | Level 20

This is exactly what the UPDATE statement is for.

This works:

data MASTER;
input (Id  Name  Sales   Transaction   Example  Act_Ind   Valid_From  Valid_To) (:$20.);
cards;
1   Unit  True  6   58001   Y   19900101T000000   99991231T235959
2   Key   True  6   121216  Y   199100101T000000   99991231T235959
3   Value   True  18  820595,2  Y   19920101T000000   99991231T235959
run;

data TRANS;
input (Id  Name  Sales   Transaction   Example  Act_Ind   Valid_From  Valid_To) (:$20.);
cards;
2   Key   True  8   121216  Y   20200404T000000   99991231T235959
run;

data MASTER;
  modify MASTER
         TRANS(rename=(
           NAME=N SALES=S TRANSACTION=T EXAMPLE=E ACT_IND=A VALID_FROM=VF VALID_TO=VT      
         ));
  by ID;
  select (_IORC_); 
    when (%SYSRC(_SOK)) do;    %* Transaction matches master; 
      Valid_To=VF;Act_Ind='N';
      replace;  
      NAME=N; SALES=S; TRANSACTION=T; EXAMPLE=E; ACT_IND=A; VALID_FROM=VF; VALID_TO=VT;      
      output;
    end;     
    when (%SYSRC(_DSENMR)) do; %* New Transaction ;
      output;    
      _error_=0;    
    end;    
    when (%SYSRC(_DSEMTR)) do; %* New Transaction, not first key ; 
      put 'ERR' 'OR: Duplicate Values on transaction dataset';  
      stop;     
    end;    
    otherwise do;   
      put 'ERR' 'OR: Unknown IO ';  
      stop;   
    end;  
  end; 
 run;
 
Id Name Sales Transaction Example Act_Ind Valid_From Valid_To
1 Unit True 6 58001 Y 19900101T000000 99991231T235959
2 Key True 6 121216 N 199100101T000000 20200404T000000
3 Value True 18 820595,2 Y 19920101T000000 99991231T235959
2 Key True 8 121216 Y 20200404T000000 99991231T235959

 

Note that you did not provide metadata in the form of a data step for the existing tables, so I just used what was easier.

If the table is so large that sorting it is not an option, a different method that uses an index can be coded to access observations directly.

A good resource: https://www.lexjansen.com/pnwsug/2006/PN01CurtisMackModify.pdf

 

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 4029 views
  • 3 likes
  • 5 in conversation