DATA Step, Macro, Functions and more

create data set with only new/changed data

Reply
New User
Posts: 1

create data set with only new/changed data

Hi All,

I am trying to figure out how to approach a problem. I have a basic procedure that cleans and outputs a data set that we transfer to a partner organization. They have asked to change the procedure so that we only transmit lines of data that are new since the last transfer or older entries that have a change to 1 or more variable entries. I imagine I need to make create a comparison table each time I transmit so that the next data pull can compare back to it.

What I don't know is how to write the code to compare for new or changed entries. Any help is appreciated!

Thank you

Super User
Posts: 23,928

Re: create data set with only new/changed data

How many variables are you looking at?

 

You can look at PROC COMPARE to see if any of those output data set meet your needs, but I suspect you'll need to code your own to get exactly what you want. If you need further assistance it would probably be a good idea to show some sample data and what you want as output. THIS DOES NOT NEED TO BE YOUR ACTUAL DATA. It just needs to reflect how your data is structured and the complexities of the problem. 

 


dhealth wrote:

Hi All,

I am trying to figure out how to approach a problem. I have a basic procedure that cleans and outputs a data set that we transfer to a partner organization. They have asked to change the procedure so that we only transmit lines of data that are new since the last transfer or older entries that have a change to 1 or more variable entries. I imagine I need to make create a comparison table each time I transmit so that the next data pull can compare back to it.

What I don't know is how to write the code to compare for new or changed entries. Any help is appreciated!

Thank you


 

Trusted Advisor
Posts: 1,848

Re: create data set with only new/changed data

It will be easy if you add a variable to the dataset, a flag of NEW/OLD data.

Each time you add or update an observation sign it as NEW.

After sending the changes (flaged as NEW) update the flag to OLD in all observations.

 

You may want to add a date variable of the last date sent to the partner, to each observation,

that is the date of changingthe flag from  NEW to OLD.

Super User
Posts: 23,928

Re: create data set with only new/changed data

I like @Shmuel's idea. If you have a last_modified date on your data set, which a lot of DB's have by default within a  data warehouse, you could use that. If you don't, you can kind of create your own. 

 

How is your data modified/updated each time is a bit of the key question. Here's an example of how this could work.

 

*Original data set;
data class1; 
set sashelp.class;
last_update_date = '01Jan2017'd;
run;

*Changes;
data class2;
set sashelp.class;
last_update_date = '02Jan2017'd;
if age=13 then age=21;

else if height>65 then height= 100;
else delete;

run;


*sort for updates;
proc sort data=class1; by name;
proc sort data=class2; by name;run;

*update data set;
data class3;
update class1 class2;
by name;
run;

*find max date;
proc sql noprint;
create table changes as
select *
from class3
having last_update_date = max(last_update_date);
quit;

Ask a Question
Discussion stats
  • 3 replies
  • 117 views
  • 2 likes
  • 3 in conversation