BookmarkSubscribeRSS Feed
dhealth
Calcite | Level 5

Hi All,

I am trying to figure out how to approach a problem. I have a basic procedure that cleans and outputs a data set that we transfer to a partner organization. They have asked to change the procedure so that we only transmit lines of data that are new since the last transfer or older entries that have a change to 1 or more variable entries. I imagine I need to make create a comparison table each time I transmit so that the next data pull can compare back to it.

What I don't know is how to write the code to compare for new or changed entries. Any help is appreciated!

Thank you

3 REPLIES 3
Reeza
Super User

How many variables are you looking at?

 

You can look at PROC COMPARE to see if any of those output data set meet your needs, but I suspect you'll need to code your own to get exactly what you want. If you need further assistance it would probably be a good idea to show some sample data and what you want as output. THIS DOES NOT NEED TO BE YOUR ACTUAL DATA. It just needs to reflect how your data is structured and the complexities of the problem. 

 


@dhealth wrote:

Hi All,

I am trying to figure out how to approach a problem. I have a basic procedure that cleans and outputs a data set that we transfer to a partner organization. They have asked to change the procedure so that we only transmit lines of data that are new since the last transfer or older entries that have a change to 1 or more variable entries. I imagine I need to make create a comparison table each time I transmit so that the next data pull can compare back to it.

What I don't know is how to write the code to compare for new or changed entries. Any help is appreciated!

Thank you


 

Shmuel
Garnet | Level 18

It will be easy if you add a variable to the dataset, a flag of NEW/OLD data.

Each time you add or update an observation sign it as NEW.

After sending the changes (flaged as NEW) update the flag to OLD in all observations.

 

You may want to add a date variable of the last date sent to the partner, to each observation,

that is the date of changingthe flag from  NEW to OLD.

Reeza
Super User

I like @Shmuel's idea. If you have a last_modified date on your data set, which a lot of DB's have by default within a  data warehouse, you could use that. If you don't, you can kind of create your own. 

 

How is your data modified/updated each time is a bit of the key question. Here's an example of how this could work.

 

*Original data set;
data class1; 
set sashelp.class;
last_update_date = '01Jan2017'd;
run;

*Changes;
data class2;
set sashelp.class;
last_update_date = '02Jan2017'd;
if age=13 then age=21;

else if height>65 then height= 100;
else delete;

run;


*sort for updates;
proc sort data=class1; by name;
proc sort data=class2; by name;run;

*update data set;
data class3;
update class1 class2;
by name;
run;

*find max date;
proc sql noprint;
create table changes as
select *
from class3
having last_update_date = max(last_update_date);
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 577 views
  • 2 likes
  • 3 in conversation