DATA Step, Macro, Functions and more

Check data before combining datasets

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

Check data before combining datasets

Hi everyone,

 

If I have 2 datasets having the same variables and both of them having the variable named "report_date":

 

1. Daily_Data (imported into LIB "Work")

2. Consolidated_Data (stored at LIB "Mylib")

 

On daily basic, after importing "Daily_data", I am using the scripts:

 

data Mylib.consolidated_data;
       set Mylib.Consolidated_data work.daily_data;
run;

to extend the dataset "Consolidated_data". However, to avoid combining the same source data twice or 3 times unintentionally, can we put a condition some thing like that:

 

Let say, the dataset "Daily_data" with the variable "Report_date" having one same value. For eg. 09/08/2018. Accordingly, if the variable "Report_date" in dataset "Consolidated_data" contains at least one observation equal to 09/08/2018, the combining step will be skipped.


Thank you,

 

It is never late for learning.

Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 10,594

Re: Check data before combining datasets

Posted in reply to Tri_Luong

To prevent accidental dupülication of entries, but still allow updates (if new data needs to be corrected), I'd do this:

data _null_;
set work.daily_data;
call symput('exclude',put(report_date,best.));
stop;
run;

data Mylib.consolidated_data;
set
  Mylib.Consolidated_data (
    where=(report_date ne &exclude)
  )
  work.daily_data
;
run;

Note how a little visual formatting makes the set statement easier to grok.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

View solution in original post


All Replies
Solution
2 weeks ago
Super User
Posts: 10,594

Re: Check data before combining datasets

Posted in reply to Tri_Luong

To prevent accidental dupülication of entries, but still allow updates (if new data needs to be corrected), I'd do this:

data _null_;
set work.daily_data;
call symput('exclude',put(report_date,best.));
stop;
run;

data Mylib.consolidated_data;
set
  Mylib.Consolidated_data (
    where=(report_date ne &exclude)
  )
  work.daily_data
;
run;

Note how a little visual formatting makes the set statement easier to grok.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 19

Re: Check data before combining datasets

Posted in reply to KurtBremser

Thank you very much. It was what I am looking for. My scripts have worked.

It is never late for learning.
PROC Star
Posts: 1,405

Re: Check data before combining datasets

Posted in reply to Tri_Luong

Assuming that your constant Report_Date var is always larger than the report_date values in the consolidated data set, you can do something like this

 

proc sql;
   insert into Mylib.consolidated_data as cons
   select * from work.daily_data as daily
   having daily.Report_date > max(cons.Report_date);
quit;
Occasional Contributor
Posts: 19

Re: Check data before combining datasets

Thanks for your instruction but I would like to keep proc import. Smiley Happy

It is never late for learning.
Super User
Posts: 6,935

Re: Check data before combining datasets

[ Edited ]
Posted in reply to Tri_Luong

First, note that your program will take longer and longer to run over time.  Instead of using  DATA step to combine the data sets, it would be much more efficient to use PROC APPEND:

 

proc append data=daily_data base=Mylib.consolidated_data;

run;

 

This assumes that the data set structure never changes (same variables, same lengths for each variable, every time).

 

Since the latest batch of data will appear at the end of the DATA step, you can rely on the final observation in the consolidated data set coming from the last batch of data.  To extract the final observation, you could use something like this:

 

data final;

   set Mylib.consolidated_data nobs=_nobs_ point=_nobs_;

   output;

   stop;

   keep report_date;

   rename report_date = final_previous_date;

run;

 

Since it reads just 1 observation, it will be speedy.

 

Then get the earliest date in your daily data set:

 

proc summary data=daily_data;

   var report_date;

   output out=daily_summary min=current_minimum_date;

run;

 

Finally, combine these into a way that runs PROC APPEND only when needed;

 

data _null_;

   set daily_summary;

   set final;

   if current_minimum_date > final_previous_date then do;

      call execute("proc append data=daily_data base=Mylib.consolidated_data; run;");

   end;

run;

PROC Star
Posts: 631

Re: Check data before combining datasets

Posted in reply to Tri_Luong

Check if PROC SORT with NODUPKEY will work for you.

Thanks,
Suryakiran
Occasional Contributor
Posts: 19

Re: Check data before combining datasets

Posted in reply to SuryaKiran

Thanks. As I am quite newbie. Would you mind show me the detailed codes?

It is never late for learning.
PROC Star
Posts: 631

Re: Check data before combining datasets

Posted in reply to Tri_Luong
data Mylib.consolidated_data;
       set Mylib.Consolidated_data work.daily_data;
run;

proc sort data Mylib.consolidated_data nodupkey;
by report_date;
run;

After the data is appended you can sort the data set with nodupkey option which will remove duplicate report_date.

 

Note: The second occurrence of the same by variable record will be deleted.   

example:

data test;
input id num;
datalines;
1 10
1 20
2 10
3 10
;
run;
proc sort data=test nodupkey;
by id;
run;
/* If you want to keep the highest number */
data test;
input id num;
datalines;
1 10
1 20
2 10
3 10
;
run;
proc sort data=test;
by id descending num;
run;
proc sort data=test nodupkey;
by id;
run;
Thanks,
Suryakiran
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 115 views
  • 0 likes
  • 5 in conversation