BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tri_Luong
Obsidian | Level 7

Hi everyone,

 

If I have 2 datasets having the same variables and both of them having the variable named "report_date":

 

1. Daily_Data (imported into LIB "Work")

2. Consolidated_Data (stored at LIB "Mylib")

 

On daily basic, after importing "Daily_data", I am using the scripts:

 

data Mylib.consolidated_data;
       set Mylib.Consolidated_data work.daily_data;
run;

to extend the dataset "Consolidated_data". However, to avoid combining the same source data twice or 3 times unintentionally, can we put a condition some thing like that:

 

Let say, the dataset "Daily_data" with the variable "Report_date" having one same value. For eg. 09/08/2018. Accordingly, if the variable "Report_date" in dataset "Consolidated_data" contains at least one observation equal to 09/08/2018, the combining step will be skipped.


Thank you,

 

It is never late for learning.
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

To prevent accidental dupülication of entries, but still allow updates (if new data needs to be corrected), I'd do this:

data _null_;
set work.daily_data;
call symput('exclude',put(report_date,best.));
stop;
run;

data Mylib.consolidated_data;
set
  Mylib.Consolidated_data (
    where=(report_date ne &exclude)
  )
  work.daily_data
;
run;

Note how a little visual formatting makes the set statement easier to grok.

 

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

To prevent accidental dupülication of entries, but still allow updates (if new data needs to be corrected), I'd do this:

data _null_;
set work.daily_data;
call symput('exclude',put(report_date,best.));
stop;
run;

data Mylib.consolidated_data;
set
  Mylib.Consolidated_data (
    where=(report_date ne &exclude)
  )
  work.daily_data
;
run;

Note how a little visual formatting makes the set statement easier to grok.

 

Tri_Luong
Obsidian | Level 7

Thank you very much. It was what I am looking for. My scripts have worked.

It is never late for learning.
PeterClemmensen
Tourmaline | Level 20

Assuming that your constant Report_Date var is always larger than the report_date values in the consolidated data set, you can do something like this

 

proc sql;
   insert into Mylib.consolidated_data as cons
   select * from work.daily_data as daily
   having daily.Report_date > max(cons.Report_date);
quit;
Tri_Luong
Obsidian | Level 7

Thanks for your instruction but I would like to keep proc import. 🙂

It is never late for learning.
Astounding
PROC Star

First, note that your program will take longer and longer to run over time.  Instead of using  DATA step to combine the data sets, it would be much more efficient to use PROC APPEND:

 

proc append data=daily_data base=Mylib.consolidated_data;

run;

 

This assumes that the data set structure never changes (same variables, same lengths for each variable, every time).

 

Since the latest batch of data will appear at the end of the DATA step, you can rely on the final observation in the consolidated data set coming from the last batch of data.  To extract the final observation, you could use something like this:

 

data final;

   set Mylib.consolidated_data nobs=_nobs_ point=_nobs_;

   output;

   stop;

   keep report_date;

   rename report_date = final_previous_date;

run;

 

Since it reads just 1 observation, it will be speedy.

 

Then get the earliest date in your daily data set:

 

proc summary data=daily_data;

   var report_date;

   output out=daily_summary min=current_minimum_date;

run;

 

Finally, combine these into a way that runs PROC APPEND only when needed;

 

data _null_;

   set daily_summary;

   set final;

   if current_minimum_date > final_previous_date then do;

      call execute("proc append data=daily_data base=Mylib.consolidated_data; run;");

   end;

run;

SuryaKiran
Meteorite | Level 14

Check if PROC SORT with NODUPKEY will work for you.

Thanks,
Suryakiran
Tri_Luong
Obsidian | Level 7

Thanks. As I am quite newbie. Would you mind show me the detailed codes?

It is never late for learning.
SuryaKiran
Meteorite | Level 14
data Mylib.consolidated_data;
       set Mylib.Consolidated_data work.daily_data;
run;

proc sort data Mylib.consolidated_data nodupkey;
by report_date;
run;

After the data is appended you can sort the data set with nodupkey option which will remove duplicate report_date.

 

Note: The second occurrence of the same by variable record will be deleted.   

example:

data test;
input id num;
datalines;
1 10
1 20
2 10
3 10
;
run;
proc sort data=test nodupkey;
by id;
run;
/* If you want to keep the highest number */
data test;
input id num;
datalines;
1 10
1 20
2 10
3 10
;
run;
proc sort data=test;
by id descending num;
run;
proc sort data=test nodupkey;
by id;
run;
Thanks,
Suryakiran

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1019 views
  • 0 likes
  • 5 in conversation