Purging SAS Data over 6 months

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 107
Accepted Solution

Purging SAS Data over 6 months

Hello all,

 

I have a very huge dataset with the data appended to it every day. I want to make the dataset small by deleting the records over 6 months. Essentially the dataset should have just the past 6 months data. I am looking for a code/macro that runs everyday and deletes the data over 6 months. I have tried the below code and just wondering if any of you guys help me with any better code?

 

Thanks in advance for your help. Very much appreciated!

 

MY CODE:

data have;
informat salesdate mmddyy10.;
format salesdate mmddyy10.;
input salesdate;
cards;
10/10/2014
11/10/2014
12/21/2014
5/1/2015
6/12/2015
7/1/2015
8/12/2015
9/29/2015
10/1/2015
;
proc print;run;
data want;
set have;
if intnx('month',today(),-6,'b') <= salesdate < intnx('month',today(),0,'b');
proc print;run;


Accepted Solutions
Solution
‎12-02-2015 08:54 AM
Super User
Super User
Posts: 7,682

Re: Purging SAS Data over 6 months

I would stick with datastep for this, SQL is not great in SAS for large datasets.  Your code could be restricted down to one statement (note also the formatting, e.g spacing and finishing code blocks):

data have;
informat salesdate mmddyy10.;
format salesdate mmddyy10.;
input salesdate;
cards;
10/10/2014
11/10/2014
12/21/2014
5/1/2015
6/12/2015
7/1/2015
8/12/2015
9/29/2015
10/1/2015
;
run;

data want;
set have (where=(intnx('month',today(),-6,'b') <=salesdate));
run;

View solution in original post


All Replies
Contributor
Posts: 26

Re: Purging SAS Data over 6 months

Hi,

You can create a seperate monthly process which can run on first of every month and delete the records from ur table which are older then 6 months.

 

for the rest of the month you can run ur append process as normal

 

-----------------------------------------------------------------------------------------------------------------------------

Process to delete the data which is older then 6 months from now as below

 

proc sql noprint;
delete * from have
where salesdate <= intnx('month',today(),0,'b');
quit;

 

------------------------------------------------------------------------------------------------------------------------------

Trusted Advisor
Posts: 1,116

Re: Purging SAS Data over 6 months


AbhiD wrote:

[...]

Process to delete the data which is older then 6 months from now as below

 

proc sql noprint;
delete * from have
where salesdate <= intnx('month',today(),0,'b');
quit;

 

------------------------------------------------------------------------------------------------------------------------------


@AbhiD: The PROC SQL step you suggested would delete not only data older than 6 months, but also data up to and including the first day of the current month! And this would happen to the original dataset HAVE.

 

@KiranMaddi: The IF condition you specified would not select data from the current month. Are the most current data of such little interest?

Frequent Contributor
Posts: 107

Re: Purging SAS Data over 6 months

@freelance: Yes, I would need the most current data too.
Trusted Advisor
Posts: 1,116

Re: Purging SAS Data over 6 months

In this case I would support @RW9's suggestion.

 

Given such a "very huge" amount of data, you should indeed develop a data management strategy. (You've probably done this already.) Obviously, it would be time-consuming to apply even this simple SET statement regularly to a dataset containing the entire data history, if it was really large. Therefore, it could make sense to split the data history into reasonably sized chunks (based on SALESDATE). This would not preclude performing analyses involving all of the data.

 

Solution
‎12-02-2015 08:54 AM
Super User
Super User
Posts: 7,682

Re: Purging SAS Data over 6 months

I would stick with datastep for this, SQL is not great in SAS for large datasets.  Your code could be restricted down to one statement (note also the formatting, e.g spacing and finishing code blocks):

data have;
informat salesdate mmddyy10.;
format salesdate mmddyy10.;
input salesdate;
cards;
10/10/2014
11/10/2014
12/21/2014
5/1/2015
6/12/2015
7/1/2015
8/12/2015
9/29/2015
10/1/2015
;
run;

data want;
set have (where=(intnx('month',today(),-6,'b') <=salesdate));
run;
Super User
Posts: 5,380

Re: Purging SAS Data over 6 months

@RW are you sure that you don't spread desinformation?

I have never seen any performance difference when it comes to simple table scans with where. If you have any valid documentation to support your statement, I would be happy to read it :-)

 

But to the issue.

Use of DELETE FROM will not make the data set smaller in physical size, just logical.

If the data set is huge, and there's a real pain to delete records performance wise, Base SAS libraries may not be the ultimate storage.

Most RDBMS and SAS SPD Server supports clustering/partitioning, which let us drop a whole partition without scanning/rewrite the whole data set.

Data never sleeps
Super User
Super User
Posts: 7,682

Re: Purging SAS Data over 6 months

I think you misinterpret my point, the point I am trying to make is that the second part of the if (or where doesn't matter which) is not needed so this:

if intnx('month',today(),-6,'b') <= salesdate < intnx('month',today(),0,'b');

 

Becomes 

 

if intnx('month',today(),-6,'b') <= salesdate;

 

Use of where is just a personal preference as I feel it reads easier, especially if there are multiple inputs - hence I use where far more.

Super User
Posts: 5,380

Re: Purging SAS Data over 6 months

Ooops, I wasn't clear enough Smiley Embarassed- I mean SQL vs. data step, not IF/WHERE.

When it comes to subsetting, NEVER use IF, it's slower, every time. Especially when it comes to SAS/ACCESS, IF is (almost) never sent to the RDBS, WHERE will be pass thru if certain conditions are met.

Another best practice is to avoid using functions in the where statement. Even if SAS have improved it's compilation engine to avoid function call on each observation, it's safer to use syntax without function calls. So, in this case today() could easily be changed to "&SYSDATE"d.

Data never sleeps
Super User
Posts: 5,353

Re: Purging SAS Data over 6 months

Agreed, and would want to consider whether this might be possible as well:

 

where "&sysdate9"d - 183 <= salesdate;

 

If the definition of "6 months of data" can go with 183 days (rather than 6 calendar months), this WHERE statement gets rid of all functions.

Trusted Advisor
Posts: 1,116

Re: Purging SAS Data over 6 months

Just a side note: Sometimes, people working with "very huge" datasets (like @KiranMaddi) let their SAS sessions run overnight. In this case, a difference between the session start date "&SYSDATE"d and the current date TODAY() will occur.

Frequent Contributor
Posts: 107

Re: Purging SAS Data over 6 months

Yes, You are right @FreelanceReinhard
Frequent Contributor
Posts: 107

Re: Purging SAS Data over 6 months

Hello All,

 

 

Thank you very much for all your wonderful suggestions. I might go with @RW9 suggestion unless someone comes up with a better one?

Well done every one! very much appreciated.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 573 views
  • 6 likes
  • 6 in conversation