merge data in order to get rid of observations

Accepted Solution Solved
Reply
Regular Contributor
Posts: 161
Accepted Solution

merge data in order to get rid of observations

Hi, everybody, I have a newbie question:

I have a huge log with the layout as below.  The log file is sorted by investor_id, date, and time.

investor_iddatetimefile
A1/1/199502:22:03X
A1/2/199502:22:04X
B2/3/200102:19:11Y
C2/3/200118:12:11Y

and a kill list that I want to delete the observation in that list from the log:

investor_iddate
X2/3/2005
Y2/3/2006
Z3/3/2007
A4/1/2006
...
...

.

My question is how to merge log with the kill_list so that I can delete from log any observations on kill_list efficiently (merge criteria is on investor_di and date)?  So if there is any investors and date combo that appear on the kill_list, that observation would be deleted.  The thing is that the log is very big, over 130 GB.  And the kill_list is not small also, around 25 GB.

Thank you so much!


Accepted Solutions
Solution
‎06-06-2013 03:33 PM
Super User
Posts: 10,474

Re: merge data in order to get rid of observations

With the kill list sorted by id and date the approach with merge would be something like

Data want ;

     merge log (in=inlog) kill (In=inKill);

     by investor_id date;

     if inlog and not (inKill);

run;


View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: merge data in order to get rid of observations

1. Are they both sorted by the variables of interest? This is for the options of "merge"

2. Does your kill_list only have those two columns or have a lot more other variables irrelevant to your current interest?  this is for the possibility of Hash().

Haikuo

Regular Contributor
Posts: 161

Re: merge data in order to get rid of observations

Thank you for your suggestions!

The kill_list contains nothing but the investor_id and date because these are the only two criteria for deletion.  The log (big dataset) is sorted by investor_id, date, and time variable.  I should add that there is a time variable I didn't put in the original question.  But I'll update it....

Respected Advisor
Posts: 3,124

Re: merge data in order to get rid of observations

In that case, Hash() can't really help you unless you have a very big RAM > 25G. So try 's suggestion.

Good luck,

Haikuo

Regular Contributor
Posts: 161

Re: merge data in order to get rid of observations

Thank you all for your help.  Ed's program runs without a problem.  It is now completed already - Smiley Happy

Solution
‎06-06-2013 03:33 PM
Super User
Posts: 10,474

Re: merge data in order to get rid of observations

With the kill list sorted by id and date the approach with merge would be something like

Data want ;

     merge log (in=inlog) kill (In=inKill);

     by investor_id date;

     if inlog and not (inKill);

run;


Regular Contributor
Posts: 161

Re: merge data in order to get rid of observations

I'll give it a try right now.  Thank you!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 268 views
  • 6 likes
  • 3 in conversation