merge data in order to get rid of observations

Accepted Solution Solved
Reply
Regular Contributor
Posts: 161
Accepted Solution

merge data in order to get rid of observations

Hi, everybody, I have a newbie question:

I have a huge log with the layout as below.  The log file is sorted by investor_id, date, and time.

investor_iddatetimefile
A1/1/199502:22:03X
A1/2/199502:22:04X
B2/3/200102:19:11Y
C2/3/200118:12:11Y

and a kill list that I want to delete the observation in that list from the log:

investor_iddate
X2/3/2005
Y2/3/2006
Z3/3/2007
A4/1/2006
...
...

.

My question is how to merge log with the kill_list so that I can delete from log any observations on kill_list efficiently (merge criteria is on investor_di and date)?  So if there is any investors and date combo that appear on the kill_list, that observation would be deleted.  The thing is that the log is very big, over 130 GB.  And the kill_list is not small also, around 25 GB.

Thank you so much!


Accepted Solutions
Solution
‎06-06-2013 03:33 PM
Super User
Posts: 11,343

Re: merge data in order to get rid of observations

Posted in reply to caveman529

With the kill list sorted by id and date the approach with merge would be something like

Data want ;

     merge log (in=inlog) kill (In=inKill);

     by investor_id date;

     if inlog and not (inKill);

run;


View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: merge data in order to get rid of observations

Posted in reply to caveman529

1. Are they both sorted by the variables of interest? This is for the options of "merge"

2. Does your kill_list only have those two columns or have a lot more other variables irrelevant to your current interest?  this is for the possibility of Hash().

Haikuo

Regular Contributor
Posts: 161

Re: merge data in order to get rid of observations

Thank you for your suggestions!

The kill_list contains nothing but the investor_id and date because these are the only two criteria for deletion.  The log (big dataset) is sorted by investor_id, date, and time variable.  I should add that there is a time variable I didn't put in the original question.  But I'll update it....

Respected Advisor
Posts: 3,156

Re: merge data in order to get rid of observations

Posted in reply to caveman529

In that case, Hash() can't really help you unless you have a very big RAM > 25G. So try 's suggestion.

Good luck,

Haikuo

Regular Contributor
Posts: 161

Re: merge data in order to get rid of observations

Thank you all for your help.  Ed's program runs without a problem.  It is now completed already - Smiley Happy

Solution
‎06-06-2013 03:33 PM
Super User
Posts: 11,343

Re: merge data in order to get rid of observations

Posted in reply to caveman529

With the kill list sorted by id and date the approach with merge would be something like

Data want ;

     merge log (in=inlog) kill (In=inKill);

     by investor_id date;

     if inlog and not (inKill);

run;


Regular Contributor
Posts: 161

Re: merge data in order to get rid of observations

I'll give it a try right now.  Thank you!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 270 views
  • 6 likes
  • 3 in conversation