BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
caveman529
Calcite | Level 5

Hi, everybody, I have a newbie question:

I have a huge log with the layout as below.  The log file is sorted by investor_id, date, and time.

investor_iddatetimefile
A1/1/199502:22:03X
A1/2/199502:22:04X
B2/3/200102:19:11Y
C2/3/200118:12:11Y

and a kill list that I want to delete the observation in that list from the log:

investor_iddate
X2/3/2005
Y2/3/2006
Z3/3/2007
A4/1/2006
...
...

.

My question is how to merge log with the kill_list so that I can delete from log any observations on kill_list efficiently (merge criteria is on investor_di and date)?  So if there is any investors and date combo that appear on the kill_list, that observation would be deleted.  The thing is that the log is very big, over 130 GB.  And the kill_list is not small also, around 25 GB.

Thank you so much!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

With the kill list sorted by id and date the approach with merge would be something like

Data want ;

     merge log (in=inlog) kill (In=inKill);

     by investor_id date;

     if inlog and not (inKill);

run;


View solution in original post

6 REPLIES 6
Haikuo
Onyx | Level 15

1. Are they both sorted by the variables of interest? This is for the options of "merge"

2. Does your kill_list only have those two columns or have a lot more other variables irrelevant to your current interest?  this is for the possibility of Hash().

Haikuo

caveman529
Calcite | Level 5

Thank you for your suggestions!

The kill_list contains nothing but the investor_id and date because these are the only two criteria for deletion.  The log (big dataset) is sorted by investor_id, date, and time variable.  I should add that there is a time variable I didn't put in the original question.  But I'll update it....

Haikuo
Onyx | Level 15

In that case, Hash() can't really help you unless you have a very big RAM > 25G. So try 's suggestion.

Good luck,

Haikuo

caveman529
Calcite | Level 5

Thank you all for your help.  Ed's program runs without a problem.  It is now completed already - Smiley Happy

ballardw
Super User

With the kill list sorted by id and date the approach with merge would be something like

Data want ;

     merge log (in=inlog) kill (In=inKill);

     by investor_id date;

     if inlog and not (inKill);

run;


caveman529
Calcite | Level 5

I'll give it a try right now.  Thank you!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1647 views
  • 6 likes
  • 3 in conversation