BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
caveman529
Calcite | Level 5

Hi, everybody, I have a newbie question:

I have a huge log with the layout as below.  The log file is sorted by investor_id, date, and time.

investor_iddatetimefile
A1/1/199502:22:03X
A1/2/199502:22:04X
B2/3/200102:19:11Y
C2/3/200118:12:11Y

and a kill list that I want to delete the observation in that list from the log:

investor_iddate
X2/3/2005
Y2/3/2006
Z3/3/2007
A4/1/2006
...
...

.

My question is how to merge log with the kill_list so that I can delete from log any observations on kill_list efficiently (merge criteria is on investor_di and date)?  So if there is any investors and date combo that appear on the kill_list, that observation would be deleted.  The thing is that the log is very big, over 130 GB.  And the kill_list is not small also, around 25 GB.

Thank you so much!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

With the kill list sorted by id and date the approach with merge would be something like

Data want ;

     merge log (in=inlog) kill (In=inKill);

     by investor_id date;

     if inlog and not (inKill);

run;


View solution in original post

6 REPLIES 6
Haikuo
Onyx | Level 15

1. Are they both sorted by the variables of interest? This is for the options of "merge"

2. Does your kill_list only have those two columns or have a lot more other variables irrelevant to your current interest?  this is for the possibility of Hash().

Haikuo

caveman529
Calcite | Level 5

Thank you for your suggestions!

The kill_list contains nothing but the investor_id and date because these are the only two criteria for deletion.  The log (big dataset) is sorted by investor_id, date, and time variable.  I should add that there is a time variable I didn't put in the original question.  But I'll update it....

Haikuo
Onyx | Level 15

In that case, Hash() can't really help you unless you have a very big RAM > 25G. So try 's suggestion.

Good luck,

Haikuo

caveman529
Calcite | Level 5

Thank you all for your help.  Ed's program runs without a problem.  It is now completed already - Smiley Happy

ballardw
Super User

With the kill list sorted by id and date the approach with merge would be something like

Data want ;

     merge log (in=inlog) kill (In=inKill);

     by investor_id date;

     if inlog and not (inKill);

run;


caveman529
Calcite | Level 5

I'll give it a try right now.  Thank you!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 642 views
  • 6 likes
  • 3 in conversation