Hi, everybody, I have a newbie question:
I have a huge log with the layout as below. The log file is sorted by investor_id, date, and time.
investor_id | date | time | file |
---|---|---|---|
A | 1/1/1995 | 02:22:03 | X |
A | 1/2/1995 | 02:22:04 | X |
B | 2/3/2001 | 02:19:11 | Y |
C | 2/3/2001 | 18:12:11 | Y |
and a kill list that I want to delete the observation in that list from the log:
investor_id | date |
---|---|
X | 2/3/2005 |
Y | 2/3/2006 |
Z | 3/3/2007 |
A | 4/1/2006 |
.. | . |
... | . |
My question is how to merge log with the kill_list so that I can delete from log any observations on kill_list efficiently (merge criteria is on investor_di and date)? So if there is any investors and date combo that appear on the kill_list, that observation would be deleted. The thing is that the log is very big, over 130 GB. And the kill_list is not small also, around 25 GB.
Thank you so much!
With the kill list sorted by id and date the approach with merge would be something like
Data want ;
merge log (in=inlog) kill (In=inKill);
by investor_id date;
if inlog and not (inKill);
run;
1. Are they both sorted by the variables of interest? This is for the options of "merge"
2. Does your kill_list only have those two columns or have a lot more other variables irrelevant to your current interest? this is for the possibility of Hash().
Haikuo
Thank you for your suggestions!
The kill_list contains nothing but the investor_id and date because these are the only two criteria for deletion. The log (big dataset) is sorted by investor_id, date, and time variable. I should add that there is a time variable I didn't put in the original question. But I'll update it....
Thank you all for your help. Ed's program runs without a problem. It is now completed already -
With the kill list sorted by id and date the approach with merge would be something like
Data want ;
merge log (in=inlog) kill (In=inKill);
by investor_id date;
if inlog and not (inKill);
run;
I'll give it a try right now. Thank you!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.