Hi, everybody, I have a newbie question:
I have a huge log with the layout as below. The log file is sorted by investor_id, date, and time.
investor_id | date | time | file |
---|---|---|---|
A | 1/1/1995 | 02:22:03 | X |
A | 1/2/1995 | 02:22:04 | X |
B | 2/3/2001 | 02:19:11 | Y |
C | 2/3/2001 | 18:12:11 | Y |
and a kill list that I want to delete the observation in that list from the log:
investor_id | date |
---|---|
X | 2/3/2005 |
Y | 2/3/2006 |
Z | 3/3/2007 |
A | 4/1/2006 |
.. | . |
... | . |
My question is how to merge log with the kill_list so that I can delete from log any observations on kill_list efficiently (merge criteria is on investor_di and date)? So if there is any investors and date combo that appear on the kill_list, that observation would be deleted. The thing is that the log is very big, over 130 GB. And the kill_list is not small also, around 25 GB.
Thank you so much!
With the kill list sorted by id and date the approach with merge would be something like
Data want ;
merge log (in=inlog) kill (In=inKill);
by investor_id date;
if inlog and not (inKill);
run;
1. Are they both sorted by the variables of interest? This is for the options of "merge"
2. Does your kill_list only have those two columns or have a lot more other variables irrelevant to your current interest? this is for the possibility of Hash().
Haikuo
Thank you for your suggestions!
The kill_list contains nothing but the investor_id and date because these are the only two criteria for deletion. The log (big dataset) is sorted by investor_id, date, and time variable. I should add that there is a time variable I didn't put in the original question. But I'll update it....
Thank you all for your help. Ed's program runs without a problem. It is now completed already -
With the kill list sorted by id and date the approach with merge would be something like
Data want ;
merge log (in=inlog) kill (In=inKill);
by investor_id date;
if inlog and not (inKill);
run;
I'll give it a try right now. Thank you!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.