Hi there,
My question is how to find the nearest match in the same column. Please see the following for a quick illustration. I am new to SAS programming, and only have a preliminary guess that proc sql might do the work. What I am doing now is manually adjusting - it is painful and especially so for over 3,000 observations.
Firm | Year | Quarter | Value |
AA | 2000 | 1 | -100 |
AA | 2000 | 2 | 50 |
AA | 2000 | 3 | 50 |
AA | 2000 | 4 | 0 |
AA | 2001 | 1 | 0 |
AA | 2001 | 2 | -50 |
AA | 2001 | 3 | 100 |
AA | 2001 | 4 | -50 |
AA | 2002 | 1 | 0 |
AA | 2002 | 2 | 50 |
AA | 2002 | 3 | -150 |
AA | 2002 | 4 | 100 |
BB | 2000 | 1 | 0 |
BB | 2000 | 2 | 0 |
BB | 2000 | 3 | 0 |
BB | 2000 | 4 | 50 |
BB | 2001 | 1 | 150 |
BB | 2001 | 2 | 0 |
BB | 2001 | 3 | -100 |
BB | 2001 | 4 | -100 |
I want to find the nearest "Value" match that could add up to zero. For example, for firm AA in 1st quarter 2000, I want to match the nearest two numbers that could add up to 100. I don't want the 50 for firm AA in 2002Q2 nor firm BB 2000Q4. In addition, I also struggle with the case for firm BB, and have no idea how to perform the matching: the two negative numbers add up to -200, the two positive numbers add up to +200, and they maybe in same or different years. To help you understand better, please find the following table for what I have in mind at the end of the day:
Firm | Year | Quarter | Value | Value_matched | YQ_matched |
AA | 2000 | 1 | -100 | 50 | 2000Q2 |
AA | 2000 | 1 | -100 | 50 | 2000Q3 |
AA | 2001 | 2 | -50 | 100 | 2001Q3 |
AA | 2001 | 4 | -50 | 100 | 2001Q3 |
AA | 2002 | 3 | -150 | 50 | 2002Q2 |
AA | 2002 | 3 | -150 | 100 | 2002Q4 |
BB | 2001 | 3 | -100 | 50 | 2000Q4 |
BB | 2001 | 4 | -100 | 100 | 2001Q1 |
For the BB case, it can be 2001Q3 "-100" matched to "50" in 2000Q4, it is also fine if it matches to "100" in 2001Q1 - the order doesn't matter. Thanks in advance! Any help is really appreciated!
Regards,
Michael
You are missing some details.
Your output has
AA | 2000 | 1 | -100 | 50 | 2000Q2 |
AA | 2000 | 1 | -100 | 50 | 2000Q3 |
Why aren't any of the other quarters for AA 2000 included?
why isn't
AA | 2000 | 1 | -100 |
Matched to
AA | 2001 | 3 | 100 |
where the sum would be exactly 0?
There appear to be several other limitations that you have not described.
Sorry for the confusion. Let me explain:
The one you posted is not a match for me because I would like "nearest" match. By "nearest", I mean within +/- two quarters for a given firm, so 2000Q1 shouldn't be related to 2001Q3 - but your idea is exactly right, the following should also be a valid match (which is missing from the original post)
AA 2001 4 -50 0 2002Q1
AA 2001 4 -50 50 2002Q2
Another clarification is about the case of firm BB - they don't have common entry as the other matches. The reason I post it here is that I have no idea how I should deal with such case, and any ideas would be extremely appreciated. The original data is about cash flows - a story would be that I earn 50 in 2000Q4 and 100 in 2001Q1, and I spent 100 each in 2001Q3 and 2001Q4. I manually checked and the data is consistent with my story.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.