Hi there,
My question is how to find the nearest match in the same column. Please see the following for a quick illustration. I am new to SAS programming, and only have a preliminary guess that proc sql might do the work. What I am doing now is manually adjusting - it is painful and especially so for over 3,000 observations.
Firm | Year | Quarter | Value |
AA | 2000 | 1 | -100 |
AA | 2000 | 2 | 50 |
AA | 2000 | 3 | 50 |
AA | 2000 | 4 | 0 |
AA | 2001 | 1 | 0 |
AA | 2001 | 2 | -50 |
AA | 2001 | 3 | 100 |
AA | 2001 | 4 | -50 |
AA | 2002 | 1 | 0 |
AA | 2002 | 2 | 50 |
AA | 2002 | 3 | -150 |
AA | 2002 | 4 | 100 |
BB | 2000 | 1 | 0 |
BB | 2000 | 2 | 0 |
BB | 2000 | 3 | 0 |
BB | 2000 | 4 | 50 |
BB | 2001 | 1 | 150 |
BB | 2001 | 2 | 0 |
BB | 2001 | 3 | -100 |
BB | 2001 | 4 | -100 |
I want to find the nearest "Value" match that could add up to zero. For example, for firm AA in 1st quarter 2000, I want to match the nearest two numbers that could add up to 100. I don't want the 50 for firm AA in 2002Q2 nor firm BB 2000Q4. In addition, I also struggle with the case for firm BB, and have no idea how to perform the matching: the two negative numbers add up to -200, the two positive numbers add up to +200, and they maybe in same or different years. To help you understand better, please find the following table for what I have in mind at the end of the day:
Firm | Year | Quarter | Value | Value_matched | YQ_matched |
AA | 2000 | 1 | -100 | 50 | 2000Q2 |
AA | 2000 | 1 | -100 | 50 | 2000Q3 |
AA | 2001 | 2 | -50 | 100 | 2001Q3 |
AA | 2001 | 4 | -50 | 100 | 2001Q3 |
AA | 2002 | 3 | -150 | 50 | 2002Q2 |
AA | 2002 | 3 | -150 | 100 | 2002Q4 |
BB | 2001 | 3 | -100 | 50 | 2000Q4 |
BB | 2001 | 4 | -100 | 100 | 2001Q1 |
For the BB case, it can be 2001Q3 "-100" matched to "50" in 2000Q4, it is also fine if it matches to "100" in 2001Q1 - the order doesn't matter. Thanks in advance! Any help is really appreciated!
Regards,
Michael
You are missing some details.
Your output has
AA | 2000 | 1 | -100 | 50 | 2000Q2 |
AA | 2000 | 1 | -100 | 50 | 2000Q3 |
Why aren't any of the other quarters for AA 2000 included?
why isn't
AA | 2000 | 1 | -100 |
Matched to
AA | 2001 | 3 | 100 |
where the sum would be exactly 0?
There appear to be several other limitations that you have not described.
Sorry for the confusion. Let me explain:
The one you posted is not a match for me because I would like "nearest" match. By "nearest", I mean within +/- two quarters for a given firm, so 2000Q1 shouldn't be related to 2001Q3 - but your idea is exactly right, the following should also be a valid match (which is missing from the original post)
AA 2001 4 -50 0 2002Q1
AA 2001 4 -50 50 2002Q2
Another clarification is about the case of firm BB - they don't have common entry as the other matches. The reason I post it here is that I have no idea how I should deal with such case, and any ideas would be extremely appreciated. The original data is about cash flows - a story would be that I earn 50 in 2000Q4 and 100 in 2001Q1, and I spent 100 each in 2001Q3 and 2001Q4. I manually checked and the data is consistent with my story.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.