BookmarkSubscribeRSS Feed
dee_arr
Fluorite | Level 6

Good afternoon, everyone,

 

After reviewing the many entries regarding filling in missing values, I could not find anything that would help me with my conundrum. This is a fairly specific case of a more general problem of filling in missing observations in a panel, so bear with me and thank you for any help you might be able to provide.

 

I am working with loan data and am trying to account for an instance where many loans are not reported for a specific lender, for several periods, but different number of periods depending on the loan.

 

Here is the detail (data sample at the bottom of the post):

 

1) Data is sorted by consumer ID, then loan ID, then quarter.

2) Different loans belong to different lenders, who have lender IDs associated with them.

3) The panel is not balanced, as the loans span different time periods and there are occasional missing values.

4) I know that a specific lender, in the example below "Purple" had a good chunk (but not all) of their loans go missing in Q1-2011, and the loans trickled back in over the next four quarters, with virtually all of them returning by Q1-2012. For example, loan Z is missing for four quarters, loan B for two quarters, and loan C for one quarter. Loan Y, on the other hand, is reported every period during this time.

5) Occasionally, the loan ID changes, or the loan is transferred to a different lender - e.g. loan C was with lender Purple before it went missing in Q1-2011, and with lender Pink once it came back. I do NOT want to fill in rows in those cases.

6) I also do not want to fill in for any other loans/lenders, or different time periods for lender Purple, e.g. loan X with lender Blue is missing for one quarter, and I want to leave that as is.

7) For inserted rows, I want to fill in missing values with the values from Q4-2010, before the loans went missing.

😎 I have a large number of numeric and string variables in addition to the panel identifiers.

 

If the loans were missing for the same amount of time, and if the loan IDs and lenders on the loans did not change, this would be a relatively simple problem. But these two issues are tripping me up.

 

I have looked into the EXPAND procedure, but it does not appear to do what I would expect.

 

If you have read this far, THANK YOU! I would appreciate any advice the group may be able to offer.

 

Below is the sample data:

 

Client Loan Lender Quarter NumVar StringVar
123 X Blue Q3-2010 3 Summer
123 X Blue Q4-2010 7 Winter
123 X Blue Q1-2011 34 Fall
123 X Blue Q3-2011 7 Summer
123 X Blue Q4-2011 3 Winter
123 X Blue Q1-2012 5 Fall
123 X Blue Q2-2012 2 Spring
123 X Blue Q3-2012 6 Summer
123 X Blue Q4-2012 5 Winter
123 Y Purple Q1-2010 6 Fall
123 Y Purple Q2-2010 3 Spring
123 Y Purple Q3-2010 5 Summer
123 Y Purple Q4-2010 6 Winter
123 Y Purple Q1-2011 3 Fall
123 Y Purple Q2-2011 7 Spring
123 Y Purple Q3-2011 53 Summer
123 Y Purple Q4-2011 5 Winter
123 Y Purple Q1-2012 7 Fall
123 Y Purple Q2-2012 3 Spring
123 Z Purple Q1-2010 6 Fall
123 Z Purple Q2-2010 5 Spring
123 Z Purple Q3-2010 6 Summer
123 Z Purple Q4-2010 3 Winter
123 Z Purple Q1-2012 55 Fall
123 Z Purple Q2-2012 5 Spring
123 Z Purple Q3-2012 7 Summer
123 Z Purple Q4-2012 3 Winter
456 A Green Q4-2010 5 Winter
456 A Green Q1-2011 6 Fall
456 A Green Q2-2011 3 Spring
456 A Green Q3-2011 5 Summer
456 A Green Q4-2011 6 Winter
456 A Green Q1-2012 3 Fall
456 A Green Q2-2012 7 Spring
456 A Green Q3-2012 23 Summer
456 A Green Q4-2012 5 Winter
456 B Purple Q1-2010 7 Fall
456 B Purple Q2-2010 3 Spring
456 B Purple Q3-2010 5 Summer
456 B Purple Q4-2010 2 Winter
456 B Purple Q3-2011 6 Summer
456 B Purple Q4-2011 3 Winter
456 B Purple Q1-2012 5 Fall
456 B Purple Q2-2012 6 Spring
456 B Purple Q3-2012 3 Summer
456 B Purple Q4-2012 7 Winter
456 C Purple Q1-2010 17 Fall
456 C Purple Q2-2010 5 Spring
456 C Purple Q3-2010 7 Summer
456 C Purple Q4-2010 3 Winter
456 C Pink Q2-2011 2 Spring
456 C Pink Q3-2011 6 Summer
456 C Pink Q4-2011 5 Winter
456 C Pink Q1-2012 6 Fall
456 C Pink Q2-2012 3 Spring
456 C Pink Q3-2012 4 Summer

5 REPLIES 5
ballardw
Super User

It helps to provide input data and the expected output for that given example data. Which often means fewer records and only enough to demonstrate the needed use case(s). Best is in the form of datastep code pasted into a code box opened with the forums {I} icon so that we have examples of your data to test with code.

 

Likely EXPAND does not work as I am guessing that your "quarter" variable is text as the displayed value you show would be a non-standard format for SAS dates. Is that the case?

Expand likes to use actual DATE values.

As do the functions that do things like increment dates by a quarter.

 

Do the "large number of numeric and string variables in addition to the panel identifiers." also have to be carried forward (think that is what you are requesting"

 

dee_arr
Fluorite | Level 6

Yes, all of the variables would have to be carried forward - there are about 30 of them.

 

The quarter is in date format, actually... I created the data sample from scratch to illustrate the many different cases where I do/don't need to fill in, and the original data is super confidential so I had to give a hypothetical example instead. My sincere apologies if that made it more difficult to provide assistance.

 

Let me work on code I can paste in, and I will add to the post.

ballardw
Super User

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

You can modify your original data set to replace sensitive values and reduce the cases (and the carry along variables) prior to using the macro.

dee_arr
Fluorite | Level 6
Below is the data straight from SAS; quarter is a date, all the IDs are strings.

My apologies for all the renaming, but lender names and all the string values are confidential, as are the names of some of the variables, so I have had to replace.

I tried to reduce the number of cases, but there are several variants of the changes I am facing so I tried to illustrate with the examples.

For ease of reference, here is my conundrum: I am working with loan data and am trying to account for an instance where many loans are not reported for a specific lender, for several periods, but different number of periods depending on the loan.
 
1) Data is sorted by consumer ID, then loan ID, then quarter.
2) Different loans belong to different lenders, who have lender IDs associated with them.
3) The panel is not balanced, as the loans span different time periods and there are occasional missing values.
4) I know that a specific lender, in the example below "Purple" had a good chunk (but not all) of their loans go missing in Q1-2011, and the loans trickled back in over the next four quarters, with virtually all of them returning by Q1-2012. For example, loan Z is missing for four quarters, loan B for two quarters, and loan C for one quarter. Loan Y, on the other hand, is reported every period during this time.
5) Occasionally, the loan ID changes, or the loan is transferred to a different lender - e.g. loan C was with lender Purple before it went missing in Q1-2011, and with lender Pink once it came back. I do NOT want to fill in rows in those cases.
6) I also do not want to fill in for any other loans/lenders, or different time periods for lender Purple, e.g. loan X with lender Blue is missing for one quarter, and I want to leave that as is.
7) For inserted rows, I want to fill in missing values with the values from Q4-2010, before the loans went missing.

Would reformatting the data in any additional ways be helpful?

Thank you for any insights you might be able to provide!

data WORK.IMPORT; infile datalines dsd truncover; input Client:$18. Loan:$17. Lender:$6. Quarter:MMDDYY10. NumVar:BEST12. StringVar:$6.; format Quarter MMDDYY10. NumVar BEST12.; datalines; 00C3A26416BBD01F81 029VF12376VU Blue 09/01/2010 32487 Summer 00C3A26416BBD01F81 029VF12376VU Blue 12/01/2010 12219 Winter 00C3A26416BBD01F81 029VF12376VU Blue 03/01/2011 12219 Fall 00C3A26416BBD01F81 029VF12376VU Blue 09/01/2011 12219 Summer 00C3A26416BBD01F81 029VF12376VU Blue 12/01/2011 48718 Winter 00C3A26416BBD01F81 029VF12376VU Blue 03/01/2012 27718 Fall 00C3A26416BBD01F81 029VF12376VU Blue 06/01/2012 27718 Spring 00C3A26416BBD01F81 029VF12376VU Blue 09/01/2012 63266 Summer 00C3A26416BBD01F81 029VF12376VU Blue 12/01/2012 63298 Winter 00C3A26416BBD01F81 HJ-79991272075810 Purple 03/01/2010 63423 Fall 00C3A26416BBD01F81 HJ-79991272075810 Purple 06/01/2010 63423 Spring 00C3A26416BBD01F81 HJ-79991272075810 Purple 09/01/2010 63116 Summer 00C3A26416BBD01F81 HJ-79991272075810 Purple 12/01/2010 16074 Winter 00C3A26416BBD01F81 HJ-79991272075810 Purple 03/01/2011 16074 Fall 00C3A26416BBD01F81 HJ-79991272075810 Purple 06/01/2011 16074 Spring 00C3A26416BBD01F81 HJ-79991272075810 Purple 09/01/2011 . Summer 00C3A26416BBD01F81 HJ-79991272075810 Purple 12/01/2011 16074 Winter 00C3A26416BBD01F81 HJ-79991272075810 Purple 03/01/2012 16074 Fall 00C3A26416BBD01F81 HJ-79991272075810 Purple 06/01/2012 16074 Spring 00C3A26416BBD01F81 14584QP34577 Purple 03/01/2010 29905 Fall 00C3A26416BBD01F81 14584QP34577 Purple 06/01/2010 41400 Spring 00C3A26416BBD01F81 14584QP34577 Purple 09/01/2010 40400 Summer 00C3A26416BBD01F81 14584QP34577 Purple 12/01/2010 40400 Winter 00C3A26416BBD01F81 14584QP34577 Purple 03/01/2012 40400 Fall 00C3A26416BBD01F81 14584QP34577 Purple 06/01/2012 40400 Spring 00C3A26416BBD01F81 14584QP34577 Purple 09/01/2012 32487 Summer 00C3A26416BBD01F81 14584QP34577 Purple 12/01/2012 12219 Winter 01A48C760A07DEA200 21056587 Green 12/01/2010 12219 Winter 01A48C760A07DEA200 21056587 Green 03/01/2011 12219 Fall 01A48C760A07DEA200 21056587 Green 06/01/2011 48718 Spring 01A48C760A07DEA200 21056587 Green 09/01/2011 27718 Summer 01A48C760A07DEA200 21056587 Green 12/01/2011 27718 Winter 01A48C760A07DEA200 21056587 Green 03/01/2012 63266 Fall 01A48C760A07DEA200 21056587 Green 06/01/2012 63298 Spring 01A48C760A07DEA200 21056587 Green 09/01/2012 63423 Summer 01A48C760A07DEA200 21056587 Green 12/01/2012 63423 Winter 01A48C760A07DEA200 11672365903335X Purple 03/01/2010 63116 Fall 01A48C760A07DEA200 11672365903335X Purple 06/01/2010 16074 Spring 01A48C760A07DEA200 11672365903335X Purple 09/01/2010 16074 Summer 01A48C760A07DEA200 11672365903335X Purple 12/01/2010 16074 Winter 01A48C760A07DEA200 11672365903335X Purple 09/01/2011 . Summer 01A48C760A07DEA200 11672365903335X Purple 12/01/2011 16074 Winter 01A48C760A07DEA200 11672365903335X Purple 03/01/2012 16074 Fall 01A48C760A07DEA200 11672365903335X Purple 06/01/2012 16074 Spring 01A48C760A07DEA200 11672365903335X Purple 09/01/2012 29905 Summer 01A48C760A07DEA200 11672365903335X Purple 12/01/2012 41400 Winter 01A48C760A07DEA200 291FZ Purple 03/01/2010 40400 Fall 01A48C760A07DEA200 291FZ Purple 06/01/2010 40400 Spring 01A48C760A07DEA200 291FZ Purple 09/01/2010 40400 Summer 01A48C760A07DEA200 291FZ Purple 12/01/2010 40400 Winter 01A48C760A07DEA200 291FZ Pink 06/01/2011 32487 Spring 01A48C760A07DEA200 291FZ Pink 09/01/2011 12219 Summer 01A48C760A07DEA200 291FZ Pink 12/01/2011 12219 Winter 01A48C760A07DEA200 291FZ Pink 03/01/2012 12219 Fall 01A48C760A07DEA200 291FZ Pink 06/01/2012 48718 Spring 01A48C760A07DEA200 291FZ Pink 09/01/2012 27718 Summer ;;;;
dee_arr
Fluorite | Level 6

I was not able to figure this out and outsourced the "filling in" to STATA, but would *love* to be able to do it within my SAS program and not have to go back and forth between the programs! If anyone has insights into how I could pull it off, I would really really appreciate it.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 590 views
  • 2 likes
  • 2 in conversation