Hi all,
I am trying to count (REL_COUNT) each individual occurence that a customer has a relation (LN_ACCOUNT) as either a borrower (PRI_SSN) or co-borrower (SEC_SSN) and then apply a flag (0,1) if they are a multiple relationship customer (MUTL_REL). My code below works only at comparing the borrower for each loan but not if the borrower is also listed as co-borrower (SEC_SSN) on another loan. For instance using the sample below the REL_COUNT variable would return the folloing for obs 1-6 (1; 1; 2; 1; 1; 1) but my desried out come would be for the REL_COUNT variable to return the following for obs 1-6 (1; 1; 2; 1; 1; 2). Is this at all possible?
DATA TEST;
INPUT PRI_SSN $ SEC_SSN $ LN_ACCOUNT $ ENTRD_DT;
DATALINES;
29842 29843 0081 12JUL2011
40011 45074 9374 16NOV2012
40011 78118 4970 21SEP2013
41872 21671 9459 05FEB2008
60052 80010 6749 13AUG2013
80010 60052 6231 01JAN2014;
RUN;
PROC SORT DATA=TEST;
BY PRI_SSN SEC_SSN ENTRD_DT;
RUN;
DATA B;
SET TEST;
BY PRI_SSN;
MULT_REL = 1;
IF FIRST.PRI_SSN THEN REL_COUNT = 1;
ELSE DO;
IF PRI_SSN = PRI_SSN OR PR_SSN = SEC_SSN THEN REL_COUNT + 1;
ELSE REL_COUNT = 1;
END;
IF LAST.PRI_SSN THEN DO;
IF REL_COUNT > 1 THEN MULT_REL = 1;
ELSE MULT_REL = 0;
END;
RUN;
Thank you,
Jacob
810K isn't an issue to flip, in my opinion.
Here's a quick idea. Then its easy to enumerate, and easier to run proc freqs and proc sql queries against.
DATA TEST;
informat entrd_dt date9.;
format entrd_dt date9.;
INPUT PRI_SSN $ SEC_SSN $ LN_ACCOUNT $ ENTRD_DT;
DATALINES;
29842 29843 0081 12JUL2011
40011 45074 9374 16NOV2012
40011 78118 4970 21SEP2013
41872 21671 9459 05FEB2008
60052 80010 6749 13AUG2013
80010 60052 6231 01JAN2014
;
RUN;
data flip;
length type $12.;
set test;
Account=pri_ssn;
type="Primary";
output;
Account=sec_ssn;
type="Secondary";
output;
keep account type ln_account entrd_dt;
run;
proc sort data=flip;
by account type;
run;
Change your data structure perhaps?
Depending on how big your data is this may or may not be a good idea. My concern with your current solution/example is that it depends on the rows being near each other and I can't see that as being always true.
SSN STATUS (primary or secndary) Load Date.
Thanks for looking Reeza. You are correct, the rows are not always near each other meaning I can't do a simple lag or lead. My data contains about 810,000 observations so I'm know sure how to best restructure. I'm going to try some data set slicing i think to break the data down and approach it that way.
Any other help would be appreciated.
Thanks,
Jacob
810K isn't an issue to flip, in my opinion.
Here's a quick idea. Then its easy to enumerate, and easier to run proc freqs and proc sql queries against.
DATA TEST;
informat entrd_dt date9.;
format entrd_dt date9.;
INPUT PRI_SSN $ SEC_SSN $ LN_ACCOUNT $ ENTRD_DT;
DATALINES;
29842 29843 0081 12JUL2011
40011 45074 9374 16NOV2012
40011 78118 4970 21SEP2013
41872 21671 9459 05FEB2008
60052 80010 6749 13AUG2013
80010 60052 6231 01JAN2014
;
RUN;
data flip;
length type $12.;
set test;
Account=pri_ssn;
type="Primary";
output;
Account=sec_ssn;
type="Secondary";
output;
keep account type ln_account entrd_dt;
run;
proc sort data=flip;
by account type;
run;
I follow you. I can then make my counts for borrower or co-borrower based on TYPE and bringit all together. I think I can get it from here.
Thanks Reeza.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.