Hi all,
I am trying to count (REL_COUNT) each individual occurence that a customer has a relation (LN_ACCOUNT) as either a borrower (PRI_SSN) or co-borrower (SEC_SSN) and then apply a flag (0,1) if they are a multiple relationship customer (MUTL_REL). My code below works only at comparing the borrower for each loan but not if the borrower is also listed as co-borrower (SEC_SSN) on another loan. For instance using the sample below the REL_COUNT variable would return the folloing for obs 1-6 (1; 1; 2; 1; 1; 1) but my desried out come would be for the REL_COUNT variable to return the following for obs 1-6 (1; 1; 2; 1; 1; 2). Is this at all possible?
DATA TEST;
INPUT PRI_SSN $ SEC_SSN $ LN_ACCOUNT $ ENTRD_DT;
DATALINES;
29842 29843 0081 12JUL2011
40011 45074 9374 16NOV2012
40011 78118 4970 21SEP2013
41872 21671 9459 05FEB2008
60052 80010 6749 13AUG2013
80010 60052 6231 01JAN2014;
RUN;
PROC SORT DATA=TEST;
BY PRI_SSN SEC_SSN ENTRD_DT;
RUN;
DATA B;
SET TEST;
BY PRI_SSN;
MULT_REL = 1;
IF FIRST.PRI_SSN THEN REL_COUNT = 1;
ELSE DO;
IF PRI_SSN = PRI_SSN OR PR_SSN = SEC_SSN THEN REL_COUNT + 1;
ELSE REL_COUNT = 1;
END;
IF LAST.PRI_SSN THEN DO;
IF REL_COUNT > 1 THEN MULT_REL = 1;
ELSE MULT_REL = 0;
END;
RUN;
Thank you,
Jacob
810K isn't an issue to flip, in my opinion.
Here's a quick idea. Then its easy to enumerate, and easier to run proc freqs and proc sql queries against.
DATA TEST;
informat entrd_dt date9.;
format entrd_dt date9.;
INPUT PRI_SSN $ SEC_SSN $ LN_ACCOUNT $ ENTRD_DT;
DATALINES;
29842 29843 0081 12JUL2011
40011 45074 9374 16NOV2012
40011 78118 4970 21SEP2013
41872 21671 9459 05FEB2008
60052 80010 6749 13AUG2013
80010 60052 6231 01JAN2014
;
RUN;
data flip;
length type $12.;
set test;
Account=pri_ssn;
type="Primary";
output;
Account=sec_ssn;
type="Secondary";
output;
keep account type ln_account entrd_dt;
run;
proc sort data=flip;
by account type;
run;
Change your data structure perhaps?
Depending on how big your data is this may or may not be a good idea. My concern with your current solution/example is that it depends on the rows being near each other and I can't see that as being always true.
SSN STATUS (primary or secndary) Load Date.
Thanks for looking Reeza. You are correct, the rows are not always near each other meaning I can't do a simple lag or lead. My data contains about 810,000 observations so I'm know sure how to best restructure. I'm going to try some data set slicing i think to break the data down and approach it that way.
Any other help would be appreciated.
Thanks,
Jacob
810K isn't an issue to flip, in my opinion.
Here's a quick idea. Then its easy to enumerate, and easier to run proc freqs and proc sql queries against.
DATA TEST;
informat entrd_dt date9.;
format entrd_dt date9.;
INPUT PRI_SSN $ SEC_SSN $ LN_ACCOUNT $ ENTRD_DT;
DATALINES;
29842 29843 0081 12JUL2011
40011 45074 9374 16NOV2012
40011 78118 4970 21SEP2013
41872 21671 9459 05FEB2008
60052 80010 6749 13AUG2013
80010 60052 6231 01JAN2014
;
RUN;
data flip;
length type $12.;
set test;
Account=pri_ssn;
type="Primary";
output;
Account=sec_ssn;
type="Secondary";
output;
keep account type ln_account entrd_dt;
run;
proc sort data=flip;
by account type;
run;
I follow you. I can then make my counts for borrower or co-borrower based on TYPE and bringit all together. I think I can get it from here.
Thanks Reeza.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.