Hi all,
I am trying to count (REL_COUNT) each individual occurence that a customer has a relation (LN_ACCOUNT) as either a borrower (PRI_SSN) or co-borrower (SEC_SSN) and then apply a flag (0,1) if they are a multiple relationship customer (MUTL_REL). My code below works only at comparing the borrower for each loan but not if the borrower is also listed as co-borrower (SEC_SSN) on another loan. For instance using the sample below the REL_COUNT variable would return the folloing for obs 1-6 (1; 1; 2; 1; 1; 1) but my desried out come would be for the REL_COUNT variable to return the following for obs 1-6 (1; 1; 2; 1; 1; 2). Is this at all possible?
DATA TEST;
INPUT PRI_SSN $ SEC_SSN $ LN_ACCOUNT $ ENTRD_DT;
DATALINES;
29842 29843 0081 12JUL2011
40011 45074 9374 16NOV2012
40011 78118 4970 21SEP2013
41872 21671 9459 05FEB2008
60052 80010 6749 13AUG2013
80010 60052 6231 01JAN2014;
RUN;
PROC SORT DATA=TEST;
BY PRI_SSN SEC_SSN ENTRD_DT;
RUN;
DATA B;
SET TEST;
BY PRI_SSN;
MULT_REL = 1;
IF FIRST.PRI_SSN THEN REL_COUNT = 1;
ELSE DO;
IF PRI_SSN = PRI_SSN OR PR_SSN = SEC_SSN THEN REL_COUNT + 1;
ELSE REL_COUNT = 1;
END;
IF LAST.PRI_SSN THEN DO;
IF REL_COUNT > 1 THEN MULT_REL = 1;
ELSE MULT_REL = 0;
END;
RUN;
Thank you,
Jacob
810K isn't an issue to flip, in my opinion.
Here's a quick idea. Then its easy to enumerate, and easier to run proc freqs and proc sql queries against.
DATA TEST;
informat entrd_dt date9.;
format entrd_dt date9.;
INPUT PRI_SSN $ SEC_SSN $ LN_ACCOUNT $ ENTRD_DT;
DATALINES;
29842 29843 0081 12JUL2011
40011 45074 9374 16NOV2012
40011 78118 4970 21SEP2013
41872 21671 9459 05FEB2008
60052 80010 6749 13AUG2013
80010 60052 6231 01JAN2014
;
RUN;
data flip;
length type $12.;
set test;
Account=pri_ssn;
type="Primary";
output;
Account=sec_ssn;
type="Secondary";
output;
keep account type ln_account entrd_dt;
run;
proc sort data=flip;
by account type;
run;
Change your data structure perhaps?
Depending on how big your data is this may or may not be a good idea. My concern with your current solution/example is that it depends on the rows being near each other and I can't see that as being always true.
SSN STATUS (primary or secndary) Load Date.
Thanks for looking Reeza. You are correct, the rows are not always near each other meaning I can't do a simple lag or lead. My data contains about 810,000 observations so I'm know sure how to best restructure. I'm going to try some data set slicing i think to break the data down and approach it that way.
Any other help would be appreciated.
Thanks,
Jacob
810K isn't an issue to flip, in my opinion.
Here's a quick idea. Then its easy to enumerate, and easier to run proc freqs and proc sql queries against.
DATA TEST;
informat entrd_dt date9.;
format entrd_dt date9.;
INPUT PRI_SSN $ SEC_SSN $ LN_ACCOUNT $ ENTRD_DT;
DATALINES;
29842 29843 0081 12JUL2011
40011 45074 9374 16NOV2012
40011 78118 4970 21SEP2013
41872 21671 9459 05FEB2008
60052 80010 6749 13AUG2013
80010 60052 6231 01JAN2014
;
RUN;
data flip;
length type $12.;
set test;
Account=pri_ssn;
type="Primary";
output;
Account=sec_ssn;
type="Secondary";
output;
keep account type ln_account entrd_dt;
run;
proc sort data=flip;
by account type;
run;
I follow you. I can then make my counts for borrower or co-borrower based on TYPE and bringit all together. I think I can get it from here.
Thanks Reeza.
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.