Steelers_In_DC
Barite | Level 11

I have a large number of edits/rules provided by the Federal Reserve that I run against our mortgage data. Several (38) subtotals are required to get the correct percentages. To label these different thresholds, I added a new column for each of the 38, like so:

 

if bankruptcy_flag = 'Y' then do;
_df4 = "df4";
'_54_Bankruptcy Loans Only'n = bankruptcy_chapter not in('7','9','11','12','13','U');
end;

if investor_type = '7' then do;
_df5="df5";
if com_loan_flg ne 1 and paid_in_full_coding = 0 and missing(baselII_pd) or
   com_loan_flg in (1,.) and paid_in_full_coding = 0 and not missing(baselII_pd) and baselII_pd<.0003 or
   baselII_pd>1 then
'_125.1_Basel II BHC Portfolio'n = 1;

 

Later in the process I join each subtotal like this:

 

select _df1 as _df,count(loan_number) as Count
from First_Lien_Rules_ETL_&propdate
where _df1 = "df1"
union
select _df2 as _df,count(loan_number) as count
from First_Lien_Rules_ETL_&propdate
where _df2 = 'df2'
union
select _df3 as _df,count(loan_number) as count
from First_Lien_Rules_ETL_&propdate
where _df3 = 'df3'
union

 

The _df field gives me the correct subtotal, so I can compute the failure percentage as number of fails / subtotal.

 

This is the best solution I have come up with so far. The entire process takes under 30 minutes, but this step alone takes over 10. I'd love to get that time down. Can anyone suggest another way to get the same output?

 

Thank You,

 

 

1 ACCEPTED SOLUTION

SASKiwi
PROC Star

It is not clear from your description whether your _df conditions are mutually exclusive (one account can only have one _df), but from the way you have coded it, one account can have multiple _dfs.

 

Using an array you could have running totals in the same step where you define _dfs and avoid the following SQL step altogether. You would just need an array for all _df counts then have logic like this:

 

array _df_counts (*) _df_count1 - _df_count38;

if bankruptcy_flag = 'Y' then do;
  _df4 = 'df4';
  _df_counts(4) +1;
 ......
end;
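A fuller sketch of this idea might look like the following. It is only an illustration under stated assumptions: the dataset names LOANS, RULES_ETL and DF_TOTALS are hypothetical, only two of the 38 rules are shown, and a temporary array is used so that the counters are retained across observations without a RETAIN statement.

```sas
/* Sketch only: LOANS, RULES_ETL and DF_TOTALS are hypothetical names.  */
/* The two conditions come from the question; the other 36 are omitted. */
data rules_etl (drop=i _df Count)
     df_totals (keep=_df Count);
   length _df4 _df5 _df $4;
   set loans end=last;

   /* Temporary array elements are retained automatically; start at 0 */
   array _df_counts (38) _temporary_ (38*0);

   if bankruptcy_flag = 'Y' then do;
      _df4 = 'df4';
      _df_counts(4) = _df_counts(4) + 1;   /* running subtotal for df4 */
   end;

   if investor_type = '7' then do;
      _df5 = 'df5';
      _df_counts(5) = _df_counts(5) + 1;   /* running subtotal for df5 */
   end;

   output rules_etl;                       /* one row per loan */

   /* After the last loan, write one row per subtotal */
   if last then do i = 1 to dim(_df_counts);
      _df   = cats('df', i);
      Count = _df_counts(i);
      output df_totals;
   end;
run;
```

DF_TOTALS then holds one row per subtotal in the same _df/Count layout as the UNION query. Note that this counts every loan that meets a condition, whereas count(loan_number) would skip rows with a missing loan_number.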

  


ballardw
Super User

If you could provide complete details for maybe 4 or 5 of the rules and a dataset (data step please) that would exercise all of the rules and the desired output we might be able to give you more detailed coding support.

 

When I see "one dataset per variable" types of solutions, a data re-arrangement can often help.

 

For instance, this code:

select _df2 as _df,count(loan_number) as count
from First_Lien_Rules_ETL_&propdate
where _df2 = 'df2'

 

looks as if a numeric variable with values of 0 and 1, with 1 marking the condition of interest, could simply be SUMmed to give the desired count. You can usually create that type of flag with _df = (a > b) or some other logical expression.

And then a procedure like Proc Means, Summary, Report or Tabulate can go a long way getting other things. (Mean of a 0/1 variable is percent as decimal in many cases if defined correctly).
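Applied to the query quoted above, the 0/1 idea could be sketched as a single pass over the table instead of one SELECT per subtotal. This is only an illustration: the table and column names follow the question, the output name DF_TOTALS_WIDE is hypothetical, only three subtotals are shown, and &propdate is assumed to be defined already.

```sas
/* Sketch only: one pass over the table instead of UNIONed queries.   */
/* In SAS a comparison evaluates to 1 (true) or 0 (false), so SUM of  */
/* the comparison counts the rows where the condition holds.          */
proc sql;
   create table df_totals_wide as
   select sum(_df1 = 'df1') as df1_count,
          sum(_df2 = 'df2') as df2_count,
          sum(_df3 = 'df3') as df3_count
   from First_Lien_Rules_ETL_&propdate;
quit;
```

The result is one row with a column per subtotal rather than one row per subtotal, so it would need transposing if the long _df/Count layout is required. It also counts rows regardless of whether loan_number is missing, which differs slightly from count(loan_number).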

 

Here is a very simple example, but the principle may work for you. Note the difference in how OnlyYoungFemale is defined and the effect on the percentage: the conditional assignment changes the denominator.

data junk;
   set sashelp.class;
   IsFemale = (Sex='F');
   IsYoungFemale= (IsFemale and (Age le 14));
   If IsFemale then OnlyYoungFemale = (Age le 14);
   IsTallYoungFemale = (IsYoungFemale and (Height ge 64));
run;

Proc Tabulate data=junk;
   var IsFemale IsYoungFemale OnlyYoungFemale IsTallYoungFemale;
   table IsFemale IsYoungFemale IsTallYoungFemale OnlyYoungFemale,
         Sum='Count'*f=best4. Mean='%'*f=percent8.1;
   label
      IsFemale ='Females'
      IsYoungFemale ="Young Females of pop"
      OnlyYoungFemale= "Young of Females"
      IsTallYoungFemale="Young Tall Females of Pop"
   ;
run;
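The same conditional-assignment pattern might carry over to one of the rules in the question roughly as follows. This is a sketch under assumptions: the dataset names LOANS and RULE_FLAGS and the variable name fail_54 are hypothetical stand-ins, and only the bankruptcy rule is shown.

```sas
/* Sketch only: LOANS, RULE_FLAGS and fail_54 are hypothetical names. */
data rule_flags;
   set loans;
   /* Assigned only for bankruptcy loans, so all other loans stay     */
   /* missing and drop out of the denominator.                        */
   if bankruptcy_flag = 'Y' then
      fail_54 = (bankruptcy_chapter not in ('7','9','11','12','13','U'));
run;

/* N = subtotal (bankruptcy loans), Sum = fails, Mean = fail rate */
proc means data=rule_flags n sum mean;
   var fail_54;
run;
```

Because fail_54 is missing for loans outside the bankruptcy subset, N gives the subtotal and Mean gives fails divided by that subtotal, so no separate counting step is needed for this rule.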
