- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I have written a Proc Sql statement that does exactly what I want it to do: generate a new variable in a summation statement grouped by two variables. Here is the code:
Proc Sql;
create table Valuation2016 as select
fin_status,
exposure_bin,
sum(npv_1)/sum(end_month_bal) as per_npv
from companies_all (where=((flag=1) and (eff_loan ne .)))
group by fin_status,
exposure_bin
;
Quit;
The only problem is that, some of the time, my dataset contains no values for intersection of npv_1 and end_month_bal grouped by fin_status and exposure bin. For example, next month I may have a dataset that has no observations when fin_status=2 and exposure_bin="100%".
Question: In this case what I want to do is tell SAS that if it encounters a blank when grouped by fin_status and exposure_bin, to move on to the the total npv_1/end_month_bal for that that exposure_bin group.
For example, if fin_status=1 and exposure_bin=100% conditions ==> missing npv_1 and end_month_bal then take sum of all npv_1/sum of all end_month_bal for just exposure_bin=100% condition regardless of fin_status.
Sorry for the wordy question. Any help and thoughts you could share is much appreciated! Thanks!
1. Here is my raw data. Notice the missing npv_1 and end_month_bal for a few observations:
2. The new variable, per_npv, that I want to create in this step outputs missing values for two of the rows above. I want to tell SAS that, in that case, it should take sum npv_1/end_month_bal for the total grouped by exposure_bin and ignore the fin_status condition. So for company id 135 (observed exposure_bin of 110-120%, I want sas to generate per_npv as by taking sum of all 110-120% npv_1 / sum of all 110-120% end_month_bal. (totally ignorning fin_status=5). As of this moment, the code I wrote above produces this incomplete SAS output:
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I guess you'd be looking for something like
Proc Sql;
create table Valuation2016 as
select
a.fin_status,
a.exposure_bin,
coalesce(per_npv_status, per_npv_all) as per_npv
from (
select unique
fin_status,
exposure_bin
from
companies_all as a)
left join (
select
fin_status,
exposure_bin,
sum(npv_1) / sum(end_month_bal) as per_npv_status
from companies_all as b
where
flag = 1 and
eff_loan is not missing
group by
fin_status,
exposure_bin ) on a.fin_status = b.fin_status and a.exposure_bin = b.exposure_bin
left join (
select
exposure_bin,
sum(npv_1) / sum(end_month_bal) as per_npv_all
from companies_all as c
where
flag = 1 and
eff_loan is not missing
group by
exposure_bin) on a.exposure_bin = c.exposure_bin
;
Quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It helps to provide some actual example input data and the desired output to demonstrate your issues.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thanks for much for your time and direction on this. I've included example input data and a description of desired output above. Sorry for the ambiguity earlier. Any suggestions you could provide will be much appreciated!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I guess you'd be looking for something like
Proc Sql;
create table Valuation2016 as
select
a.fin_status,
a.exposure_bin,
coalesce(per_npv_status, per_npv_all) as per_npv
from (
select unique
fin_status,
exposure_bin
from
companies_all as a)
left join (
select
fin_status,
exposure_bin,
sum(npv_1) / sum(end_month_bal) as per_npv_status
from companies_all as b
where
flag = 1 and
eff_loan is not missing
group by
fin_status,
exposure_bin ) on a.fin_status = b.fin_status and a.exposure_bin = b.exposure_bin
left join (
select
exposure_bin,
sum(npv_1) / sum(end_month_bal) as per_npv_all
from companies_all as c
where
flag = 1 and
eff_loan is not missing
group by
exposure_bin) on a.exposure_bin = c.exposure_bin
;
Quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I just corrected a typo. Does that help?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I follow that you did a left join twice, once with non-missing per_npvs and then that with missing per_npvs.
But could you briefly list out why you chose to do it this way. I was going down the route of using ifn() and case when to set new flag variables in another proc sql step to generate missing per_npvs followed by a left join but ran into so many errors. I'd like to really follow your thinking since it's so effective here. Thank you so much!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As you noticed, three tables are joined. The first one contains all available combinations of fin_status and exposure_bin. The second one adds the summaries by fin_status-exposure_bin when available, and missing values when not available. The third table provides broader summaries by exposure_bin as a replacement for values missing in the second table. The replacement operation is done by the coalesce function.
Running each subquery separately might help you understand.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How would I break your Proc Sql step above into three separate proc sql tables, with the third proc sql table being the final join? I'm running into errors with the syntax and where to employ coalesce function if broken out.
Also, just curious why not use ifn instead of coalesce here?
Thank you very much!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Post the code you have tried so that we can help you further.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi PG,
This is the latest I can come up with. It runs without errors and I got the desired output to match exactly with the output from the three nested joins in the one proc sql step you helped. I think this clears it up! Unless there's an error I'm not noticing or an inefficiency somewhere you spot... additionally, I see now why you're using coalesce. The breakout into three queries makes me understand why, vs. having to use a ifn or case when, which would complicate the argument. Are there other reasons why ifn or case when should not be used to get my desired output. Thank you!
Proc Sql;
create table V2017 as select
fin_status,
exposure_bin,
sum(npv_1)/sum(end_month_bal) as per_npv_1
from mylib.companies_all (where=((modeled_flag=1) and (eff_loan ne .)))
group by fin_status,
exposure_bin
;
Quit;
Proc Sql;
create table V2018 as select
fin_status,
exposure_bin,
sum(npv_1)/sum(end_month_bal) as per_npv_0
from mylib.companies_all (where=((modeled_flag=1) and (eff_loan ne .)))
group by exposure_bin
order by fin_status
;
Quit;
Proc Sql;
create table V2019 as select
a.fin_status,
a.exposure_bin,
a.per_npv_1,
b.per_npv_0,
coalesce (a.per_npv_1, b.per_npv_0) as per_npv
from V2017 a left join V2018 b on
a.fin_status=b.fin_status and a.exposure_bin=b.exposure_bin
order by fin_status, exposure_bin
;
Quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I used coalesce because it's simpler, in this case. Simpler to code and simpler to read. I use case expressions when I need more flexibility. Somehow, I never get to use ifn().
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Also, I think it is preferable to code your where conditions as SQL WHERE clauses and not as dataset options because you can mix references to many tables in a SQL WHERE clause and they are easier to read.