Hi All,
I have written a Proc Sql statement that does exactly what I want it to do: generate a new variable in a summation statement grouped by two variables. Here is the code:
Proc Sql;
create table Valuation2016 as select
fin_status,
exposure_bin,
sum(npv_1)/sum(end_month_bal) as per_npv
from companies_all (where=((flag=1) and (eff_loan ne .)))
group by fin_status,
exposure_bin
;
Quit;
The only problem is that, some of the time, my dataset contains no values for intersection of npv_1 and end_month_bal grouped by fin_status and exposure bin. For example, next month I may have a dataset that has no observations when fin_status=2 and exposure_bin="100%".
Question: In this case what I want to do is tell SAS that if it encounters a blank when grouped by fin_status and exposure_bin, to move on to the the total npv_1/end_month_bal for that that exposure_bin group.
For example, if fin_status=1 and exposure_bin=100% conditions ==> missing npv_1 and end_month_bal then take sum of all npv_1/sum of all end_month_bal for just exposure_bin=100% condition regardless of fin_status.
Sorry for the wordy question. Any help and thoughts you could share is much appreciated! Thanks!
1. Here is my raw data. Notice the missing npv_1 and end_month_bal for a few observations:
2. The new variable, per_npv, that I want to create in this step outputs missing values for two of the rows above. I want to tell SAS that, in that case, it should take sum npv_1/end_month_bal for the total grouped by exposure_bin and ignore the fin_status condition. So for company id 135 (observed exposure_bin of 110-120%, I want sas to generate per_npv as by taking sum of all 110-120% npv_1 / sum of all 110-120% end_month_bal. (totally ignorning fin_status=5). As of this moment, the code I wrote above produces this incomplete SAS output:
I guess you'd be looking for something like
Proc Sql;
create table Valuation2016 as
select
a.fin_status,
a.exposure_bin,
coalesce(per_npv_status, per_npv_all) as per_npv
from (
select unique
fin_status,
exposure_bin
from
companies_all as a)
left join (
select
fin_status,
exposure_bin,
sum(npv_1) / sum(end_month_bal) as per_npv_status
from companies_all as b
where
flag = 1 and
eff_loan is not missing
group by
fin_status,
exposure_bin ) on a.fin_status = b.fin_status and a.exposure_bin = b.exposure_bin
left join (
select
exposure_bin,
sum(npv_1) / sum(end_month_bal) as per_npv_all
from companies_all as c
where
flag = 1 and
eff_loan is not missing
group by
exposure_bin) on a.exposure_bin = c.exposure_bin
;
Quit;
It helps to provide some actual example input data and the desired output to demonstrate your issues.
I guess you'd be looking for something like
Proc Sql;
create table Valuation2016 as
select
a.fin_status,
a.exposure_bin,
coalesce(per_npv_status, per_npv_all) as per_npv
from (
select unique
fin_status,
exposure_bin
from
companies_all as a)
left join (
select
fin_status,
exposure_bin,
sum(npv_1) / sum(end_month_bal) as per_npv_status
from companies_all as b
where
flag = 1 and
eff_loan is not missing
group by
fin_status,
exposure_bin ) on a.fin_status = b.fin_status and a.exposure_bin = b.exposure_bin
left join (
select
exposure_bin,
sum(npv_1) / sum(end_month_bal) as per_npv_all
from companies_all as c
where
flag = 1 and
eff_loan is not missing
group by
exposure_bin) on a.exposure_bin = c.exposure_bin
;
Quit;
I just corrected a typo. Does that help?
As you noticed, three tables are joined. The first one contains all available combinations of fin_status and exposure_bin. The second one adds the summaries by fin_status-exposure_bin when available, and missing values when not available. The third table provides broader summaries by exposure_bin as a replacement for values missing in the second table. The replacement operation is done by the coalesce function.
Running each subquery separately might help you understand.
How would I break your Proc Sql step above into three separate proc sql tables, with the third proc sql table being the final join? I'm running into errors with the syntax and where to employ coalesce function if broken out.
Also, just curious why not use ifn instead of coalesce here?
Thank you very much!
Post the code you have tried so that we can help you further.
Hi PG,
This is the latest I can come up with. It runs without errors and I got the desired output to match exactly with the output from the three nested joins in the one proc sql step you helped. I think this clears it up! Unless there's an error I'm not noticing or an inefficiency somewhere you spot... additionally, I see now why you're using coalesce. The breakout into three queries makes me understand why, vs. having to use a ifn or case when, which would complicate the argument. Are there other reasons why ifn or case when should not be used to get my desired output. Thank you!
Proc Sql;
create table V2017 as select
fin_status,
exposure_bin,
sum(npv_1)/sum(end_month_bal) as per_npv_1
from mylib.companies_all (where=((modeled_flag=1) and (eff_loan ne .)))
group by fin_status,
exposure_bin
;
Quit;
Proc Sql;
create table V2018 as select
fin_status,
exposure_bin,
sum(npv_1)/sum(end_month_bal) as per_npv_0
from mylib.companies_all (where=((modeled_flag=1) and (eff_loan ne .)))
group by exposure_bin
order by fin_status
;
Quit;
Proc Sql;
create table V2019 as select
a.fin_status,
a.exposure_bin,
a.per_npv_1,
b.per_npv_0,
coalesce (a.per_npv_1, b.per_npv_0) as per_npv
from V2017 a left join V2018 b on
a.fin_status=b.fin_status and a.exposure_bin=b.exposure_bin
order by fin_status, exposure_bin
;
Quit;
I used coalesce because it's simpler, in this case. Simpler to code and simpler to read. I use case expressions when I need more flexibility. Somehow, I never get to use ifn().
Also, I think it is preferable to code your where conditions as SQL WHERE clauses and not as dataset options because you can mix references to many tables in a SQL WHERE clause and they are easier to read.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.