DATA Step, Macro, Functions and more

proc sql; group by; case statement

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

proc sql; group by; case statement

[ Edited ]

Hi All,

 

I have written a Proc Sql statement that does exactly what I want it to do: generate a new variable in a summation statement grouped by two variables. Here is the code:

Proc Sql;

  create table Valuation2016 as select

     fin_status,

     exposure_bin,

  sum(npv_1)/sum(end_month_bal) as per_npv

  from companies_all (where=((flag=1) and (eff_loan ne .)))

  group by fin_status,

                          exposure_bin

      ;

Quit;

 

 The only problem is that, some of the time, my dataset contains no values for intersection of npv_1 and end_month_bal grouped by fin_status and exposure bin. For example, next month I may have a dataset that has no observations when fin_status=2 and exposure_bin="100%".

 

Question: In this case what I want to do is tell SAS that if it encounters a blank when grouped by fin_status and exposure_bin, to move on to the the total npv_1/end_month_bal for that that exposure_bin group.

    For example, if fin_status=1 and exposure_bin=100% conditions ==> missing npv_1 and end_month_bal then take sum of all npv_1/sum of all end_month_bal for just exposure_bin=100% condition regardless of fin_status.

 

Sorry for the wordy question. Any help and thoughts you could share is much appreciated! Thanks!

 

 

1. Here is my raw data. Notice the missing npv_1 and end_month_bal for a few observations:

companies_all.PNG raw data

 

2. The new variable, per_npv, that I want to create in this step outputs missing values for two of the rows above. I want to tell SAS that, in that case, it should take sum npv_1/end_month_bal for the total grouped by exposure_bin and ignore the fin_status condition. So for company id 135 (observed exposure_bin of 110-120%, I want sas to generate per_npv as by taking sum of all 110-120% npv_1 / sum of all 110-120% end_month_bal. (totally ignorning fin_status=5). As of this moment, the code I wrote above produces this incomplete SAS output:

Capture.PNG

 

 


Accepted Solutions
Solution
‎06-01-2016 04:13 PM
Respected Advisor
Posts: 4,920

Re: proc sql; group by; case statement

[ Edited ]

I guess you'd be looking for something like

 

Proc Sql;
create table Valuation2016 as 
	select 		
		a.fin_status,
		a.exposure_bin,
		coalesce(per_npv_status, per_npv_all) as per_npv
from (
		select unique
			fin_status,
			exposure_bin
		from 
			companies_all as a)
left join (
	select
		fin_status,
		exposure_bin,
		sum(npv_1) / sum(end_month_bal) as per_npv_status
	from companies_all as b
	where 
		flag = 1 and 
		eff_loan is not missing
	group by 
		fin_status,
		exposure_bin ) on a.fin_status = b.fin_status and a.exposure_bin = b.exposure_bin
left join (
		select
			exposure_bin,
			sum(npv_1) / sum(end_month_bal) as per_npv_all
		from companies_all as c
		where 
			flag = 1 and 
			eff_loan is not missing
		group by 
			exposure_bin) on a.exposure_bin = c.exposure_bin
;
Quit;
PG

View solution in original post


All Replies
Super User
Posts: 11,343

Re: proc sql; group by; case statement

It helps to provide some actual example input data and the desired output to demonstrate your issues.

Contributor
Posts: 38

Re: proc sql; group by; case statement

Hi ballardw,
thanks for much for your time and direction on this. I've included example input data and a description of desired output above. Sorry for the ambiguity earlier. Any suggestions you could provide will be much appreciated!
Solution
‎06-01-2016 04:13 PM
Respected Advisor
Posts: 4,920

Re: proc sql; group by; case statement

[ Edited ]

I guess you'd be looking for something like

 

Proc Sql;
create table Valuation2016 as 
	select 		
		a.fin_status,
		a.exposure_bin,
		coalesce(per_npv_status, per_npv_all) as per_npv
from (
		select unique
			fin_status,
			exposure_bin
		from 
			companies_all as a)
left join (
	select
		fin_status,
		exposure_bin,
		sum(npv_1) / sum(end_month_bal) as per_npv_status
	from companies_all as b
	where 
		flag = 1 and 
		eff_loan is not missing
	group by 
		fin_status,
		exposure_bin ) on a.fin_status = b.fin_status and a.exposure_bin = b.exposure_bin
left join (
		select
			exposure_bin,
			sum(npv_1) / sum(end_month_bal) as per_npv_all
		from companies_all as c
		where 
			flag = 1 and 
			eff_loan is not missing
		group by 
			exposure_bin) on a.exposure_bin = c.exposure_bin
;
Quit;
PG
Contributor
Posts: 38

Re: proc sql; group by; case statement

Hi PG, thanks for your code I went through the lines and mostly understand your logic. I get an output that has too many rows.. I think the issue is that I am not getting the aggregate. I should not have more than 9 rows yet I have more than 50. Is there an issue of group by exposure here? Thanks very much.
Respected Advisor
Posts: 4,920

Re: proc sql; group by; case statement

I just corrected a typo. Does that help?

PG
Contributor
Posts: 38

Re: proc sql; group by; case statement

Yes it does!! It's exactly what I was looking for. Thank you so much!!

I follow that you did a left join twice, once with non-missing per_npvs and then that with missing per_npvs.

But could you briefly list out why you chose to do it this way. I was going down the route of using ifn() and case when to set new flag variables in another proc sql step to generate missing per_npvs followed by a left join but ran into so many errors. I'd like to really follow your thinking since it's so effective here. Thank you so much!
Respected Advisor
Posts: 4,920

Re: proc sql; group by; case statement

As you noticed, three tables are joined. The first one contains all available combinations of fin_status and exposure_bin. The second one adds the summaries by fin_status-exposure_bin when available, and missing values when not available. The third table provides broader summaries by exposure_bin as a replacement for values missing in the second table. The replacement operation is done by the coalesce function.
Running each subquery separately might help you understand.

PG
Contributor
Posts: 38

Re: proc sql; group by; case statement

This is most helpful, thank you!!
Contributor
Posts: 38

Re: proc sql; group by; case statement

How would I break your Proc Sql step above into three separate proc sql tables, with the third proc sql table being the final join? I'm running into errors with the syntax and where to employ coalesce function if broken out.

 

Also, just curious why not use ifn instead of coalesce here?

 

Thank you very much!

Respected Advisor
Posts: 4,920

Re: proc sql; group by; case statement

Post the code you have tried so that we can help you further.

PG
Contributor
Posts: 38

Re: proc sql; group by; case statement

Hi PG,

 

      This is the latest I can come up with. It runs without errors and I got the desired output to match exactly with the output from the three nested joins in the one proc sql step you helped. I think this clears it up! Unless there's an error I'm not noticing or an inefficiency somewhere you spot... additionally, I see now why you're using coalesce. The breakout into three queries makes me understand why, vs. having to use a ifn or case when, which would complicate the argument. Are there other reasons why ifn or case when should not be used to get my desired output. Thank you!

 

Proc Sql;

create table V2017 as select

fin_status,

exposure_bin,

sum(npv_1)/sum(end_month_bal) as per_npv_1

from mylib.companies_all (where=((modeled_flag=1) and (eff_loan ne .)))

group by fin_status,

exposure_bin

;

Quit;

Proc Sql;

create table V2018 as select

fin_status,

exposure_bin,

sum(npv_1)/sum(end_month_bal) as per_npv_0

 

from mylib.companies_all (where=((modeled_flag=1) and (eff_loan ne .)))

 

group by exposure_bin

order by fin_status

;

Quit;

Proc Sql;

create table V2019 as select

a.fin_status,

a.exposure_bin,

a.per_npv_1,

b.per_npv_0,

coalesce (a.per_npv_1, b.per_npv_0) as per_npv

 

from V2017 a left join V2018 b on

a.fin_status=b.fin_status and a.exposure_bin=b.exposure_bin

 

order by fin_status, exposure_bin

;

Quit;

Respected Advisor
Posts: 4,920

Re: proc sql; group by; case statement

I used coalesce because it's simpler, in this case. Simpler to code and simpler to read. I use case expressions when I need more flexibility. Somehow, I never get to use ifn().

PG
Contributor
Posts: 38

Re: proc sql; group by; case statement

Thanks!
Respected Advisor
Posts: 4,920

Re: proc sql; group by; case statement

Also, I think it is preferable to code your where conditions as SQL WHERE clauses and not as dataset options because you can mix references to many tables in a SQL WHERE clause and they are easier to read.

PG
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 1602 views
  • 5 likes
  • 3 in conversation