BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
blakezen
Obsidian | Level 7

Hi All,

 

I have written a Proc Sql statement that does exactly what I want it to do: generate a new variable in a summation statement grouped by two variables. Here is the code:

Proc Sql;

  create table Valuation2016 as select

     fin_status,

     exposure_bin,

  sum(npv_1)/sum(end_month_bal) as per_npv

  from companies_all (where=((flag=1) and (eff_loan ne .)))

  group by fin_status,

                          exposure_bin

      ;

Quit;

 

 The only problem is that, some of the time, my dataset contains no values for intersection of npv_1 and end_month_bal grouped by fin_status and exposure bin. For example, next month I may have a dataset that has no observations when fin_status=2 and exposure_bin="100%".

 

Question: In this case what I want to do is tell SAS that if it encounters a blank when grouped by fin_status and exposure_bin, to move on to the the total npv_1/end_month_bal for that that exposure_bin group.

    For example, if fin_status=1 and exposure_bin=100% conditions ==> missing npv_1 and end_month_bal then take sum of all npv_1/sum of all end_month_bal for just exposure_bin=100% condition regardless of fin_status.

 

Sorry for the wordy question. Any help and thoughts you could share is much appreciated! Thanks!

 

 

1. Here is my raw data. Notice the missing npv_1 and end_month_bal for a few observations:

companies_all.PNG raw data

 

2. The new variable, per_npv, that I want to create in this step outputs missing values for two of the rows above. I want to tell SAS that, in that case, it should take sum npv_1/end_month_bal for the total grouped by exposure_bin and ignore the fin_status condition. So for company id 135 (observed exposure_bin of 110-120%, I want sas to generate per_npv as by taking sum of all 110-120% npv_1 / sum of all 110-120% end_month_bal. (totally ignorning fin_status=5). As of this moment, the code I wrote above produces this incomplete SAS output:

Capture.PNG

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I guess you'd be looking for something like

 

Proc Sql;
create table Valuation2016 as 
	select 		
		a.fin_status,
		a.exposure_bin,
		coalesce(per_npv_status, per_npv_all) as per_npv
from (
		select unique
			fin_status,
			exposure_bin
		from 
			companies_all as a)
left join (
	select
		fin_status,
		exposure_bin,
		sum(npv_1) / sum(end_month_bal) as per_npv_status
	from companies_all as b
	where 
		flag = 1 and 
		eff_loan is not missing
	group by 
		fin_status,
		exposure_bin ) on a.fin_status = b.fin_status and a.exposure_bin = b.exposure_bin
left join (
		select
			exposure_bin,
			sum(npv_1) / sum(end_month_bal) as per_npv_all
		from companies_all as c
		where 
			flag = 1 and 
			eff_loan is not missing
		group by 
			exposure_bin) on a.exposure_bin = c.exposure_bin
;
Quit;
PG

View solution in original post

17 REPLIES 17
ballardw
Super User

It helps to provide some actual example input data and the desired output to demonstrate your issues.

blakezen
Obsidian | Level 7
Hi ballardw,
thanks for much for your time and direction on this. I've included example input data and a description of desired output above. Sorry for the ambiguity earlier. Any suggestions you could provide will be much appreciated!
PGStats
Opal | Level 21

I guess you'd be looking for something like

 

Proc Sql;
create table Valuation2016 as 
	select 		
		a.fin_status,
		a.exposure_bin,
		coalesce(per_npv_status, per_npv_all) as per_npv
from (
		select unique
			fin_status,
			exposure_bin
		from 
			companies_all as a)
left join (
	select
		fin_status,
		exposure_bin,
		sum(npv_1) / sum(end_month_bal) as per_npv_status
	from companies_all as b
	where 
		flag = 1 and 
		eff_loan is not missing
	group by 
		fin_status,
		exposure_bin ) on a.fin_status = b.fin_status and a.exposure_bin = b.exposure_bin
left join (
		select
			exposure_bin,
			sum(npv_1) / sum(end_month_bal) as per_npv_all
		from companies_all as c
		where 
			flag = 1 and 
			eff_loan is not missing
		group by 
			exposure_bin) on a.exposure_bin = c.exposure_bin
;
Quit;
PG
blakezen
Obsidian | Level 7
Hi PG, thanks for your code I went through the lines and mostly understand your logic. I get an output that has too many rows.. I think the issue is that I am not getting the aggregate. I should not have more than 9 rows yet I have more than 50. Is there an issue of group by exposure here? Thanks very much.
PGStats
Opal | Level 21

I just corrected a typo. Does that help?

PG
blakezen
Obsidian | Level 7
Yes it does!! It's exactly what I was looking for. Thank you so much!!

I follow that you did a left join twice, once with non-missing per_npvs and then that with missing per_npvs.

But could you briefly list out why you chose to do it this way. I was going down the route of using ifn() and case when to set new flag variables in another proc sql step to generate missing per_npvs followed by a left join but ran into so many errors. I'd like to really follow your thinking since it's so effective here. Thank you so much!
PGStats
Opal | Level 21

As you noticed, three tables are joined. The first one contains all available combinations of fin_status and exposure_bin. The second one adds the summaries by fin_status-exposure_bin when available, and missing values when not available. The third table provides broader summaries by exposure_bin as a replacement for values missing in the second table. The replacement operation is done by the coalesce function.
Running each subquery separately might help you understand.

PG
blakezen
Obsidian | Level 7
This is most helpful, thank you!!
blakezen
Obsidian | Level 7

How would I break your Proc Sql step above into three separate proc sql tables, with the third proc sql table being the final join? I'm running into errors with the syntax and where to employ coalesce function if broken out.

 

Also, just curious why not use ifn instead of coalesce here?

 

Thank you very much!

PGStats
Opal | Level 21

Post the code you have tried so that we can help you further.

PG
blakezen
Obsidian | Level 7

Hi PG,

 

      This is the latest I can come up with. It runs without errors and I got the desired output to match exactly with the output from the three nested joins in the one proc sql step you helped. I think this clears it up! Unless there's an error I'm not noticing or an inefficiency somewhere you spot... additionally, I see now why you're using coalesce. The breakout into three queries makes me understand why, vs. having to use a ifn or case when, which would complicate the argument. Are there other reasons why ifn or case when should not be used to get my desired output. Thank you!

 

Proc Sql;

create table V2017 as select

fin_status,

exposure_bin,

sum(npv_1)/sum(end_month_bal) as per_npv_1

from mylib.companies_all (where=((modeled_flag=1) and (eff_loan ne .)))

group by fin_status,

exposure_bin

;

Quit;

Proc Sql;

create table V2018 as select

fin_status,

exposure_bin,

sum(npv_1)/sum(end_month_bal) as per_npv_0

 

from mylib.companies_all (where=((modeled_flag=1) and (eff_loan ne .)))

 

group by exposure_bin

order by fin_status

;

Quit;

Proc Sql;

create table V2019 as select

a.fin_status,

a.exposure_bin,

a.per_npv_1,

b.per_npv_0,

coalesce (a.per_npv_1, b.per_npv_0) as per_npv

 

from V2017 a left join V2018 b on

a.fin_status=b.fin_status and a.exposure_bin=b.exposure_bin

 

order by fin_status, exposure_bin

;

Quit;

PGStats
Opal | Level 21

I used coalesce because it's simpler, in this case. Simpler to code and simpler to read. I use case expressions when I need more flexibility. Somehow, I never get to use ifn().

PG
blakezen
Obsidian | Level 7
Thanks!
PGStats
Opal | Level 21

Also, I think it is preferable to code your where conditions as SQL WHERE clauses and not as dataset options because you can mix references to many tables in a SQL WHERE clause and they are easier to read.

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 9288 views
  • 5 likes
  • 3 in conversation