BookmarkSubscribeRSS Feed
everyone
Fluorite | Level 6

I'm summing a field within a join subquery to calculate a field's total and join it to a given table. However, I'm having difficulty handling null values.

 

I realize a null value plus anything will equal null, so I hoped to handle with coalesce within my sum function. This approach is not working for me, because not all tables contain all IDs. When such a table is joined, I end up with a null value. I'd like to handle these instances conditionally, so that a null total is equal to an employee's max total if this value is non-null or 0 if this value is null. 

 

I've included a reprex below. What adjustments should I make to end up with non-null totals for all records in my joined table? Thank you.

 

/*sample table a*/
data table_a;
input id title $ region $ calls;
cards;
1 manager south 30
1 agent north 20
2 manager west 20
2 agent south 25
;
run;

/*sample table b*/
data table_b;
input id title $ sales;
cards;
1 manager 20
2 manager 5
2 agent 3
;
run;

/*sample table c*/
data table_c;
input id title $ leads;
cards;
1 manager .
1 agent 10
;
run;

/*join tables 
(in this code, my sums return null values for my calculated totals for IDs where records don't exist in all tables.
e.g. total_sales are null for id = 1, title = agent
total_leads are null for id 2
I'd like to replace null values with an employee's max total (20 for employee 1's total sales) or 0 if max total is null (0 for employee 2's total leads)*/ proc sql; create table reprex as select a.id, a.region, a.calls, a.title, b.total_sales, b.sales, c.total_leads, c.leads from table_a as a left join (select sum(coalesce(sales, 0)) as total_sales, sales, id, title from table_b group by id) b on a.id = b.id and a.title = b.title left join (select sum(coalesce(leads, 0)) as total_leads, leads, id, title from table_c group by id) c on a.id = c.id and a.title = c.title; quit;

 

2 REPLIES 2
Tom
Super User Tom
Super User

If the goal is to replace the null (missing) values in the final result then move the COALESCE() function to that final list of variables.

For example if it is just the two variables whose names start with TOTAL_ that you are worried about then make the following change:

create table reprex as
  select a.id
      , a.region
      , a.calls
      , a.title
      , coalesce(b.total_sales,0) as total_sales
      , b.sales
      , coalesce(c.total_leads,0) as total_leads
      , c.leads
from ...

 

everyone
Fluorite | Level 6

Thanks, to handle nulls conditionally I think I should use a case statement? This way I can set a null total equal to zero if an employee has null values for all of their records within that field, or the max if one of their records has a non-null value. My instinct would be to partition by the max total, but I don't think that is supported by proc sql. 

 

I was thinking something like this, but I know that won't work. 

create table reprex as
select a.id,
          a.region,
          a.calls,
          a.title,
          case when total_sales is null then max(coalesce(b.total_sales,0)) over (partition by id) else b.total_sales
from ...
     

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1495 views
  • 0 likes
  • 2 in conversation