Hi , I need help in summing up leave_days monthly for each user, and if the user did not take for leave on a particular calender_month, 0 value should display from the leave_days column corresponding to the calender_month which they did not take leave
Note that if user did not take leave, there will be no record on the leave table and if for example user took 2 days leave on the same calender_month, then 2 separate records for the same calender_month with will appear in the leave table.
Data Have
data users;
input users $50.;
datalines;
Andrew Lup
Mike Mora
Farren Seb
;
data leave;
input calender_month leave_days username $50.;
datalines;
202101 1 Andrew Lup
202101 1 Andrew Lup
202102 1 Andrew Lup
202107 1 Andrew Lup
202101 1 Mike Mora
202101 1 Mike Mora
202101 1 Mike Mora
202107 1 Mike Mora
202107 1 Mike Mora
;
data want
calender_month username leave_days
202101 Andrew Lup 2
202102 Andrew Lup 1
202103 Andrew Lup 0
202104 Andrew Lup 0
202105 Andrew Lup 0
202106 Andrew Lup 0
202107 Andrew Lup 1
202101 Mike Mora 3
202102 Mike Mora 0
202103 Mike Mora 0
202104 Mike Mora 0
202105 Mike Mora 0
202106 Mike Mora 0
202107 Mike Mora 2
202101 Farren Seb 0
202102 Farren Seb 0
202103 Farren Seb 0
202104 Farren Seb 0
202105 Farren Seb 0
202106 Farren Seb 0
202107 Farren Seb 0
First of all, NEVER store a date value like this. NEVER.
Always store date and time related values as such in SAS (counts of days or seconds, starting at 1960-01-01, or midnight for pure times).
So your dataset leave has to look like this:
data leave;
input calender_month :yymmn6. leave_days username &:$50.;
format calender_month yymmn6.;
datalines;
202101 1 Andrew Lup
202101 1 Andrew Lup
202102 1 Andrew Lup
202107 1 Andrew Lup
202101 1 Mike Mora
202101 1 Mike Mora
202101 1 Mike Mora
202107 1 Mike Mora
202107 1 Mike Mora
;
Next, create a dataset that has all months for all users:
%let start=%sysfunc(inputn(20210101,yymmdd8.));
%let end=%sysfunc(inputn(20210731,yymmdd8.));
data all_months;
set users;
format calender_month yymmn6.;
calender_month = &start.;
do while (calender_month le &end.);
output;
calender_month = intnx('month',calender_month,1,"b");
end;
run;
Then, join that, and use COALESCE:
proc sql;
create table want as
select
t1.users as username,
t1.calender_month,
coalesce(sum(t2.leave_days),0) as leave_days
from all_months t1
left join leave t2
on t1.users = t2.username and t1.calender_month = t2.calender_month
group by t1.users, t1.calender_month
;
quit;
Edit: changed CASE to COALESCE
What have you tried?
Is leave_days = 1 for all obs in your real data?
Why is calender_month not a formatted sas-date?
Which months do expect to see in want? Are minimum and maximum taken from the data?
proc summary could be used to, all you need are two formats: one containing all users, the other containing all months you want to see in the result. Building the format for the names is easy:
data work.UserFmt;
set users(rename=(users = start));
length fmtname $ 32 label $ 50;
retain fmtname '$users';
label = start;
run;
proc format cntlin=work.UserFmt;
run;
For the month-dataset, i assume that calender_month has been fixed, so that it is a sas date:
data work.MonthFmt;
length
FmtName $ 32
start label 8
;
retain FmtName 'MonthFmt';
format start label yymmn6.; /* just for better readability */
start = '01Jan2021'd;
do while (start <= '01Jul2021'd);
label = Start;
output;
start = intnx('month', start, 1, 'b');
end;
run;
proc format cntlin=work.MonthFmt;
run;
Counting days without using leave_days
proc summary data=work.leave completetypes nway;
class username calender_month / preloadfmt;
format username $users. calender_month MonthFmt.;
output out=work.want(drop=_type_ rename=(_freq_=leave_days));
run;
proc datasets library= work nolist;
modify want;
format calender_month yymmn6.;
quit;
Well, a lot more code than the solution suggested by @Kurt_Bremser, so most likely just a show room for proc summary and formats 😉
First of all, NEVER store a date value like this. NEVER.
Always store date and time related values as such in SAS (counts of days or seconds, starting at 1960-01-01, or midnight for pure times).
So your dataset leave has to look like this:
data leave;
input calender_month :yymmn6. leave_days username &:$50.;
format calender_month yymmn6.;
datalines;
202101 1 Andrew Lup
202101 1 Andrew Lup
202102 1 Andrew Lup
202107 1 Andrew Lup
202101 1 Mike Mora
202101 1 Mike Mora
202101 1 Mike Mora
202107 1 Mike Mora
202107 1 Mike Mora
;
Next, create a dataset that has all months for all users:
%let start=%sysfunc(inputn(20210101,yymmdd8.));
%let end=%sysfunc(inputn(20210731,yymmdd8.));
data all_months;
set users;
format calender_month yymmn6.;
calender_month = &start.;
do while (calender_month le &end.);
output;
calender_month = intnx('month',calender_month,1,"b");
end;
run;
Then, join that, and use COALESCE:
proc sql;
create table want as
select
t1.users as username,
t1.calender_month,
coalesce(sum(t2.leave_days),0) as leave_days
from all_months t1
left join leave t2
on t1.users = t2.username and t1.calender_month = t2.calender_month
group by t1.users, t1.calender_month
;
quit;
Edit: changed CASE to COALESCE
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.