Hi , I need help in summing up leave_days monthly for each user, and if the user did not take for leave on a particular calender_month, 0 value should display from the leave_days column corresponding to the calender_month which they did not take leave
Note that if user did not take leave, there will be no record on the leave table and if for example user took 2 days leave on the same calender_month, then 2 separate records for the same calender_month with will appear in the leave table.
Data Have
data users;
input users $50.;
datalines;
Andrew Lup
Mike Mora
Farren Seb
;
data leave;
input calender_month leave_days username $50.;
datalines;
202101 1 Andrew Lup
202101 1 Andrew Lup
202102 1 Andrew Lup
202107 1 Andrew Lup
202101 1 Mike Mora
202101 1 Mike Mora
202101 1 Mike Mora
202107 1 Mike Mora
202107 1 Mike Mora
;
data want
calender_month username leave_days
202101 Andrew Lup 2
202102 Andrew Lup 1
202103 Andrew Lup 0
202104 Andrew Lup 0
202105 Andrew Lup 0
202106 Andrew Lup 0
202107 Andrew Lup 1
202101 Mike Mora 3
202102 Mike Mora 0
202103 Mike Mora 0
202104 Mike Mora 0
202105 Mike Mora 0
202106 Mike Mora 0
202107 Mike Mora 2
202101 Farren Seb 0
202102 Farren Seb 0
202103 Farren Seb 0
202104 Farren Seb 0
202105 Farren Seb 0
202106 Farren Seb 0
202107 Farren Seb 0
First of all, NEVER store a date value like this. NEVER.
Always store date and time related values as such in SAS (counts of days or seconds, starting at 1960-01-01, or midnight for pure times).
So your dataset leave has to look like this:
data leave;
input calender_month :yymmn6. leave_days username &:$50.;
format calender_month yymmn6.;
datalines;
202101 1 Andrew Lup
202101 1 Andrew Lup
202102 1 Andrew Lup
202107 1 Andrew Lup
202101 1 Mike Mora
202101 1 Mike Mora
202101 1 Mike Mora
202107 1 Mike Mora
202107 1 Mike Mora
;
Next, create a dataset that has all months for all users:
%let start=%sysfunc(inputn(20210101,yymmdd8.));
%let end=%sysfunc(inputn(20210731,yymmdd8.));
data all_months;
set users;
format calender_month yymmn6.;
calender_month = &start.;
do while (calender_month le &end.);
output;
calender_month = intnx('month',calender_month,1,"b");
end;
run;
Then, join that, and use COALESCE:
proc sql;
create table want as
select
t1.users as username,
t1.calender_month,
coalesce(sum(t2.leave_days),0) as leave_days
from all_months t1
left join leave t2
on t1.users = t2.username and t1.calender_month = t2.calender_month
group by t1.users, t1.calender_month
;
quit;
Edit: changed CASE to COALESCE
What have you tried?
Is leave_days = 1 for all obs in your real data?
Why is calender_month not a formatted sas-date?
Which months do expect to see in want? Are minimum and maximum taken from the data?
proc summary could be used to, all you need are two formats: one containing all users, the other containing all months you want to see in the result. Building the format for the names is easy:
data work.UserFmt;
set users(rename=(users = start));
length fmtname $ 32 label $ 50;
retain fmtname '$users';
label = start;
run;
proc format cntlin=work.UserFmt;
run;
For the month-dataset, i assume that calender_month has been fixed, so that it is a sas date:
data work.MonthFmt;
length
FmtName $ 32
start label 8
;
retain FmtName 'MonthFmt';
format start label yymmn6.; /* just for better readability */
start = '01Jan2021'd;
do while (start <= '01Jul2021'd);
label = Start;
output;
start = intnx('month', start, 1, 'b');
end;
run;
proc format cntlin=work.MonthFmt;
run;
Counting days without using leave_days
proc summary data=work.leave completetypes nway;
class username calender_month / preloadfmt;
format username $users. calender_month MonthFmt.;
output out=work.want(drop=_type_ rename=(_freq_=leave_days));
run;
proc datasets library= work nolist;
modify want;
format calender_month yymmn6.;
quit;
Well, a lot more code than the solution suggested by @Kurt_Bremser, so most likely just a show room for proc summary and formats 😉
First of all, NEVER store a date value like this. NEVER.
Always store date and time related values as such in SAS (counts of days or seconds, starting at 1960-01-01, or midnight for pure times).
So your dataset leave has to look like this:
data leave;
input calender_month :yymmn6. leave_days username &:$50.;
format calender_month yymmn6.;
datalines;
202101 1 Andrew Lup
202101 1 Andrew Lup
202102 1 Andrew Lup
202107 1 Andrew Lup
202101 1 Mike Mora
202101 1 Mike Mora
202101 1 Mike Mora
202107 1 Mike Mora
202107 1 Mike Mora
;
Next, create a dataset that has all months for all users:
%let start=%sysfunc(inputn(20210101,yymmdd8.));
%let end=%sysfunc(inputn(20210731,yymmdd8.));
data all_months;
set users;
format calender_month yymmn6.;
calender_month = &start.;
do while (calender_month le &end.);
output;
calender_month = intnx('month',calender_month,1,"b");
end;
run;
Then, join that, and use COALESCE:
proc sql;
create table want as
select
t1.users as username,
t1.calender_month,
coalesce(sum(t2.leave_days),0) as leave_days
from all_months t1
left join leave t2
on t1.users = t2.username and t1.calender_month = t2.calender_month
group by t1.users, t1.calender_month
;
quit;
Edit: changed CASE to COALESCE
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.