Hi -
Column 1 is date values; column 2 is number of instances; column 3 is person attributable to the instances on the date.
i need to count up monthly totals of instance by person, except that my "months" run from 19th to the 18th (like March 19, 2020 to April 18, 2020). I've got about 200,000 observations running back to January 2016.
I cannot figure out how to set month ranges since they are time periods not reflective of an actual calendar month.
I welcome any suggestions, because i don't even know where to begin.
12/7/2019 22 BILL
12/7/2019 1 AMY
11/9/2019 26 AMY
11/9/2019 3 JEFF
10/12/2019 30 AMY
10/12/2019 5 AMY
9/14/2019 16 JEFF
9/14/2019 6 BILL
8/17/2019 19 BILL
thanks,
Wendy
Or create a proper format that you reuse any time with any date variable.
data _F;
retain TYPE 'N' FMTNAME 'month_ends_19th_';
do Y=2010 to 2030;
do M= 1 to 12;
START=mdy(M,20,Y);
if M=12 then END=mdy(1,19,Y+1);
else END=mdy(M+1,19,Y);
LABEL=catt(put(START,monyy.),'-',put(END,monyy.));
output;
end;
end;
format START END date9.;
run;
proc format cntlin=_F;
run;
proc summary data =HAVE nway;
class PERSON MONTH;
var INSTANCES;
format MONTH month_ends_19th_.;
output out=SUM sum=;
run;
Create a new variable to indicate month, then run PROC SUMMARY
In this example, the new month variable will show APR20 for days March 19, 2020 to April 18, 2020. Naturally, you can pick other ways to display visually the month.
/* UNTESTED CODE */
data have2;
set have;
if day(date)>=19 then newmonth=intnx('month',date,1,'b');
else newmonth=date;
format newmonth monyy.;
run;
proc summary data=have2 nway;
class person newmonth;
var instances;
output out=want sum=;
run;
One way would be to create, then apply, a format. This would work for both proc means (if you only needed the printed result) and proc summary (if you wanted to create an output file):
data have;
input date mmddyy10. instances person $;
cards;
12/7/2019 22 BILL
12/7/2019 1 AMY
11/9/2019 26 AMY
11/9/2019 3 JEFF
10/12/2019 30 AMY
10/12/2019 5 AMY
9/14/2019 16 JEFF
9/14/2019 6 BILL
8/17/2019 19 BILL
;
run;
data fmtdata;
retain fmtname 'date_range' type 'N';
label=1;
do start='01jan2016'd to '31dec2020'd;
end=start;
if day(start) eq 19 then label+1;
output;
end;
run;
proc format cntlin = fmtdata;
run;
proc means data=have sum;
var instances;
class person date;
format date date_range.;
run;
Art, CEO, AnalystFinder.com
And another possibly obnoxious approach. Create a NEW variable that has the start of the month. Then use that variable in anything creating groups like Proc Means/ summary/ tabulate/ report.
Examine this data example and see if the Moddate aligns with your month start.
data example; do date= '01Jan2020'd to '31Jul2020'd; moddate = ifn(day(date)<19,intnx('month',date,-1,'b')+18,intnx('month',date,0,'b')+18); output; end; format date moddate date9.; run;
Use of a format like YYMON or YYMM on the moddate would indicate the start of the month.
It might help to show exactly what you expect the result to be for your example data so we aren't quite shooting fish in the forest.
Hi @wendy0327,
You can also create a dataset containing the definition of these custom intervals and then refer to this definition later on.
Adapting an example shown in the documentation:
/* Create dataset with user-defined date intervals "mymonth" */
options intervalds=(mymonth=mmds);
data mmds(keep=begin season);
start = '19DEC2015'd;
stop = '19DEC2050'd; /* use a later stop date, if needed */
nmonths = intck('month',start,stop);
do i=0 to nmonths;
begin = intnx('month',start,i,'s');
season = month(begin);
output;
end;
format begin date9.;
run;
/* Create summary statistics per person and interval */
proc sql;
create table want as
select person, intnx('mymonth',date,0) as startdt format=mmddyy10., sum(instances) as total_inst
from have
group by person, startdt;
quit;
Just use INTNX() function to move the date to start of the month in the middle of the range. Bascially you want to move by zero months when the day of the month is 18 or less and move to the next month when it is more than 18.
data have ;
input date count name $;
repmonth=intnx('month',date,day(date)>18);
informat date mmddyy.;
format date yymmdd10. repmonth yymm7.;
cards;
12/19/2019 22 BILL
12/7/2019 1 AMY
11/20/2019 26 AMY
11/9/2019 3 JEFF
10/22/2019 30 AMY
10/12/2019 5 AMY
9/14/2019 16 JEFF
9/14/2019 6 BILL
8/17/2019 19 BILL
;
proc print;
run;
proc means sum;
class repmonth;
var count;
run;
Results:
Obs date count name repmonth 1 2019-12-19 22 BILL 2020M01 2 2019-12-07 1 AMY 2019M12 3 2019-11-20 26 AMY 2019M12 4 2019-11-09 3 JEFF 2019M11 5 2019-10-22 30 AMY 2019M11 6 2019-10-12 5 AMY 2019M10 7 2019-09-14 16 JEFF 2019M09 8 2019-09-14 6 BILL 2019M09 9 2019-08-17 19 BILL 2019M08 Analysis Variable : count N repmonth Obs Sum ------------------------------- 2019M08 1 19.0000000 2019M09 2 22.0000000 2019M10 1 5.0000000 2019M11 2 33.0000000 2019M12 2 27.0000000 2020M01 1 22.0000000 -------------------------------
Hi @wendy0327
The idea here is to create a date value as the first day in any month, where the day value is > 18 in the preceding month. So april 17 is in april, but april 18 in may and so on, and do the summation on the calculated date and person. Note that the summation is done on a date value, even if it is formatted as yyyymm, and because of that it has to be the same day for all dates falling in the given summation-month.
data have;
format date date9.;
input date mmddyy10. number Person$;
cards;
12/7/2019 22 BILL
12/7/2019 1 AMY
11/9/2019 26 AMY
11/9/2019 3 JEFF
10/12/2019 30 AMY
10/12/2019 5 AMY
9/14/2019 16 JEFF
9/14/2019 6 BILL
8/17/2019 19 BILL
9/18/2019 16 JEFF
9/22/2019 6 BILL
8/23/2019 19 BILL
;
run;
proc sql;
create table tmp as
select distinct
Person,
mdy(month(date)+(day(date)>18),1,year(date)) as CountMonth format=yymmn6.,
sum(number) as SumNumber
from have
group by Person, CountMonth
order by Person, CountMonth;
quit;
Or create a proper format that you reuse any time with any date variable.
data _F;
retain TYPE 'N' FMTNAME 'month_ends_19th_';
do Y=2010 to 2030;
do M= 1 to 12;
START=mdy(M,20,Y);
if M=12 then END=mdy(1,19,Y+1);
else END=mdy(M+1,19,Y);
LABEL=catt(put(START,monyy.),'-',put(END,monyy.));
output;
end;
end;
format START END date9.;
run;
proc format cntlin=_F;
run;
proc summary data =HAVE nway;
class PERSON MONTH;
var INSTANCES;
format MONTH month_ends_19th_.;
output out=SUM sum=;
run;
@ChrisNZ great idea, I didn't think of that, but your formats are off by one day. The "month" should start on the 19th and end on the 18th.
@PaigeMiller oh no lol.
That will teach me to reply at midnight! 🙂
all these suggestions have been awesome. thank you all so much!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.