i have a dataset which has different months data i need to sum the same months data and than take the difference the previous month
city month count
NEV 01-Dec-2018 50
NEV 01-Dec-2018 40
REN 01-Dec-2018 20
REN 01-Dec-2018 80
NEV 01-jan-2019 15
NEV 01-jan-2019 15
REN 01-jan-2019 10
REN 01-jan-2019 10
NEV 01-feb-2019 25
NEV 01-feb-2019 25
REN 01-feb-2019 30
REN 01-feb-2019 20
NEV 01-mar-2019 45
NEV 01-mar-2019 45
REN 01-mar-2019 25
REN 01-mar-2019 25
output:
city month count diff
NEV 01-Dec-2018 90
REN 01-Dec-2018 100
NEV 01-jan-2019 30 -60
REN 01-jan-2019 20 -80
NEV 01-feb-2019 50 20
REN 01-feb-2019 50 30
NEV 01-mar-2019 90 40
REN 01-mar-2019 50 0
PROC SUMMARY will find the sum by month and city. Then you can take the output from PROC SUMMARY, sort it by city and month, then in a data step, use the LAG function to find the differences.
Hi @hk2013 A good excercise for me to personally practice and test myself 🙂 Thank you for the fun question
data have;
input city $ month :date9. count ;
format month date9.;
cards;
NEV 01-Dec-2018 50
NEV 01-Dec-2018 40
REN 01-Dec-2018 20
REN 01-Dec-2018 80
NEV 01-jan-2019 15
NEV 01-jan-2019 15
REN 01-jan-2019 10
REN 01-jan-2019 10
NEV 01-feb-2019 25
NEV 01-feb-2019 25
REN 01-feb-2019 30
REN 01-feb-2019 20
NEV 01-mar-2019 45
NEV 01-mar-2019 45
REN 01-mar-2019 25
REN 01-mar-2019 25
;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("city") ;
h.definedata ("_count") ;
h.definedone () ;
call missing(_count);
end;
do until(last.month);
set have;
by city month notsorted;
sum=sum(sum,count);
end;
if h.find()=0 then diff=sum-_count;
h.replace(key:city,data:sum);
keep city month sum diff;
run;
Are you programming or using the GUI in EG?
In SQL, that would be:
proc sql;
select
a.city,
a.month,
a.mCount,
a.mCount-b.mCount as countChange
from
(select city, month, sum(count) as mCount
from have group by city, month) as a left join
(select city, month, sum(count) as mCount
from have group by city, month) as b
on a.city=b.city and a.month=intnx("month", b.month, 1)
order by month, city;
quit;
This will give the correct answer even if there are some missing months.
If there was not gap in data.
data have;
input city $ month :date9. count ;
format month date9.;
cards;
NEV 01-Dec-2018 50
NEV 01-Dec-2018 40
REN 01-Dec-2018 20
REN 01-Dec-2018 80
NEV 01-jan-2019 15
NEV 01-jan-2019 15
REN 01-jan-2019 10
REN 01-jan-2019 10
NEV 01-feb-2019 25
NEV 01-feb-2019 25
REN 01-feb-2019 30
REN 01-feb-2019 20
NEV 01-mar-2019 45
NEV 01-mar-2019 45
REN 01-mar-2019 25
REN 01-mar-2019 25
;
proc summary data=have;
by city month notsorted;
var count;
output out=temp sum=;
run;
data want;
set temp(drop=_:);
dif=dif2(count);
run;
data have;
input city $ month :date9. count ;
format month DATE9.;
cards;
NEV 01-Dec-2018 50
NEV 01-Dec-2018 40
REN 01-Dec-2018 20
REN 01-Dec-2018 80
NEV 01-jan-2019 15
NEV 01-jan-2019 15
REN 01-jan-2019 10
REN 01-jan-2019 10
NEV 01-feb-2019 25
NEV 01-feb-2019 25
REN 01-feb-2019 30
REN 01-feb-2019 20
NEV 01-mar-2019 45
NEV 01-mar-2019 45
REN 01-mar-2019 25
REN 01-mar-2019 25
;
RUN;
PROC REPORT DATA=HAVE OUT=WANT22(DROP=_BREAK_);
COLUMN CITY MONTH COUNT MON YR;
DEFINE CITY/GROUP ;
DEFINE MONTH/GROUP;
DEFINE COUNT/SUM;
COMPUTE YR;
MON=MONTH(_C2_);
YR=YEAR(_C2_);
ENDCOMP;
RUN;
PROC SORT DATA=WANT22;
BY CITY YR MON;
RUN;
DATA WANT (KEEP=CITY MONTH COUNT DIFF);
SET WANT22;
BY CITY YR MON;
DIFF=DIF(COUNT);
IF FIRST.CITY THEN CALL MISSING(DIFF);
RUN;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.