Hello dear experts
I am kindly asking for your help, since I spent already a lot of time googling and did not find any solution yet.
I need to add a variable to my data set. The variable should be the average of value per month and the average of value per month previous year. Please see attached example data: there you can see the desired output variables 'average per month' and 'average per month prev.year')
With the following Code I get the needed average per month in a separate table, but I need it as additional variable in the existing data set.
PROC MEANS Data=mydata;
VAR value;
Class Date;
FORMAT Date Monyy.;
RUN;
Many thanks in advance for your kind support with this.
Best
SAS Newbie
Hi @Gretaku
No panic, we are almost there 🙂
It comes from the fact that you use the same name for all your tables:
Hi @Gretaku
Here is an attempt to achieve this:
data mydata;
input date:DDMMYY10. value;
datalines;
01.01.2019 0
02.01.2019 100
03.01.2019 -100
04.01.2019 100
05.01.2019 100
01.02.2019 100
02.02.2019 0
03.02.2019 0
04.02.2019 -100
05.02.2019 100
01.01.2020 -100
02.01.2020 0
03.01.2020 0
04.01.2020 -100
05.01.2020 100
;
run;
/* average of value per month */
proc sql;
create table mydata_temp as
select date format=monyy., value, avg(value) as monthly_avg
from mydata
group by year(date), month(date)
order by month(date), day(date), year(date);
run;
/* average of value per month previous year */
data want;
set mydata_temp;
monthly_avg_preyear = lag(monthly_avg);
if lag(date) > date or year(lag(date)) = year(date) then call missing(monthly_avg_preyear);
run;
proc sort data=want;
by date;
run;
@ed_sas_member Wow thank you so much for your prompt reply!
It worked for my dataset! However I have already come across another issue..
I need the average to be calculated for different groups. I tried to do it with "WHERE" statement, but I want the other data to remain in the data set as well [more like if ... then do avg(value) else .]?
PROC SQL;
CREATE TABLE mydata_temp AS
SELECT Touchpoint, Unit, Date format=monyy.,
avg(NPS_value) as monthly_avg_claimsnl
FROM mydata
WHERE Unit = 'N' AND Touchpoint = '1'
group by year(date), month(date)
order by month(date), day(date), year(date);
QUIT;
PROC SQL;
CREATE TABLE mydata_temp AS
SELECT Touchpoint, Unit, Date format=monyy.,
avg(NPS_value) as monthly_avg_claimsnl
FROM mydata
WHERE Unit = 'F' AND Touchpoint = '2'
group by year(date), month(date)
order by month(date), day(date), year(date);
QUIT;
attached I tried to show the preffered output.
Thank you so much!.
Best,
Hi @Gretaku
Sounds great!
Here is the revised code to meet the new requirement.
Please test it on you data and let me know.
I am just a bit confused about the values displayed in the Excel file for the monthly_avg variable: for example, I agree with -33.33 for the group (unit=N touchpoint=1) in 2020. However, it should be 25 instead of 100 in 2019?
Best,
data mydata;
input date:MMDDYY10. unit $ touchpoint value;
datalines;
1/1/2019 F 1 0
1/2/2019 F 1 100
1/3/2019 F 2 -100
1/4/2019 F 2 100
1/5/2019 N 2 100
1/6/2019 N 1 0
1/7/2019 N 1 100
1/8/2019 N 1 -100
1/9/2019 N 1 100
1/1/2020 F 1 -100
1/2/2020 F 1 0
1/3/2020 F 2 0
1/4/2020 F 2 -100
1/5/2020 F 1 100
1/1/2020 N 1 -100
1/2/2020 N 2 0
1/3/2020 N 2 0
1/4/2020 N 1 -100
1/5/2020 N 1 100
;
run;
/* average of value per month */
proc sql;
create table mydata_temp as
select date format=MMDDYY10.,
year(date) as year, month(date) as month,
unit, touchpoint,
avg(value) as monthly_avg
from mydata
group by unit, touchpoint, month(date), year(date);
run;
/* average of value per month previous year */
proc sql;
create table want as
select distinct a.*, b.monthly_avg as monthly_avg_preyear
from mydata_temp as a left join mydata_temp as b
on a.unit=b.unit and a.touchpoint=b.touchpoint
and a.month = b.month and (a.year-1)=b.year;
run;
Dear @ed_sas_member
The monthly avg part worked perfectly and the join as well, but I don't get any data for the variable monthly_avg_preyear...
Do you have an idea about what could be the problem?
Thank you in advance
Oh and about the calculation difference, you are right, that was a 'typo' from my end. Sorry about the confusion.
Best
Hi @Gretaku
Here is what I get when I run the code.
Could you please share a portion of your SAS log?
Are the sample data representative from your 'real' data?
Is there a confusion in the date format (e.g DDMMYY10. or MMDDYY10.) ?
Thank you in advance.
I did run it with my real data, which is more or less the same (except there are a lot more variables included). Here the log summary as a screenshot:
The programm runs through, but if I filter for ["monthly_avg_preyear" NOT .] there are no entrys.
The date format looks fine to me. "month" and "year" have been correctly allocated.
Here some example output:
To make things more easy I created a new variable "Touchpoint_spec" which already shows the groups build (in example F2, N1). I also created the variable preyear. Maybe we could write something like [IF Date = Date_preyear Put monthly_avg] ?
Thank you for your support.
Hi @Gretaku
It is a good idea to create a variable contatening unit and touchpoint -> it will simplify the code.
i am a bit concerned by the warning in the log, specifying that the date is already in the dataset. Does it still appear if you do that for the first step:
/* average of value per month */
proc sql;
create table mydata_temp as
select *,
year(date) as year,
month(date) as month,
avg(value) as monthly_avg
from mydata
group by touchpoint_spec, month(date), year(date);
run;
Excellent! The warning disappeared and I have all my variables in the file!
I seem to make a mistake, because I cannot find monthly_avg_preyear
Can you see what went wrong in the code?
PROC SQL;
CREATE TABLE TMP.TEST AS
select *,
year(date) as year,
month(date) as month,
avg(NPS_value) as monthly_avg
from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
group by Touchpoint_spec, month(date), year(date);
run;
/* average of value per month previous year */
proc sql;
create table TMP.TEST2 as
select distinct a.*, b.monthly_avg as monthly_avg_preyear
from TMP.TEST as a left join TMP.TEST as b
on a.Touchpoint_spec=b.Touchpoint_spec and a.Date=b.Date
and a.month = b.month and (a.year-1)=b.year;
run;
Hi @Gretaku
I am wondering if the issue come from the 'distinct' keyword.
Could you please try this ?
proc sql;
create table TMP.TEST as
select *,
year(date) as year,
month(date) as month,
avg(NPS_value) as monthly_avg
from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
group by touchpoint_spec, month(date), year(date);
run;
/* average of value per month previous year */
proc sql;
create table TMP.TEST_INT as
select distinct touchpoint_spec, year, month, monthly_avg
from TMP.TEST;
quit;
proc sql;
create table TMP.TEST2 as
select a.*, b.monthly_avg as monthly_avg_preyear
from TMP.TEST as a left join TMP.TEST_INT as b
on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
order by a.date, a.touchpoint_spec;
run;
Very weird indeed ..
Especially if it does not even create the "monthly_avg_preyear"
And nothing in the log?
Just to 'test' the Proc SQL left join, please remove the 'a.year-1' and put 'a.year':
proc sql;
create table TEST2 as
select a.*, b.monthly_avg as monthly_avg_preyear
from TEST as a left join TEST_INT as b
on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and a.year=b.year
order by a.date, a.touchpoint_spec;
run;
Ok. now i get the Error Message "ERROR: COLUMN DATE COULD NOT BE FOUND IN THE TABLE/VIEW IDENTIFIED WITH THE CORRELATION NAME A."
referring to : order by a.date, a.touchpoint_spec;
I removed a.Date from ORDER BY and it worked!! Even with the lost monthly_avg_preyear
Is there a possibility I could get all my other variables in there too?
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisieren as
select *,
year(date) as year,
month(date) as month,
avg(NPS_value) as monthly_avg
from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
group by touchpoint_spec, month(date), year(date);
run;
/* average of value per month previous year */
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisieren as
select distinct touchpoint_spec, year, month, monthly_avg
from TMP.SAS_Zum_Tableau_aktualisieren;
quit;
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisieren as
select a.*, b.monthly_avg as monthly_avg_preyear
from TMP.SAS_Zum_Tableau_aktualisieren as a left join TMP.SAS_Zum_Tableau_aktualisieren as b
on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
order by a.touchpoint_spec;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.