Hello dear experts
I am kindly asking for your help, since I spent already a lot of time googling and did not find any solution yet.
I need to add a variable to my data set. The variable should be the average of value per month and the average of value per month previous year. Please see attached example data: there you can see the desired output variables 'average per month' and 'average per month prev.year')
With the following Code I get the needed average per month in a separate table, but I need it as additional variable in the existing data set.
PROC MEANS Data=mydata;
VAR value;
Class Date;
FORMAT Date Monyy.;
RUN;
Many thanks in advance for your kind support with this.
Best
SAS Newbie
Hi @Gretaku
No panic, we are almost there 🙂
It comes from the fact that you use the same name for all your tables:
Hi @Gretaku
Here is an attempt to achieve this:
data mydata;
input date:DDMMYY10. value;
datalines;
01.01.2019 0
02.01.2019 100
03.01.2019 -100
04.01.2019 100
05.01.2019 100
01.02.2019 100
02.02.2019 0
03.02.2019 0
04.02.2019 -100
05.02.2019 100
01.01.2020 -100
02.01.2020 0
03.01.2020 0
04.01.2020 -100
05.01.2020 100
;
run;
/* average of value per month */
proc sql;
create table mydata_temp as
select date format=monyy., value, avg(value) as monthly_avg
from mydata
group by year(date), month(date)
order by month(date), day(date), year(date);
run;
/* average of value per month previous year */
data want;
set mydata_temp;
monthly_avg_preyear = lag(monthly_avg);
if lag(date) > date or year(lag(date)) = year(date) then call missing(monthly_avg_preyear);
run;
proc sort data=want;
by date;
run;
@ed_sas_member Wow thank you so much for your prompt reply!
It worked for my dataset! However I have already come across another issue..
I need the average to be calculated for different groups. I tried to do it with "WHERE" statement, but I want the other data to remain in the data set as well [more like if ... then do avg(value) else .]?
PROC SQL;
CREATE TABLE mydata_temp AS
SELECT Touchpoint, Unit, Date format=monyy.,
avg(NPS_value) as monthly_avg_claimsnl
FROM mydata
WHERE Unit = 'N' AND Touchpoint = '1'
group by year(date), month(date)
order by month(date), day(date), year(date);
QUIT;
PROC SQL;
CREATE TABLE mydata_temp AS
SELECT Touchpoint, Unit, Date format=monyy.,
avg(NPS_value) as monthly_avg_claimsnl
FROM mydata
WHERE Unit = 'F' AND Touchpoint = '2'
group by year(date), month(date)
order by month(date), day(date), year(date);
QUIT;
attached I tried to show the preffered output.
Thank you so much!.
Best,
Hi @Gretaku
Sounds great!
Here is the revised code to meet the new requirement.
Please test it on you data and let me know.
I am just a bit confused about the values displayed in the Excel file for the monthly_avg variable: for example, I agree with -33.33 for the group (unit=N touchpoint=1) in 2020. However, it should be 25 instead of 100 in 2019?
Best,
data mydata;
input date:MMDDYY10. unit $ touchpoint value;
datalines;
1/1/2019 F 1 0
1/2/2019 F 1 100
1/3/2019 F 2 -100
1/4/2019 F 2 100
1/5/2019 N 2 100
1/6/2019 N 1 0
1/7/2019 N 1 100
1/8/2019 N 1 -100
1/9/2019 N 1 100
1/1/2020 F 1 -100
1/2/2020 F 1 0
1/3/2020 F 2 0
1/4/2020 F 2 -100
1/5/2020 F 1 100
1/1/2020 N 1 -100
1/2/2020 N 2 0
1/3/2020 N 2 0
1/4/2020 N 1 -100
1/5/2020 N 1 100
;
run;
/* average of value per month */
proc sql;
create table mydata_temp as
select date format=MMDDYY10.,
year(date) as year, month(date) as month,
unit, touchpoint,
avg(value) as monthly_avg
from mydata
group by unit, touchpoint, month(date), year(date);
run;
/* average of value per month previous year */
proc sql;
create table want as
select distinct a.*, b.monthly_avg as monthly_avg_preyear
from mydata_temp as a left join mydata_temp as b
on a.unit=b.unit and a.touchpoint=b.touchpoint
and a.month = b.month and (a.year-1)=b.year;
run;
Dear @ed_sas_member
The monthly avg part worked perfectly and the join as well, but I don't get any data for the variable monthly_avg_preyear...
Do you have an idea about what could be the problem?
Thank you in advance
Oh and about the calculation difference, you are right, that was a 'typo' from my end. Sorry about the confusion.
Best
Hi @Gretaku
Here is what I get when I run the code.
Could you please share a portion of your SAS log?
Are the sample data representative from your 'real' data?
Is there a confusion in the date format (e.g DDMMYY10. or MMDDYY10.) ?
Thank you in advance.
I did run it with my real data, which is more or less the same (except there are a lot more variables included). Here the log summary as a screenshot:
The programm runs through, but if I filter for ["monthly_avg_preyear" NOT .] there are no entrys.
The date format looks fine to me. "month" and "year" have been correctly allocated.
Here some example output:
To make things more easy I created a new variable "Touchpoint_spec" which already shows the groups build (in example F2, N1). I also created the variable preyear. Maybe we could write something like [IF Date = Date_preyear Put monthly_avg] ?
Thank you for your support.
Hi @Gretaku
It is a good idea to create a variable contatening unit and touchpoint -> it will simplify the code.
i am a bit concerned by the warning in the log, specifying that the date is already in the dataset. Does it still appear if you do that for the first step:
/* average of value per month */
proc sql;
create table mydata_temp as
select *,
year(date) as year,
month(date) as month,
avg(value) as monthly_avg
from mydata
group by touchpoint_spec, month(date), year(date);
run;
Excellent! The warning disappeared and I have all my variables in the file!
I seem to make a mistake, because I cannot find monthly_avg_preyear
Can you see what went wrong in the code?
PROC SQL;
CREATE TABLE TMP.TEST AS
select *,
year(date) as year,
month(date) as month,
avg(NPS_value) as monthly_avg
from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
group by Touchpoint_spec, month(date), year(date);
run;
/* average of value per month previous year */
proc sql;
create table TMP.TEST2 as
select distinct a.*, b.monthly_avg as monthly_avg_preyear
from TMP.TEST as a left join TMP.TEST as b
on a.Touchpoint_spec=b.Touchpoint_spec and a.Date=b.Date
and a.month = b.month and (a.year-1)=b.year;
run;
Hi @Gretaku
I am wondering if the issue come from the 'distinct' keyword.
Could you please try this ?
proc sql;
create table TMP.TEST as
select *,
year(date) as year,
month(date) as month,
avg(NPS_value) as monthly_avg
from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
group by touchpoint_spec, month(date), year(date);
run;
/* average of value per month previous year */
proc sql;
create table TMP.TEST_INT as
select distinct touchpoint_spec, year, month, monthly_avg
from TMP.TEST;
quit;
proc sql;
create table TMP.TEST2 as
select a.*, b.monthly_avg as monthly_avg_preyear
from TMP.TEST as a left join TMP.TEST_INT as b
on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
order by a.date, a.touchpoint_spec;
run;
Very weird indeed ..
Especially if it does not even create the "monthly_avg_preyear"
And nothing in the log?
Just to 'test' the Proc SQL left join, please remove the 'a.year-1' and put 'a.year':
proc sql;
create table TEST2 as
select a.*, b.monthly_avg as monthly_avg_preyear
from TEST as a left join TEST_INT as b
on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and a.year=b.year
order by a.date, a.touchpoint_spec;
run;
Ok. now i get the Error Message "ERROR: COLUMN DATE COULD NOT BE FOUND IN THE TABLE/VIEW IDENTIFIED WITH THE CORRELATION NAME A."
referring to : order by a.date, a.touchpoint_spec;
I removed a.Date from ORDER BY and it worked!! Even with the lost monthly_avg_preyear
Is there a possibility I could get all my other variables in there too?
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisieren as
select *,
year(date) as year,
month(date) as month,
avg(NPS_value) as monthly_avg
from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
group by touchpoint_spec, month(date), year(date);
run;
/* average of value per month previous year */
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisieren as
select distinct touchpoint_spec, year, month, monthly_avg
from TMP.SAS_Zum_Tableau_aktualisieren;
quit;
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisieren as
select a.*, b.monthly_avg as monthly_avg_preyear
from TMP.SAS_Zum_Tableau_aktualisieren as a left join TMP.SAS_Zum_Tableau_aktualisieren as b
on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
order by a.touchpoint_spec;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.