BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Gretaku
Fluorite | Level 6

Hello dear experts

 

I am kindly asking for your help, since I spent already a lot of time googling and did not find any solution yet.

 

I need to add a variable to my data set. The variable should be the average of value per month and the average of value per month previous year. Please see attached example data: there you can see the desired output variables 'average per month' and 'average per month prev.year')

 

With the following Code I get the needed average per month in a separate table, but I need it as additional variable in the existing data set.


PROC MEANS Data=mydata;

VAR value;
Class Date;
FORMAT Date Monyy.;
RUN;

 

Many thanks in advance for your kind support with this.

 

Best

SAS Newbie

 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

 

No panic, we are almost there 🙂

It comes from the fact that you use the same name for all your tables:

 

proc sql;
create table TMP.SAS_Zum_Tableau_aktualisierenBIS as
select *,
  year(date) as year,
  month(date) as month,
  avg(NPS_value) as monthly_avg
from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
group by touchpoint_spec, month(date), year(date);
run;
 
/* average of value per month previous year */
 
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisierenTER as
select distinct touchpoint_spec, year, month, monthly_avg
from TMP.SAS_Zum_Tableau_aktualisierenBIS;
quit;
 
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisierenWANT as
select a.*, b.monthly_avg as monthly_avg_preyear
from TMP.SAS_Zum_Tableau_aktualisierenBIS as a left join TMP.SAS_Zum_Tableau_aktualisierenTER as b
on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
order by a.touchpoint_spec;
run;

View solution in original post

17 REPLIES 17
Gretaku
Fluorite | Level 6

example data screenshot.jpg

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

 

Here is an attempt to achieve this:

data mydata;
	input date:DDMMYY10. value;
	datalines;
01.01.2019 0
02.01.2019 100
03.01.2019 -100
04.01.2019 100
05.01.2019 100
01.02.2019 100
02.02.2019 0
03.02.2019 0
04.02.2019 -100
05.02.2019 100
01.01.2020 -100
02.01.2020 0
03.01.2020 0
04.01.2020 -100
05.01.2020 100
;
run;

/* average of value per month */
proc sql;
	create table mydata_temp as
	select date format=monyy., value, avg(value) as monthly_avg
	from mydata
	group by year(date), month(date)
	order by month(date), day(date), year(date);
run;

/* average of value per month previous year */

data want;
	set mydata_temp;
	monthly_avg_preyear = lag(monthly_avg);
	if lag(date) > date or year(lag(date)) = year(date) then call missing(monthly_avg_preyear);
run;
proc sort data=want;
	by date;
run;

Capture d’écran 2020-02-04 à 09.59.41.png

Gretaku
Fluorite | Level 6

@ed_sas_member Wow thank you so much for your prompt reply!

 

It worked for my dataset! However I have already come across another issue..

 

I need the average to be calculated for different groups. I tried to do it with "WHERE" statement, but I want the other data to remain in the data set as well [more like if ... then do avg(value) else .]?

 

 

PROC SQL;
   CREATE TABLE mydata_temp AS 
   SELECT Touchpoint, Unit, Date format=monyy.,
	      avg(NPS_value) as monthly_avg_claimsnl
   FROM mydata
   WHERE Unit = 'N' AND Touchpoint = '1'
      group by year(date), month(date)
	  order by month(date), day(date), year(date);
QUIT;

PROC SQL;
   CREATE TABLE mydata_temp AS 
   SELECT Touchpoint, Unit, Date format=monyy.,
	      avg(NPS_value) as monthly_avg_claimsnl
   FROM mydata
   WHERE Unit = 'F' AND Touchpoint = '2'
      group by year(date), month(date)
	  order by month(date), day(date), year(date);
QUIT;

 

attached I tried to show the preffered output.

 

Thank you so much!.

 

Best,

 


test data.jpg

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

Sounds great!

Here is the revised code to meet the new requirement.

Please test it on you data and let me know.

I am just a bit confused about the values displayed in the Excel file for the monthly_avg variable: for example, I agree with -33.33 for the group (unit=N touchpoint=1) in 2020. However, it should be 25 instead of 100 in 2019?

 

Best,

 

data mydata;
	input date:MMDDYY10. unit $ touchpoint value;
	datalines;
1/1/2019 F 1 0
1/2/2019 F 1 100
1/3/2019 F 2 -100
1/4/2019 F 2 100
1/5/2019 N 2 100
1/6/2019 N 1 0
1/7/2019 N 1 100
1/8/2019 N 1 -100
1/9/2019 N 1 100
1/1/2020 F 1 -100
1/2/2020 F 1 0
1/3/2020 F 2 0
1/4/2020 F 2 -100
1/5/2020 F 1 100
1/1/2020 N 1 -100
1/2/2020 N 2 0
1/3/2020 N 2 0
1/4/2020 N 1 -100
1/5/2020 N 1 100
;
run;

/* average of value per month */

proc sql;
	create table mydata_temp as
	select date format=MMDDYY10.,
		   year(date) as year, month(date) as month,
		   unit, touchpoint,
		   avg(value) as monthly_avg
	from mydata
	group by unit, touchpoint, month(date), year(date);
run;

/* average of value per month previous year */

proc sql;
	create table want as
	select distinct a.*, b.monthly_avg as monthly_avg_preyear
	from mydata_temp as a left join mydata_temp as b
	on a.unit=b.unit and a.touchpoint=b.touchpoint
	   and a.month = b.month and (a.year-1)=b.year;
run;

 

Gretaku
Fluorite | Level 6

Dear @ed_sas_member 

 

The monthly avg part worked perfectly and the join as well, but I don't get any data for the variable monthly_avg_preyear...

 

Do you have an idea about what could be the problem?

 

Thank you in advance

 

Oh and about the calculation difference, you are right, that was a 'typo' from my end. Sorry about the confusion.

Best

 

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

 

Here is what I get when I run the code.

Could you please share a portion of your SAS log?

Are the sample data representative from your 'real' data?

Is there a confusion in the date format (e.g DDMMYY10. or MMDDYY10.) ?

Thank you in advance.

Capture d’écran 2020-02-07 à 12.03.57.png

 

Gretaku
Fluorite | Level 6

Hi @ed_sas_member 

 

I did run it with my real data, which is more or less the same (except there are a lot more variables included). Here the log summary as a screenshot: Logsummary.jpg

 

The programm runs through, but if I filter for ["monthly_avg_preyear" NOT .] there are no entrys.

The date format looks fine to me. "month" and "year" have been correctly allocated.

 

Here some example output:

example output.jpg

 

To make things more easy I created a new variable "Touchpoint_spec" which already shows the groups build (in example F2, N1). I also created the variable preyear. Maybe we could write something like [IF Date = Date_preyear Put monthly_avg] ?

 

Thank you for your support.

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

It is a good idea to create a variable contatening unit and touchpoint -> it will simplify the code.

i am a bit concerned by the warning in the log, specifying that the date is already in the dataset. Does it still appear if you do that for the first step:


/* average of value per month */

proc sql;
	create table mydata_temp as
	select *,
		   year(date) as year,
		   month(date) as month,
		   avg(value) as monthly_avg
	from mydata
	group by touchpoint_spec, month(date), year(date);
run;
Gretaku
Fluorite | Level 6

@ed_sas_member 

 

Excellent! The warning disappeared and I have all my variables in the file!

 

I seem to make a mistake, because I cannot find monthly_avg_preyear

 

no_preyear.jpg

 

Can you see what went wrong in the code?

PROC SQL;
   CREATE TABLE TMP.TEST AS 
	select *,
		   year(date) as year,
		   month(date) as month,
		   avg(NPS_value) as monthly_avg
	from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
	group by Touchpoint_spec, month(date), year(date);
run;

/* average of value per month previous year */
proc sql;
	create table TMP.TEST2 as
	select distinct a.*, b.monthly_avg as monthly_avg_preyear
	from TMP.TEST as a left join TMP.TEST as b
	on a.Touchpoint_spec=b.Touchpoint_spec and a.Date=b.Date
	   and a.month = b.month and (a.year-1)=b.year;
run;

 

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

I am wondering if the issue come from the 'distinct' keyword.

Could you please try this ?

proc sql;
	create table TMP.TEST as
	select *,
		   year(date) as year,
		   month(date) as month,
		   avg(NPS_value) as monthly_avg
	from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
	group by touchpoint_spec, month(date), year(date);
run;

/* average of value per month previous year */

proc sql;
	create table TMP.TEST_INT as
	select distinct touchpoint_spec, year, month, monthly_avg
	from TMP.TEST;
quit;

proc sql;
	create table TMP.TEST2 as
	select a.*, b.monthly_avg as monthly_avg_preyear
	from TMP.TEST as a left join TMP.TEST_INT as b
	on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
	order by a.date, a.touchpoint_spec;
run;
Gretaku
Fluorite | Level 6

@ed_sas_member 

 

Its bizarre. It does still not even create the variable "monthly_avg_preyear"..

 

 

ed_sas_member
Meteorite | Level 14

Very weird indeed ..

Especially if it does not even create the "monthly_avg_preyear"

And nothing in the log?

 

Just to 'test' the Proc SQL left join, please remove the 'a.year-1' and put 'a.year':

 

proc sql;
	create table TEST2 as
	select a.*, b.monthly_avg as monthly_avg_preyear
	from TEST as a left join TEST_INT as b
	on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and a.year=b.year
	order by a.date, a.touchpoint_spec;
run;
Gretaku
Fluorite | Level 6

Ok. now i get the Error Message "ERROR: COLUMN DATE COULD NOT BE FOUND IN THE TABLE/VIEW IDENTIFIED WITH THE CORRELATION NAME A."

 

referring to : order by a.date, a.touchpoint_spec;

Gretaku
Fluorite | Level 6

I removed a.Date from ORDER BY and it worked!! Even with the lost monthly_avg_preyear

 

worked.jpg

 

Is there a possibility I could get all my other variables in there too?

 

	proc sql;
	create table TMP.SAS_Zum_Tableau_aktualisieren as
	select *,
		   year(date) as year,
		   month(date) as month,
		   avg(NPS_value) as monthly_avg
	from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
	group by touchpoint_spec, month(date), year(date);
run;

/* average of value per month previous year */

proc sql;
	create table TMP.SAS_Zum_Tableau_aktualisieren as
	select distinct touchpoint_spec, year, month, monthly_avg
	from TMP.SAS_Zum_Tableau_aktualisieren;
quit;

proc sql;
	create table TMP.SAS_Zum_Tableau_aktualisieren as
	select a.*, b.monthly_avg as monthly_avg_preyear
	from TMP.SAS_Zum_Tableau_aktualisieren as a left join TMP.SAS_Zum_Tableau_aktualisieren as b
	on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
	order by a.touchpoint_spec;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 4376 views
  • 3 likes
  • 2 in conversation