BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Gretaku
Fluorite | Level 6

Hello dear experts

 

I am kindly asking for your help, since I spent already a lot of time googling and did not find any solution yet.

 

I need to add a variable to my data set. The variable should be the average of value per month and the average of value per month previous year. Please see attached example data: there you can see the desired output variables 'average per month' and 'average per month prev.year')

 

With the following Code I get the needed average per month in a separate table, but I need it as additional variable in the existing data set.


PROC MEANS Data=mydata;

VAR value;
Class Date;
FORMAT Date Monyy.;
RUN;

 

Many thanks in advance for your kind support with this.

 

Best

SAS Newbie

 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

 

No panic, we are almost there 🙂

It comes from the fact that you use the same name for all your tables:

 

proc sql;
create table TMP.SAS_Zum_Tableau_aktualisierenBIS as
select *,
  year(date) as year,
  month(date) as month,
  avg(NPS_value) as monthly_avg
from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
group by touchpoint_spec, month(date), year(date);
run;
 
/* average of value per month previous year */
 
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisierenTER as
select distinct touchpoint_spec, year, month, monthly_avg
from TMP.SAS_Zum_Tableau_aktualisierenBIS;
quit;
 
proc sql;
create table TMP.SAS_Zum_Tableau_aktualisierenWANT as
select a.*, b.monthly_avg as monthly_avg_preyear
from TMP.SAS_Zum_Tableau_aktualisierenBIS as a left join TMP.SAS_Zum_Tableau_aktualisierenTER as b
on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
order by a.touchpoint_spec;
run;

View solution in original post

17 REPLIES 17
Gretaku
Fluorite | Level 6

example data screenshot.jpg

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

 

Here is an attempt to achieve this:

data mydata;
	input date:DDMMYY10. value;
	datalines;
01.01.2019 0
02.01.2019 100
03.01.2019 -100
04.01.2019 100
05.01.2019 100
01.02.2019 100
02.02.2019 0
03.02.2019 0
04.02.2019 -100
05.02.2019 100
01.01.2020 -100
02.01.2020 0
03.01.2020 0
04.01.2020 -100
05.01.2020 100
;
run;

/* average of value per month */
proc sql;
	create table mydata_temp as
	select date format=monyy., value, avg(value) as monthly_avg
	from mydata
	group by year(date), month(date)
	order by month(date), day(date), year(date);
run;

/* average of value per month previous year */

data want;
	set mydata_temp;
	monthly_avg_preyear = lag(monthly_avg);
	if lag(date) > date or year(lag(date)) = year(date) then call missing(monthly_avg_preyear);
run;
proc sort data=want;
	by date;
run;

Capture d’écran 2020-02-04 à 09.59.41.png

Gretaku
Fluorite | Level 6

@ed_sas_member Wow thank you so much for your prompt reply!

 

It worked for my dataset! However I have already come across another issue..

 

I need the average to be calculated for different groups. I tried to do it with "WHERE" statement, but I want the other data to remain in the data set as well [more like if ... then do avg(value) else .]?

 

 

PROC SQL;
   CREATE TABLE mydata_temp AS 
   SELECT Touchpoint, Unit, Date format=monyy.,
	      avg(NPS_value) as monthly_avg_claimsnl
   FROM mydata
   WHERE Unit = 'N' AND Touchpoint = '1'
      group by year(date), month(date)
	  order by month(date), day(date), year(date);
QUIT;

PROC SQL;
   CREATE TABLE mydata_temp AS 
   SELECT Touchpoint, Unit, Date format=monyy.,
	      avg(NPS_value) as monthly_avg_claimsnl
   FROM mydata
   WHERE Unit = 'F' AND Touchpoint = '2'
      group by year(date), month(date)
	  order by month(date), day(date), year(date);
QUIT;

 

attached I tried to show the preffered output.

 

Thank you so much!.

 

Best,

 


test data.jpg

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

Sounds great!

Here is the revised code to meet the new requirement.

Please test it on you data and let me know.

I am just a bit confused about the values displayed in the Excel file for the monthly_avg variable: for example, I agree with -33.33 for the group (unit=N touchpoint=1) in 2020. However, it should be 25 instead of 100 in 2019?

 

Best,

 

data mydata;
	input date:MMDDYY10. unit $ touchpoint value;
	datalines;
1/1/2019 F 1 0
1/2/2019 F 1 100
1/3/2019 F 2 -100
1/4/2019 F 2 100
1/5/2019 N 2 100
1/6/2019 N 1 0
1/7/2019 N 1 100
1/8/2019 N 1 -100
1/9/2019 N 1 100
1/1/2020 F 1 -100
1/2/2020 F 1 0
1/3/2020 F 2 0
1/4/2020 F 2 -100
1/5/2020 F 1 100
1/1/2020 N 1 -100
1/2/2020 N 2 0
1/3/2020 N 2 0
1/4/2020 N 1 -100
1/5/2020 N 1 100
;
run;

/* average of value per month */

proc sql;
	create table mydata_temp as
	select date format=MMDDYY10.,
		   year(date) as year, month(date) as month,
		   unit, touchpoint,
		   avg(value) as monthly_avg
	from mydata
	group by unit, touchpoint, month(date), year(date);
run;

/* average of value per month previous year */

proc sql;
	create table want as
	select distinct a.*, b.monthly_avg as monthly_avg_preyear
	from mydata_temp as a left join mydata_temp as b
	on a.unit=b.unit and a.touchpoint=b.touchpoint
	   and a.month = b.month and (a.year-1)=b.year;
run;

 

Gretaku
Fluorite | Level 6

Dear @ed_sas_member 

 

The monthly avg part worked perfectly and the join as well, but I don't get any data for the variable monthly_avg_preyear...

 

Do you have an idea about what could be the problem?

 

Thank you in advance

 

Oh and about the calculation difference, you are right, that was a 'typo' from my end. Sorry about the confusion.

Best

 

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

 

Here is what I get when I run the code.

Could you please share a portion of your SAS log?

Are the sample data representative from your 'real' data?

Is there a confusion in the date format (e.g DDMMYY10. or MMDDYY10.) ?

Thank you in advance.

Capture d’écran 2020-02-07 à 12.03.57.png

 

Gretaku
Fluorite | Level 6

Hi @ed_sas_member 

 

I did run it with my real data, which is more or less the same (except there are a lot more variables included). Here the log summary as a screenshot: Logsummary.jpg

 

The programm runs through, but if I filter for ["monthly_avg_preyear" NOT .] there are no entrys.

The date format looks fine to me. "month" and "year" have been correctly allocated.

 

Here some example output:

example output.jpg

 

To make things more easy I created a new variable "Touchpoint_spec" which already shows the groups build (in example F2, N1). I also created the variable preyear. Maybe we could write something like [IF Date = Date_preyear Put monthly_avg] ?

 

Thank you for your support.

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

It is a good idea to create a variable contatening unit and touchpoint -> it will simplify the code.

i am a bit concerned by the warning in the log, specifying that the date is already in the dataset. Does it still appear if you do that for the first step:


/* average of value per month */

proc sql;
	create table mydata_temp as
	select *,
		   year(date) as year,
		   month(date) as month,
		   avg(value) as monthly_avg
	from mydata
	group by touchpoint_spec, month(date), year(date);
run;
Gretaku
Fluorite | Level 6

@ed_sas_member 

 

Excellent! The warning disappeared and I have all my variables in the file!

 

I seem to make a mistake, because I cannot find monthly_avg_preyear

 

no_preyear.jpg

 

Can you see what went wrong in the code?

PROC SQL;
   CREATE TABLE TMP.TEST AS 
	select *,
		   year(date) as year,
		   month(date) as month,
		   avg(NPS_value) as monthly_avg
	from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
	group by Touchpoint_spec, month(date), year(date);
run;

/* average of value per month previous year */
proc sql;
	create table TMP.TEST2 as
	select distinct a.*, b.monthly_avg as monthly_avg_preyear
	from TMP.TEST as a left join TMP.TEST as b
	on a.Touchpoint_spec=b.Touchpoint_spec and a.Date=b.Date
	   and a.month = b.month and (a.year-1)=b.year;
run;

 

ed_sas_member
Meteorite | Level 14

Hi @Gretaku 

I am wondering if the issue come from the 'distinct' keyword.

Could you please try this ?

proc sql;
	create table TMP.TEST as
	select *,
		   year(date) as year,
		   month(date) as month,
		   avg(NPS_value) as monthly_avg
	from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
	group by touchpoint_spec, month(date), year(date);
run;

/* average of value per month previous year */

proc sql;
	create table TMP.TEST_INT as
	select distinct touchpoint_spec, year, month, monthly_avg
	from TMP.TEST;
quit;

proc sql;
	create table TMP.TEST2 as
	select a.*, b.monthly_avg as monthly_avg_preyear
	from TMP.TEST as a left join TMP.TEST_INT as b
	on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
	order by a.date, a.touchpoint_spec;
run;
Gretaku
Fluorite | Level 6

@ed_sas_member 

 

Its bizarre. It does still not even create the variable "monthly_avg_preyear"..

 

 

ed_sas_member
Meteorite | Level 14

Very weird indeed ..

Especially if it does not even create the "monthly_avg_preyear"

And nothing in the log?

 

Just to 'test' the Proc SQL left join, please remove the 'a.year-1' and put 'a.year':

 

proc sql;
	create table TEST2 as
	select a.*, b.monthly_avg as monthly_avg_preyear
	from TEST as a left join TEST_INT as b
	on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and a.year=b.year
	order by a.date, a.touchpoint_spec;
run;
Gretaku
Fluorite | Level 6

Ok. now i get the Error Message "ERROR: COLUMN DATE COULD NOT BE FOUND IN THE TABLE/VIEW IDENTIFIED WITH THE CORRELATION NAME A."

 

referring to : order by a.date, a.touchpoint_spec;

Gretaku
Fluorite | Level 6

I removed a.Date from ORDER BY and it worked!! Even with the lost monthly_avg_preyear

 

worked.jpg

 

Is there a possibility I could get all my other variables in there too?

 

	proc sql;
	create table TMP.SAS_Zum_Tableau_aktualisieren as
	select *,
		   year(date) as year,
		   month(date) as month,
		   avg(NPS_value) as monthly_avg
	from TMP.SAS_ZUM_TABLEAU_AKTUALISIEREN
	group by touchpoint_spec, month(date), year(date);
run;

/* average of value per month previous year */

proc sql;
	create table TMP.SAS_Zum_Tableau_aktualisieren as
	select distinct touchpoint_spec, year, month, monthly_avg
	from TMP.SAS_Zum_Tableau_aktualisieren;
quit;

proc sql;
	create table TMP.SAS_Zum_Tableau_aktualisieren as
	select a.*, b.monthly_avg as monthly_avg_preyear
	from TMP.SAS_Zum_Tableau_aktualisieren as a left join TMP.SAS_Zum_Tableau_aktualisieren as b
	on a.touchpoint_spec=b.touchpoint_spec and a.month = b.month and (a.year-1)=b.year
	order by a.touchpoint_spec;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 3441 views
  • 3 likes
  • 2 in conversation