BookmarkSubscribeRSS Feed
Raza_M
Obsidian | Level 7

Hi everyone,

I want to plot weekly averages from 1999 to 2020 to see a trend over time. In my data, I have several municipalities and for each municipality I have weekly averages for all the years. Below is an example of the data. In the original data, there are several municipalities and years are until 2020.

data temp;
input country $ muncipality $ weeknumber $ year $ mean_temperature ;
datalines;
A X 1 2000 -8.0
A X 2 2000 -12.03
A X 3 2000 -1.40
A X 4 2000 -20.13
A X 5 2000 -11.07
A X 6 2000 -15.20
A X 7 2000 -5.07
A X 8 2000 -5.6
A X 9 2000 -5.8
A X 10 2000 -6.0
A X 11 2000 -1.48
A X 12 2000 -0.44
A X 13 2000 1.74
A X 14 2000 2.9
A X 15 2000 3.52
A X 16 2000 7.3
A X 17 2000 3.97
A X 18 2000 1.88
A X 19 2000 3.66
A X 20 2000 11.05
A X 21 2000 9.25
A X 22 2000 12.34
A X 23 2000 16.78
A X 24 2000 18.52
A X 25 2000 17.95
A X 26 2000 17.82
A X 27 2000 15.82
A X 28 2000 19.44
A X 29 2000 16.3
A X 30 2000 13.75
A X 31 2000 14.75
A X 32 2000 12.18
A X 33 2000 10.12
A X 34 2000 11.84
A X 35 2000 14.96
A X 36 2000 12.48
A X 37 2000 7.34
A X 38 2000 7.5
A X 39 2000 9.50
A X 40 2000 7.75
A X 41 2000 3.84
A X 42 2000 1.42
A X 43 2000 4.29
A X 44 2000 5.05
A X 45 2000 2.52
A X 46 2000 -4.7
A X 47 2000 0.14
A X 48 2000 -2.48
A X 49 2000 -7.2
A X 50 2000 -5.85
A X 51 2000 -2.89
A X 52 2000 -5.55
A Y 1 2000 -5.67
A Y 2 2000 -7.54
A Y 3 2000 0.71
A Y 4 2000 -16.74
A Y 5 2000 -8.78
A Y 6 2000 -12.80
A Y 7 2000 -5.3
A Y 8 2000 -4.09
A Y 9 2000 -5.50
A Y 10 2000 -1.10
A Y 11 2000 0.65
A Y 12 2000 2.3
A Y 13 2000 3.5
A Y 14 2000 4.45
A Y 15 2000 8.32
A Y 16 2000 5.68
A Y 17 2000 4.45
A Y 18 2000 8.32
A Y 19 2000 5.48
A Y 20 2000 3.89
A Y 21 2000 5.17
A Y 22 2000 3.76
A Y 23 2000 12.3
A Y 24 2000 14.05
A Y 25 2000 17.20
A Y 26 2000 18.20
A Y 27 2000 16.82
A Y 28 2000 20.45
A Y 29 2000 17.77
A Y 30 2000 15.68
A Y 31 2000 16.05
A Y 32 2000 10.65
A Y 33 2000 11.65
A Y 34 2000 10.47
A Y 35 2000 12.20
A Y 36 2000 13.20
A Y 37 2000 8.32
A Y 38 2000 8.90
A Y 39 2000 11.82
A Y 40 2000 7.45
A Y 41 2000 7.77
A Y 42 2000 2.62
A Y 43 2000 5.89
A Y 44 2000 7.77
A Y 45 2000 5.62
A Y 46 2000 -4.45
A Y 47 2000 -3.78
A Y 48 2000 -1.90
A Y 49 2000 -2.78
A Y 50 2000 -4.09
A Y 51 2000 -2.43
A Y 52 2000 -6.78
;

 

I want to make a plot showing me trend in temperature from 2000 to 2020. Below is an example I want. I removed the labls since it is from original data. y axis are values and x-axis are week numbers and years. This was made in excel.

Raza_M_0-1667310390854.png

However, when i write the following code it give me a graph of values averaged over weeks from all the years.

 

proc sgplot data=exposure; where country='A';
series x=weeknumber y=mean_temp;
run;

 

I get the following output

Raza_M_2-1667310781721.png

 

I need to understand where am I wrong. Your help is highly appreciated.

 

16 REPLIES 16
PaigeMiller
Diamond | Level 26

If you have data that goes from 2000 to 2020, and each has weeks 1, 2, 3, ... , 52, then your data set has 21 different values for week1 and your plot shows 21 different values for week 1. You did not instruct PROC SGPLOT to use YEAR in the creation of the plot.

 

Instead, you really ought to work with actual SAS date values, which are numeric values indicating the number of days since 01JAN1960. This is a common mistake beginners make, they do not use SAS date values. If you do use SAS date values, then the plot will work, as both year and week are encoded into the SAS date value.

 

data two;
    set one;
    year_week=mdy(1,1,year)+(week-1)*7;
    format year_week year4.;
run;
proc sgplot data=exposure; where country='A';
    series x=year_week y=mean_temp;
run;

 

 

The MDY function determines the first day of the year and then we add 7 days times the number of weeks to the first day of the year. So by using actual SAS date values (number of days since 01JAN1960), we now have days representing weeks from 01JAN2000 to the last week of 2020.

--
Paige Miller
Raza_M
Obsidian | Level 7
Thank you for replying and point out the mistake!
However, when i write your syntax I get error and created variables have all missing values.

687 data exposure;
688 set exposure;
689 year_week=mdy(1,1,year)+(week-1)*7;
690 format year_week year4.;
691 run;

NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
652292 at 689:34
NOTE: There were 652292 observations read from the data set WORK.EXPOSURE.
NOTE: The data set WORK.EXPOSURE has 652292 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 0.14 seconds
cpu time 0.15 seconds
PaigeMiller
Diamond | Level 26

Use the variable name WEEKNUMBER (that's what it is called in your data set) instead of WEEK. Also, when the data set is created, WEEKNUMBER and YEAR both should be numeric instead of character.

--
Paige Miller
Ms_Raza
Calcite | Level 5

Ms_Raza_1-1667470550827.png

@PaigeMiller I got this output after using your syntax. it seems like the syntax has put yearly graphs in one plot

 

PaigeMiller
Diamond | Level 26

Since you have provided only one year of data, I can't test it on your real actual data. However, you can show me the ENTIRE log (all of it, every single character, no omissions) for the code that you ran. Please copy the log as text and paste it into the window that appears when you click on the </> icon.

PaigeMiller_0-1663012019648.png

 

 

Something is wrong here, as your data has negative values, but your plot does not have negative values.

 

Also, from now on, please don't provide plots where the axes are not labelled (as you did in your original post)

--
Paige Miller
Ms_Raza
Calcite | Level 5
 data exposure;
5    set lib.weekly_avg_1999_2018;
6    run;

NOTE: There were 652292 observations read from the data set LIB.WEEKLY_AVG_1999_2018.
NOTE: The data set WORK.EXPOSURE has 652292 observations and 11 variables.
NOTE: DATA statement used (Total process time):
      real time           19.44 seconds
      cpu time            0.43 seconds


7
8    data exposure;
9     set exposure;
10       year_week=mdy(1,1,year)+(weeknumber-1)*7;
11       format year_week year4.;
12   run;

NOTE: There were 652292 observations read from the data set WORK.EXPOSURE.
NOTE: The data set WORK.EXPOSURE has 652292 observations and 12 variables.
NOTE: DATA statement used (Total process time):
      real time           0.13 seconds
      cpu time            0.14 seconds


13   ods graphics on / obsmax=302200;
14   proc sgplot data=exposure; where country='A';
15       series x=year_week y= mean_temp;
16   run;

NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           1.38 seconds
      cpu time            0.37 seconds

NOTE: Marker and line antialiasing has been disabled for at least one plot because the threshold has
      been reached. You can set ANTIALIASMAX=302200 in the ODS GRAPHICS statement to enable
      antialiasing for all plots.
NOTE: There were 302180 observations read from the data set WORK.EXPOSURE.
      WHERE country='A';

Ms_Raza_0-1667472966710.png

previous graph was on different exposure using the same syntax. This one is now on mean temperature.

PaigeMiller
Diamond | Level 26
NOTE: There were 302180 observations read from the data set WORK.EXPOSURE.
      WHERE country='A';

If there are 21 years, and 52 weeks in a year, then I would expect around 1092 records for a country. But obviously, you have a whole lot more records, and that is causing the plot to appear the way it does, looking like there are many lines going left to right when you were expecting one line going across the plot left to right. 

 

So you need to investigate why country='A' has 302180 records instead of around 1092, which is what I would expect.

--
Paige Miller
Ms_Raza
Calcite | Level 5

That is because country has municipalities, and each municipality has data for each year. So say for example if the country has 300 municipalities, then 300 multiply by 52 or 53 weeks multiply by total number of years=total number of row for that country. 

 

PaigeMiller
Diamond | Level 26

So do you want to average across municipalities to obtain 21 yrs times 52 weeks? Or do you want a plot for each municipality? Or something else?

--
Paige Miller
Ms_Raza
Calcite | Level 5

OK, now I understand the problem. I have around 300 municipalities in my dataset and I was trying to graph data for all the municipalities with so many repeated values for weeks of the same years. Now when I graphed the values for one municipality using your syntax I got the correct graph.

Ms_Raza_0-1667479558724.png

So, to produce the same for the country, values should be averaged for all the municipalities and then I get country level data to graph.

Thanks a lot for your input, it directed me to the right thinking. It is so basic; I don't understand why I missed it. 

Thank you!

 

PaigeMiller
Diamond | Level 26

@Ms_Raza wrote:

OK, now I understand the problem. I have around 300 municipalities in my dataset and I was trying to graph data for all the municipalities with so many repeated values for weeks of the same years. Now when I graphed the values for one municipality using your syntax I got the correct graph.

Ms_Raza_0-1667479558724.png

So, to produce the same for the country, values should be averaged for all the municipalities and then I get country level data to graph.

Thanks a lot for your input, it directed me to the right thinking. It is so basic; I don't understand why I missed it. 

Thank you!

 


Glad that it helped. One minor issue ... you said: "So, to produce the same for the country, values should be averaged for all the municipalities" ... I would say you can (not "should") work from averages, but you can also work from medians, or 75-percentile, or any other statistic that makes sense.

 

Also, I would change the labels on the axes and add a title which municipality this is (doesn't have to go in the title, but that seems to be the best place for it).

 

proc sgplot data=exposure; where country='A';
    series x=year_week y=mean_temp;
    title "Municipality: Boston";
    xaxis label="Year";
    yaxis label="Mean Temperature °C";
run;

 

You can also use the BY statement to generate all plots for all municipalities and all countries if you want (but if you have 300 municipalities in one country, that may be too much to look at).

--
Paige Miller
Ms_Raza
Calcite | Level 5

Great, I will add correct labels and titles to the graph.

Thank you!

ballardw
Super User

Reason: You did not provide any Year related information to graph. Second your data has a variable, muncipality (really is that the spelling you used?), that means that you have duplicate week/year combinations and nothing in the graph to handle them. So if your X axis is used correctly the data points will connect muncipality values for the same date.

 

Possible even worse, your Year and Weeknumber variables are CHARACTER if your data step is correct. Which means sort order is extremely problematic on the X axis as week 11 comes before week 2 in character values.

 

Here is a modification of your data step that creates an actual date value to use. This step would likely have been easier with data before the week summary as the Informats for  reading weeks are not very flexible. I also modify the SGplot code to handle "muncipality" date repetition.

data temp;
input country $ muncipality $ weeknumber $ year $ mean_temperature ;
date = input(catx('W',substr(year,3),put(input(weeknumber,2.),z2.)),Weeku5.);
format date date9.;
datalines;
A X 1 2000 -8.0
A X 2 2000 -12.03
A X 3 2000 -1.40
A X 4 2000 -20.13
A X 5 2000 -11.07
A X 6 2000 -15.20
A X 7 2000 -5.07
A X 8 2000 -5.6
A X 9 2000 -5.8
A X 10 2000 -6.0
A X 11 2000 -1.48
A X 12 2000 -0.44
A X 13 2000 1.74
A X 14 2000 2.9
A X 15 2000 3.52
A X 16 2000 7.3
A X 17 2000 3.97
A X 18 2000 1.88
A X 19 2000 3.66
A X 20 2000 11.05
A X 21 2000 9.25
A X 22 2000 12.34
A X 23 2000 16.78
A X 24 2000 18.52
A X 25 2000 17.95
A X 26 2000 17.82
A X 27 2000 15.82
A X 28 2000 19.44
A X 29 2000 16.3
A X 30 2000 13.75
A X 31 2000 14.75
A X 32 2000 12.18
A X 33 2000 10.12
A X 34 2000 11.84
A X 35 2000 14.96
A X 36 2000 12.48
A X 37 2000 7.34
A X 38 2000 7.5
A X 39 2000 9.50
A X 40 2000 7.75
A X 41 2000 3.84
A X 42 2000 1.42
A X 43 2000 4.29
A X 44 2000 5.05
A X 45 2000 2.52
A X 46 2000 -4.7
A X 47 2000 0.14
A X 48 2000 -2.48
A X 49 2000 -7.2
A X 50 2000 -5.85
A X 51 2000 -2.89
A X 52 2000 -5.55
A Y 1 2000 -5.67
A Y 2 2000 -7.54
A Y 3 2000 0.71
A Y 4 2000 -16.74
A Y 5 2000 -8.78
A Y 6 2000 -12.80
A Y 7 2000 -5.3
A Y 8 2000 -4.09
A Y 9 2000 -5.50
A Y 10 2000 -1.10
A Y 11 2000 0.65
A Y 12 2000 2.3
A Y 13 2000 3.5
A Y 14 2000 4.45
A Y 15 2000 8.32
A Y 16 2000 5.68
A Y 17 2000 4.45
A Y 18 2000 8.32
A Y 19 2000 5.48
A Y 20 2000 3.89
A Y 21 2000 5.17
A Y 22 2000 3.76
A Y 23 2000 12.3
A Y 24 2000 14.05
A Y 25 2000 17.20
A Y 26 2000 18.20
A Y 27 2000 16.82
A Y 28 2000 20.45
A Y 29 2000 17.77
A Y 30 2000 15.68
A Y 31 2000 16.05
A Y 32 2000 10.65
A Y 33 2000 11.65
A Y 34 2000 10.47
A Y 35 2000 12.20
A Y 36 2000 13.20
A Y 37 2000 8.32
A Y 38 2000 8.90
A Y 39 2000 11.82
A Y 40 2000 7.45
A Y 41 2000 7.77
A Y 42 2000 2.62
A Y 43 2000 5.89
A Y 44 2000 7.77
A Y 45 2000 5.62
A Y 46 2000 -4.45
A Y 47 2000 -3.78
A Y 48 2000 -1.90
A Y 49 2000 -2.78
A Y 50 2000 -4.09
A Y 51 2000 -2.43
A Y 52 2000 -6.78
;

proc sgplot data=temp; 
   where country='A';
   series x=date y=mean_temperature /group=muncipality;
   format date yymon.;
run;

 

One of the reasons to use date values is that you can do a lot with those that character "week" or "year" will not without lots of work. Once you have a date variable you can use formats to allow getting yearly, monthly, quarterly and/or weekly summary values just by changing a format in a procedure like Proc Means, or counts from Proc Freq. The groups created by formats will be honored by reporting procedures like Report or Tabulate, analysis procedures and for many graphing tasks.

 

SAS wants a single x axis variable. For date related values it is way better to actually use a SAS Date value so sort order is maintained.

 

It is poor practice to provide example data with different variable names than the procedures using it. Your data step has Mean_temperature, your Sgplot code uses Mean_temp

 

Ms_Raza
Calcite | Level 5

@ballardw Thank a lot for replying and taking time to correct the syntax and for your suggestion!

Using date is definitely more useable than using weeks or year but unfortunately, I have need weekly averages for my exposure window. I have date in my original datafile. I agree with you on following proper practice to provide examples. Regarding the spelling of municipality. I don't why was it not obvious that it was a "typo".

When I ran your syntax on my data, using numerical date variable I got an error

 


ERROR: An exception has been encountered.
Please contact technical support and provide them with the following traceback information:

The SAS task name is [SGPLOT]
ERROR: Read Access Violation SGPLOT
Exception occurred at (AA93560E)
Task Traceback
Address Frame (DBGHELP API Version 4.0 rev 5)
00007FFFAA93560E 00000074D0DF74C0 sasstgr:tkvercn1+0x845CE
00007FFFAA9554C7 00000074D0DF7500 sasstgr:tkvercn1+0xA4487
00007FFFAA93F737 00000074D0DF7570 sasstgr:tkvercn1+0x8E6F7
00007FFFAFEE4467 00000074D0DF7578 sasods:tkvercn1+0x273427
00007FFFAFCC5198 00000074D0DF7840 sasods:tkvercn1+0x54158
00007FFFAFCC0F6E 00000074D0DF79D0 sasods:tkvercn1+0x4FF2E
00007FFFAB0C1A97 00000074D0DF79D8 sassgplo:tkvercn1+0xC0A57
00007FFFAB03B092 00000074D0DFF4E0 sassgplo:tkvercn1+0x3A052
00007FFFDB4AF5A8 00000074D0DFF4E8 sashost:Main+0x15268
00007FFFDB4B59CC 00000074D0DFF810 sashost:Main+0x1B68C
00007FF8449C7034 00000074D0DFF818 KERNEL32:BaseThreadInitThunk+0x14
00007FF8458C26A1 00000074D0DFF848 ntdll:RtlUserThreadStart+0x21

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 1721 views
  • 1 like
  • 4 in conversation