Indescribled
Obsidian | Level 7

I am trying to make an area plot in SAS. At the bottom is an image of the plot in question from R. 

 

The data I am using comes from Tidy Tuesday for this week - https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-09-28/readme.md

Here is my reproducible code to get the data to the point of plotting. The plot is not at all what I intended, and I am having trouble finding resources about SAS area plots, so I am hoping someone here can assist. I would prefer PROC SGPLOT, but any approach is fine as long as it is not too complicated.

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-09-28/papers.csv";

proc import out=papers datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
run;

* Get data 2;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-09-28/programs.csv";

proc import out=programs datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
run;

* Get data 3;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-09-28/paper_authors.csv";

proc import out=paper_authors datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
run;

* Get data 4;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-09-28/paper_programs.csv";

proc import out=paper_programs datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
run;


* Sort before merging;
proc sort data=papers;
	by paper;
run;

proc sort data = paper_authors;
	by paper;
run;

proc sort data = paper_programs;
	by paper;
run;

* Merge;
data joined_df;
	merge papers paper_authors paper_programs;
	by paper;
run;

* Sort again;
proc sort data=joined_df;
	by program;
run;

proc sort data = programs;
	by program;
run;

* Merge 2;
data joined_df_2;
	merge joined_df programs;
	by program;
run;

* Count;
proc freq data=joined_df_2;
tables year * program_category / out=summary;
run;

* Checking how the filter works and data looks;
proc print data=summary;
	where percent > 0 AND program_category ^= 'NA';
run;

Below is my attempt to plot it, but it is not at all what I intended. 

* Plot attempt;
proc gplot data=summary;
	where percent > 0 AND program_category ^= 'NA';
	plot count*year / overlay areas=3;
run;
quit;

Here is an example of the plot I want to make. This is from R. 

Indescribled_0-1632974363169.png

 

ChrisHemedinger
Community Manager

I'm enjoying your #tidytuesday exercises. It's a great way to take what you know from one system (R) and see how you can apply it to something different (SAS).

 

Some suggestions:

  • Rather than sort each data set then merge, you can do this all in one step with PROC SQL. 
  • year and month can be combined to make a real SAS date, opening the door to more time series reporting/plots. You can still group a report by year by applying the YEAR format. With a one-word syntax change you could change that to grouping by month, by quarter, etc. -- all using SAS formats.
  • Use SG graphics (the SGPLOT procedure) instead of GPLOT. The syntax is much more elegant and declarative, in the way you might be accustomed to from R packages like ggplot. This tutorial is a good place to learn SG procedures.

Not an area plot, but here's my code for the basics with these suggestions.

 

proc sql;
   create table work.paperdata as 
   select (mdy(t1.month,1,t1.year)) format=monyy7. as date, 
          t1.paper, 
          t1.title, 
          t4.author, 
          t2.program, 
          t2.program_desc, 
          t2.program_category
      from work.papers t1
           left join work.paper_programs t3 on (t1.paper = t3.paper)
           inner join work.programs t2 on (t3.program = t2.program)
           inner join work.paper_authors t4 on (t1.paper = t4.paper);
quit;

proc freq data=paperdata noprint;
format date year.;
tables date * program_category / out=summary;
run;

proc print data=summary;
	where percent > 0 AND program_category ^= 'NA';
run;

proc sgplot data=summary;
format date year.;
where program_category ^= 'NA';
series x=date y=count / group=program_category;
yaxis label= "papers";
xaxis minor label="Year";
run;

 

 

ChrisHemedinger_0-1633003622396.png
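
The "one-word syntax change" mentioned in the suggestions can be sketched as follows. This is a minimal example, assuming the work.paperdata table created by the PROC SQL step; the output name summary_qtr is made up for illustration. Because PROC FREQ groups by formatted values, swapping the YEAR. format for YYQ6. should count papers per quarter instead of per year.

```sas
/* Count by quarter rather than by year: only the format changes. */
/* summary_qtr is a hypothetical output name.                     */
proc freq data=paperdata noprint;
   format date yyq6.;
   tables date * program_category / out=summary_qtr;
run;

/* Same series plot as before, now with one point per quarter. */
proc sgplot data=summary_qtr;
   format date yyq6.;
   where program_category ^= 'NA';
   series x=date y=count / group=program_category;
   yaxis label="papers";
   xaxis label="Quarter";
run;
```

Other date formats can be substituted in the same place to change the grouping interval.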

 

ChrisHemedinger
Community Manager

One more thing. You don't need to use PROC FREQ to summarize the data first if you're happy to let PROC SGPLOT do that for you. Continuing with my example from the original detailed data -- I used VBAR with stat=FREQ to summarize, and a time x-axis that helps SGPLOT to space the tick marks properly:

ods graphics / width=800 height=400;
proc sgplot data=paperdata;
format date year.;
where program_category ^= 'NA';
vbar date / stat=freq  group=program_category;
yaxis label= "papers";
xaxis minor label="Year" type=time;
run;

 

ChrisHemedinger_0-1633004360038.png

 

acordes
Rhodochrosite | Level 12
proc sort data=summary;
by year program_category;
run;

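data summary2;
set summary;
where percent > 0 AND program_category not in ("" 'NA');
by year program_category;
/* Carry the running top of the stack from one row to the next */
retain _lag_upper;

/* Previous count, kept for inspection (not used by the plot) */
_lag_count=lag(count);

/* First category in a year: the band starts at zero */
if first.year then do;
_upper=count;
_lower=0;
_lag_upper=_upper;
end;

/* Later categories: stack the band on top of the previous one */
if ^first.year then do;
_lower=_lag_upper;
_upper=count+_lower;
_lag_upper=_upper;
end;

run;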

ods graphics on;
proc sgplot data=WORK.SUMMARY2;
where percent > 0 AND program_category not in ("" 'NA');
band x=year lower=_lower upper=_upper / group= program_category transparency=0.6  ;
run;

a1.png
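
For readers following the stacking logic, the same lower and upper bounds can probably be computed more compactly with a sum statement. This is only a sketch under a few assumptions: work.summary is the PROC FREQ output from the question (variables year, program_category, count, percent), and the names summary_sorted and summary_stacked are made up for illustration.

```sas
/* Make sure the rows are ordered by year, then category. */
proc sort data=summary out=summary_sorted;
   by year program_category;
run;

data summary_stacked;
   set summary_sorted;
   where percent > 0 and program_category not in ("" "NA");
   by year;
   if first.year then _upper = 0;  /* restart the stack at zero each year        */
   _lower = _upper;                /* band starts where the previous band ended  */
   _upper + count;                 /* sum statement: _upper is retained, count added */
run;

proc sgplot data=summary_stacked;
   band x=year lower=_lower upper=_upper / group=program_category transparency=0.6;
run;
```

The sum statement retains _upper across rows automatically, so no separate RETAIN statement or helper variable is needed.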

 

acordes
Rhodochrosite | Level 12

Enhanced version, with count labels drawn inside the bands:

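data summary2;
set summary;
where percent > 0 AND program_category not in ("" 'NA');
by year program_category;
/* Carry the running top of the stack from one row to the next */
retain _lag_upper;

_lag_count=lag(count);

/* First category in a year: the band starts at zero */
if first.year then do;
_upper=count;
_lower=0;
_lag_upper=_upper;
end;

/* Later categories: stack the band on top of the previous one */
if ^first.year then do;
_lower=_lag_upper;
_upper=count+_lower;
_lag_upper=_upper;
end;

/* Label text (the count), placed at the vertical midpoint of the band */
_text=count;

textgroup="show";
_ytext=(_upper+_lower)/2;
/* Show labels only every third year and in 2021. */
/* NE is used because <> means MAX in a DATA step expression. */
if mod(year,3) ne 0 and year not in (2021) then do;
call missing(_text, textgroup);
end;

run;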

ods graphics on / imagemap;
proc sgplot data=WORK.SUMMARY2;
where percent > 0 AND program_category not in ("" 'NA');
band x=year lower=_lower upper=_upper / group= program_category transparency=0.6  ;
text y=_ytext text=_text x=year / textattrs=(color=black) nomissinggroup group=textgroup tip=(count year) ;
xaxis thresholdmax=0.05;
run;

a1.png
