I have two questions here: should I sort these states into their proper regions by hand? And what's the best way to plot the data after I sort them?
• To provide evidence related to the first inquiry, we might consider how the number of breweries has changed over time.
• To provide evidence related to the second inquiry, we might examine the number of breweries in different areas of the United States.
For example, we could focus on regions with similar populations, like the three below:
Northeast: New York, Connecticut, Rhode Island, Massachusetts, Vermont, New Hampshire, and Maine.
Southeast: Florida and Georgia.
West: California.
The Excel file contains only three variables: state postal codes (STATE), year (YEAR), and the number of breweries in a state (BREWERIES). The second step in an examination of these queries might involve plotting statistics related to (1) and (2).
For example, we might plot the average number of breweries in the Northeast, Southeast, and West over the last 21 years (2000 through 2020).
Deliverable
To complete this assignment, please plot the average number of breweries in the Northeast, Southeast, and West over the last 21 years (2000 – 2020). Data for all three regions should be presented in a single graph (proc gplot). The graph should be titled “Breweries by Year”, all data points should be connected with lines, and there should be a “legend” showing the variable or region associated with each colored line. Copy and paste your log and the graph (using snip or other options to capture an image are fine) into a Word document and upload through Canvas. Hint: plotting multiple variables in the same graph is referred to as an “overlay”.
The simplest approach, I think, is:
1) Add a variable for Region using either IF/THEN/ELSE or a format.
2) Summarize the data to get the average number per year (Proc means/summary with year and region as class variables) into a new data set.
Then something like:
Proc sgplot data=newdataset;
series x=year y=mean / group=region;
run;
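Spelled out a bit more, the two steps plus the plot could look like the sketch below. This is only one way to do it: the input data set name WORK.BREWERIES, the format name $region, and the output names WORK.REGIONS, WORK.AVG and avg_breweries are all placeholders I made up, so swap in your own names. Only STATE, YEAR and BREWERIES come from your description of the file.

```sas
/* Sketch only: WORK.BREWERIES, $region, WORK.REGIONS, WORK.AVG and   */
/* avg_breweries are assumed names, not anything from your program.  */
proc format;
  value $region
    'NY','CT','RI','MA','VT','NH','ME' = 'Northeast'
    'FL','GA'                          = 'Southeast'
    'CA'                               = 'West'
    other                              = ' ';
run;

data work.regions;
  set work.breweries;
  length region $9;
  region = put(state, $region.);  /* map the postal code to a region   */
  if region ne ' ';               /* keep only the states of interest  */
run;

/* NWAY keeps only the region*year combinations in the output */
proc means data=work.regions nway noprint;
  class region year;
  var breweries;
  output out=work.avg (drop=_type_ _freq_) mean=avg_breweries;
run;

title "Breweries by Year";
proc sgplot data=work.avg;
  series x=year y=avg_breweries / group=region markers;
run;
title;
```

Because the summary uses CLASS rather than BY, no sort is needed before Proc means. Note that the Y variable in the SERIES statement has to match whatever name you give the mean in the OUTPUT statement.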
so this is what i have so far:
data work.new; set work.new;
IF STATE="NY" then REGION="NORTHEAST";
IF STATE="CT" then REGION="NORTHEAST";
IF STATE="RI" then REGION="NORTHEAST";
IF STATE="MA" then REGION="NORTHEAST";
IF STATE="VT" then REGION="NORTHEAST";
IF STATE="ME" then REGION="NORTHEAST";
run;
data work.new; set work.new;
IF STATE="FL" then REGION="SOUTHEAST";
IF STATE="GA" then REGION="SOUTHEAST";
run;
data work.new; set work.new;
IF STATE="CA" then REGION="WEST";
run;
/* Step 2 */
proc sort data=work.new;
by year;
run;
proc means data=work.new;
var breweries;
by year;
output out=avg (drop=_TYPE_ _FREQ_) mean=avg_breweries;
run;
but should i expand on that proc means?
For new programmers, I strongly recommend against code of the form:
Data new; set new;
The result completely replaces your original "new" data set. Even when there are no syntax errors, a logic mistake can force you to go back several steps to recover your original data.
You can simplify this to one data step. Replace:
data work.new; set work.new;
IF STATE="NY" then REGION="NORTHEAST";
IF STATE="CT" then REGION="NORTHEAST";
IF STATE="RI" then REGION="NORTHEAST";
IF STATE="MA" then REGION="NORTHEAST";
IF STATE="VT" then REGION="NORTHEAST";
IF STATE="ME" then REGION="NORTHEAST";
run;
data work.new; set work.new;
IF STATE="FL" then REGION="SOUTHEAST";
IF STATE="GA" then REGION="SOUTHEAST";
run;
data work.new; set work.new;
IF STATE="CA" then REGION="WEST";
run;
With:
data work.realnew; set work.new;
/* NH is included because New Hampshire is in the assignment's Northeast list */
IF STATE in ("NY" "CT" "RI" "MA" "VT" "NH" "ME") then REGION="NORTHEAST";
else IF STATE in ("FL" "GA") then REGION="SOUTHEAST";
else IF STATE="CA" then REGION="WEST";
run;
The IN operator is like writing a series of comparisons joined with OR, such as "if var='this' or var='that' or var='something'".
You want to sort the data by REGION and YEAR and use both of those in the BY statement for Proc means. That way you get the average within the region by year.
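As a rough sketch of that sort and summary, assuming the data step above created WORK.REALNEW: the output names WORK.SORTED and WORK.AVG are placeholders of mine, and the WHERE statement is my own addition to drop the states that were not assigned a region.

```sas
/* Sketch only: WORK.SORTED and WORK.AVG are assumed names. */
proc sort data=work.realnew out=work.sorted;
  where region ne ' ';   /* skip states with no region assigned */
  by region year;
run;

proc means data=work.sorted noprint;
  by region year;        /* one average per region per year */
  var breweries;
  output out=work.avg (drop=_type_ _freq_) mean=avg_breweries;
run;
```

Writing the sorted data to a new data set with OUT= keeps the original intact, for the same reason as avoiding "data new; set new;".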
okay perfect, yeah i usually just do it how he taught us in class, so i don't mind too much but appreciate the simplicity.
the only struggle i am having now is with my gplot, because it's making my data look super weird.
imgur.com/a/qo6hCpF
Show the actual plot code you used. Gplot uses different syntax than the SGPLOT code I suggested earlier.
It might also help to run Proc Contents on your data set used for plotting and share the results of that so we can see some info about your actual data set.
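For comparison, a Gplot version of a grouped line plot might look roughly like the sketch below. I am guessing at your data here: it assumes a summary data set named WORK.AVG with the variables REGION, YEAR and avg_breweries, sorted by REGION and YEAR, and the colors are arbitrary. I can't tell from the image whether this matches what you ran.

```sas
/* Sketch only: WORK.AVG and avg_breweries are assumed names.        */
/* INTERPOL=JOIN connects points in data order, so the data should   */
/* be sorted by REGION and YEAR first.                                */
symbol1 interpol=join value=dot color=blue;
symbol2 interpol=join value=dot color=red;
symbol3 interpol=join value=dot color=green;
legend1 label=('Region');

title "Breweries by Year";
proc gplot data=work.avg;
  plot avg_breweries*year=region / legend=legend1;  /* one line per region */
run;
quit;
title;
```

In the PLOT statement, the third variable after the equals sign is what gives one line per region. Without INTERPOL=JOIN on the SYMBOL statements the points are not connected, and with unsorted data the joined lines can zigzag back and forth across the years.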