krg1140
Calcite | Level 5

I have two questions: should I sort these states into their proper regions by hand? And what's the best way to plot the data once they're sorted?

 

• To provide evidence related to the first inquiry, we might consider how the number of breweries has changed over time.
• To provide evidence related to the second inquiry, we might examine the number of breweries in different areas of the United States. For example, we could focus on regions with similar populations, like the three below:

Northeast: New York, Connecticut, Rhode Island, Massachusetts, Vermont, New Hampshire, and Maine. 

Southeast: Florida and Georgia. 

West: California.

The Excel file contains only three variables: state postal codes (STATE), year (YEAR), and the number of breweries in a state (BREWERIES). The second step in an examination of these queries might involve plotting statistics related to (1) and (2).

For example, we might plot the average number of breweries in the Northeast, Southeast, and West over the last 21 years (2000 through 2020).

 

Deliverable

To complete this assignment, please plot the average number of breweries in the Northeast, Southeast, and West over the last 21 years (2000 – 2020). Data for all three regions should be presented in a single graph (proc gplot). The graph should be titled “Breweries by Year”, all data points should be connected with lines, and there should be a “legend” showing the variable or region associated with each colored line. Copy and paste your log and the graph (using snip or other options to capture an image are fine) into a Word document and upload through Canvas. Hint: plotting multiple variables in the same graph is referred to as an “overlay”.

ballardw
Super User

Simplest, I think, is 1) add a variable for Region using any of If/then/else or a format

2) summarize the data to get the average number per year (Proc means/ summary with year and region as class variables) into a new data set.

Then something like

 

Proc sgplot data=newdataset;
   series x=year y=mean / group=region;
run;
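The three steps above could be sketched end to end as follows. This is only a sketch under assumptions: work.breweries is a hypothetical name for the imported Excel data, $region is a made-up format name, and the output variable is called mean so that it matches the SGPLOT code above. The state lists come from the assignment text.

```sas
/* Step 1: map state postal codes to regions with a format.          */
/* $region is a hypothetical format name; states outside the three   */
/* regions map to blank and are dropped in the DATA step below.      */
proc format;
   value $region
      "NY","CT","RI","MA","VT","NH","ME" = "Northeast"
      "FL","GA"                          = "Southeast"
      "CA"                               = "West"
      other                              = " ";
run;

/* work.breweries is an assumed name for the imported Excel data. */
data work.regions;
   set work.breweries;
   length region $9;
   region = put(state, $region.);
   if region ne " ";          /* keep only states in the three regions */
run;

/* Step 2: average breweries per region per year.                   */
/* NWAY keeps only the region*year combinations in the output.      */
proc means data=work.regions nway noprint;
   class region year;
   var breweries;
   output out=work.newdataset (drop=_type_ _freq_) mean=mean;
run;

/* Step 3: one line per region, with an automatic legend. */
title "Breweries by Year";
proc sgplot data=work.newdataset;
   series x=year y=mean / group=region markers;
run;
title;
```

Using CLASS instead of BY means the data does not need to be sorted first.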

krg1140
Calcite | Level 5

so this is what i have so far:

data work.new; set work.new;
IF STATE="NY" then REGION="NORTHEAST";
IF STATE="CT" then REGION="NORTHEAST";
IF STATE="RI" then REGION="NORTHEAST";
IF STATE="MA" then REGION="NORTHEAST";
IF STATE="VT" then REGION="NORTHEAST";
IF STATE="ME" then REGION="NORTHEAST";
run;

data work.new; set work.new;
IF STATE="FL" then REGION="SOUTHEAST";
IF STATE="GA" then REGION="SOUTHEAST";
run;

data work.new; set work.new;
IF STATE="CA" then REGION="WEST";
run;

/* Step 2 */
proc sort data=work.new;
by year;
run;

proc means data=work.new;
var breweries;
by year;
output out=avg (drop=_TYPE_ _FREQ_) mean=avg_breweries;
run;

 

but should i expand on that proc means?

ballardw
Super User

For new programmers I strongly recommend against this coding pattern:

Data new;
   set new;

The result completely replaces your original "new" data set. Even when there are no syntax errors, a logic mistake can force you to go back several steps to recover your original data.

You can simplify these three data steps into one. Replace:

data work.new; set work.new;
IF STATE="NY" then REGION="NORTHEAST";
IF STATE="CT" then REGION="NORTHEAST";
IF STATE="RI" then REGION="NORTHEAST";
IF STATE="MA" then REGION="NORTHEAST";
IF STATE="VT" then REGION="NORTHEAST";
IF STATE="ME" then REGION="NORTHEAST";
run;

data work.new; set work.new;
IF STATE="FL" then REGION="SOUTHEAST";
IF STATE="GA" then REGION="SOUTHEAST";
run;

data work.new; set work.new;
IF STATE="CA" then REGION="WEST";
run;

with:

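data work.realnew;
   set work.new;
   /* NH is in the assignment's Northeast list but was missing from the IF statements above */
   IF STATE in ("NY" "CT" "RI" "MA" "VT" "NH" "ME") then REGION="NORTHEAST";
   else IF STATE in ("FL" "GA") then REGION="SOUTHEAST";
   else IF STATE="CA" then REGION="WEST";
run;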

The IN operator is like writing a bunch of "if var='this' or var='that' or var='something'" conditions.

You want to sort the data by REGION and YEAR and use both of those in the BY statement for Proc means. That way you get the average within the region by year.
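A minimal sketch of that sort-then-summarize step, assuming the data set work.realnew from the data step above and reusing the avg_breweries name from the earlier PROC MEANS. The names work.sorted and work.avg are only illustrative, and the WHERE statement is an added assumption to drop states that were not assigned a region.

```sas
/* Sort by region, then year, so BY-group processing works.   */
/* work.sorted is an illustrative name; the WHERE statement   */
/* drops states that were not assigned to any region.         */
proc sort data=work.realnew out=work.sorted;
   where REGION ne " ";
   by REGION YEAR;
run;

/* One average per REGION and YEAR combination. */
proc means data=work.sorted noprint;
   by REGION YEAR;
   var BREWERIES;
   output out=work.avg (drop=_TYPE_ _FREQ_) mean=avg_breweries;
run;
```

The output data set should then have one row per region per year, which is the shape a grouped line plot expects.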

krg1140
Calcite | Level 5

okay perfect, yeah i usually just do it how he taught us in class, so i don't mind too much but appreciate the simplicity. 

 

the only struggle i am having now is with my gplot, because it's making my data look super weird. 

 

imgur.com/a/qo6hCpF 

 

Reeza
Super User
I don't think your axis is set as a date (I'm also assuming you mean SGPLOT).
On the XAXIS statement you can specify TYPE=TIME and then INTERVAL=YEAR to get annual dates.
Either way, for some reason the line seems to drop to 0 at the start of each year. If you continue to have issues, it would be helpful to include your code and embed your image to make it easier for others to assist you.
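As a rough illustration of the XAXIS suggestion, assuming a summarized data set work.avg with the variables REGION, YEAR and avg_breweries (these names are assumptions, since the actual plot code was not posted). TYPE=TIME only makes sense if YEAR holds SAS date values; if it holds plain integers such as 2000, the commented-out alternative may be the better fit.

```sas
proc sgplot data=work.avg;
   series x=YEAR y=avg_breweries / group=REGION;
   /* Use this form if YEAR holds SAS date values: */
   xaxis type=time interval=year;
   /* If YEAR holds plain integers such as 2000, try this instead: */
   /* xaxis integer values=(2000 to 2020 by 5); */
run;
```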
ballardw
Super User

Show the actual plot code you used. GPLOT uses different syntax from the SGPLOT code I suggested earlier.
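For comparison, a GPLOT version might look roughly like the sketch below. It assumes a summarized data set work.avg with one row per REGION and YEAR and a variable avg_breweries (assumed names, since the poster's plot code was not shown), and the colors are arbitrary.

```sas
title "Breweries by Year";

/* INTERPOL=JOIN connects the points; one SYMBOL statement per region. */
symbol1 interpol=join value=dot color=blue;
symbol2 interpol=join value=dot color=red;
symbol3 interpol=join value=dot color=green;
legend1 label=("Region");

/* y*x=z draws one line per value of REGION on the same axes. */
proc gplot data=work.avg;
   plot avg_breweries*YEAR=REGION / legend=legend1;
run;
quit;

title;
```

The y*x=z form produces the legend from the values of REGION. The OVERLAY option mentioned in the assignment hint is the other route: it applies when each region's average is stored in its own variable and several y*x requests are listed on one PLOT statement.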

It might also help to run Proc Contents on your data set used for plotting and share the results of that so we can see some info about your actual data set.
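Something along these lines would produce that information, again assuming the plotting data set is named work.avg (an assumed name):

```sas
/* Variable names, types, lengths and formats of the plotting data set. */
proc contents data=work.avg;
run;

/* A quick look at the first few rows can also help. */
proc print data=work.avg (obs=10);
run;
```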
