declanjohn
Calcite | Level 5

I need help finding code templates to complete the tasks below in SAS. Thanks in advance.

1) Import the data from air_pollution.csv into SAS (3)
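
A minimal sketch for task 1, assuming the CSV file has a header row and sits at the placeholder path shown (replace it with your own folder). The output name `work.air_pollution` is a hypothetical choice that later steps could reuse.

```sas
/* VALIDVARNAME=V7 turns a header such as "pm2.5" into the SAS name pm2_5 */
options validvarname=v7;

/* Hypothetical path: point this at wherever air_pollution.csv lives */
proc import datafile="/home/your_user/air_pollution.csv"
            out=work.air_pollution
            dbms=csv
            replace;
    getnames=yes;       /* first row holds the column names */
    guessingrows=max;   /* scan every row before choosing types and lengths */
run;

/* Check the variable names and types that were created */
proc contents data=work.air_pollution;
run;
```

If PROC CONTENTS shows pm2_5 as a character variable (for example because missing readings are written as text such as NA), it would need converting to numeric before the statistical steps.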

Weather in Beijing

2) Produce the frequencies of all values of combined wind direction (cbwd), from most frequent to least frequent (2)
3) What is the most frequently occurring wind direction? (1)
4) Calculate the correlation coefficient between temperature (TEMP) and pressure (PRES) (3)
5) Is the correlation positive or negative? What is its strength (very weak/weak/moderate/strong/very strong)? (2)
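
For tasks 2 to 5, a sketch using PROC FREQ and PROC CORR, assuming the data were imported into a data set named `work.air_pollution` (a hypothetical name). ORDER=FREQ on the PROC FREQ statement lists cbwd values from most to least frequent, so the first row answers task 3. The sign of the Pearson coefficient printed by PROC CORR answers the first half of task 5; the strength labels depend on whatever cut-offs your course uses, so they are not encoded here.

```sas
/* Task 2: frequency table of cbwd, most frequent value first */
proc freq data=work.air_pollution order=freq;
    tables cbwd;
run;

/* Task 4: Pearson correlation between TEMP and PRES */
proc corr data=work.air_pollution pearson;
    var temp pres;
run;
```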

How does air pollution vary over months?

6) Compute the descriptive statistics of pm2_5 by month (3)
7) On average, in which month is the pollution level highest, and in which month is it lowest? (2)
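
For tasks 6 and 7, PROC MEANS with a CLASS statement gives one row of statistics per month. The second step is an optional helper that stores the monthly means and sorts them, so the highest and lowest months end up at the top and bottom of the listing. Both steps assume a data set named `work.air_pollution` (hypothetical) with numeric variables month and pm2_5.

```sas
/* Task 6: descriptive statistics of pm2_5 for each month */
proc means data=work.air_pollution n nmiss mean std min median max maxdec=1;
    class month;
    var pm2_5;
run;

/* Task 7 (optional helper): store the monthly means and rank them */
proc means data=work.air_pollution noprint nway;
    class month;
    var pm2_5;
    output out=work.month_means mean=mean_pm2_5;
run;

proc sort data=work.month_means;
    by descending mean_pm2_5;
run;

proc print data=work.month_means noobs;
    var month mean_pm2_5;   /* first row = highest mean, last row = lowest */
run;
```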

Relationship between air pollution and weather

8) Build a linear regression model in which pm2_5 is the dependent variable and month, DEWP, TEMP, PRES, cbwd, Iws, Is and Ir are the independent variables (8)
9) What is the R-squared of this model? Based on the R-squared, does the model fit the data well? (4)
10) Create a new variable (high_pm2_5) that takes the value 1 if pm2_5 is greater than 150 µg/m³ and 0 otherwise (3)
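
A sketch for tasks 8 to 10, assuming a data set `work.air_pollution` (hypothetical name) with the variable names from the assignment. PROC REG has no CLASS statement, so this uses PROC GLM, which accepts the character variable cbwd directly; entering month as a categorical effect rather than a linear trend is an assumption, since the assignment does not say which is intended. The R-Square value for task 9 appears in the fit-statistics table of the PROC GLM output. In the DATA step, hours with a missing pm2_5 are kept missing instead of being coded 0; that is a judgment call, because in SAS a missing value compares as smaller than any number, so `pm2_5 > 150` on its own would quietly turn missing readings into 0.

```sas
/* Task 8: linear model for pm2_5.                            */
/* cbwd is character, so it must be a CLASS variable;         */
/* putting month on the CLASS statement too is an assumption. */
proc glm data=work.air_pollution;
    class month cbwd;
    model pm2_5 = month dewp temp pres cbwd iws is ir / solution;
run;
quit;

/* Task 10: 1 if pm2_5 > 150 ug/m3, 0 otherwise; missing stays missing */
data work.air_pollution;
    set work.air_pollution;
    if missing(pm2_5) then high_pm2_5 = .;
    else high_pm2_5 = (pm2_5 > 150);   /* a comparison evaluates to 1 or 0 */
run;
```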

11) Develop a logistic regression model in which high_pm2_5 is the dependent variable and month, DEWP, TEMP, PRES, cbwd, Iws, Is and Ir are the independent variables (8)
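
A sketch for tasks 11 and 12, assuming `work.air_pollution` (hypothetical name) already contains the 0/1 variable high_pm2_5. `event='1'` makes PROC LOGISTIC model the probability of a high-pollution hour rather than a low one, and PARAM=REF uses reference coding for the CLASS variables (treating month as categorical is again an assumption). For a binary response, the c statistic in the "Association of Predicted Probabilities and Observed Responses" table equals the area under the ROC curve, which is what task 12 asks for; `plots(only)=roc` also draws that curve.

```sas
ods graphics on;   /* ODS Graphics is needed for the ROC plot */

/* Task 11: logistic model for P(high_pm2_5 = 1) */
proc logistic data=work.air_pollution plots(only)=roc;
    class month cbwd / param=ref;
    model high_pm2_5(event='1') = month dewp temp pres cbwd iws is ir;
run;
```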

12) What is the AUC (c statistic) of this model? Based on the AUC, does the model separate high and low pollution levels well? (4)

The worst smog in Beijing

13) Create a new data set (avg_air_pollution) containing the average pollution level (avg_pm2_5) for each month of each year. Keep only the variables year, month and avg_pm2_5 (10)
14) In which of the 60 analysed months was the average pollution level (avg_pm2_5) highest? This was the worst smog Beijing had experienced in over 50 years (2)
15) Based on the new data set, create a PDF report (highest_avg_air_pollution.pdf) listing the months in which the average pollution level (avg_pm2_5) was greater than 100 µg/m³. Use the Moonflower style and add the title "Months With Highest Average Air Pollution" (5)
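
A sketch for tasks 13 to 15, assuming `work.air_pollution` (hypothetical name) holds year, month and pm2_5. The MEAN function in PROC SQL ignores missing values, so each avg_pm2_5 is the mean of the non-missing hourly readings for that year and month. The PDF path is a placeholder, and `style=moonflower` selects the Moonflower style named in the task.

```sas
/* Task 13: average pm2_5 for each year-month combination */
proc sql;
    create table work.avg_air_pollution as
    select year, month, mean(pm2_5) as avg_pm2_5
    from work.air_pollution
    group by year, month
    order by year, month;
quit;

/* Task 14: sort a copy from highest to lowest average and show the top rows */
proc sort data=work.avg_air_pollution out=work.avg_sorted;
    by descending avg_pm2_5;
run;

proc print data=work.avg_sorted(obs=5) noobs;
run;

/* Task 15: PDF report of months with avg_pm2_5 above 100 ug/m3 */
/* Hypothetical output path: change it to a folder you can write to */
ods pdf file="/home/your_user/highest_avg_air_pollution.pdf" style=moonflower;
title "Months With Highest Average Air Pollution";

proc print data=work.avg_air_pollution noobs;
    where avg_pm2_5 > 100;
run;

title;          /* clear the title so later output is unaffected */
ods pdf close;
```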
ChrisHemedinger
Community Manager

Is this a homework assignment or a training exercise? Have you managed to start any of this (such as importing the data)?

If you're using SAS University Edition, its interface (SAS Studio) provides tasks for most of the challenges you describe. These tasks generate SAS code that you can then study and modify to learn from. You didn't include the pollution data CSV file, so it's difficult to suggest specific code.

But, to provide a few hints, for the "Weather in Beijing" section you'll want to use:

  • PROC FREQ with the ORDER=FREQ option and a TABLES statement for cbwd.
  • PROC CORR with a VAR statement listing TEMP and PRES.

And, of course, you'll need to know something about interpreting the results.  Check the SAS Studio tutorials for more guidance on some of these tasks.
