Coding help

Posted 01-02-2019 06:22 AM
(875 views)

Need a coding template to perform the below tasks in SAS. Thanks in advance.

1) Import the data from air_pollution.csv to SAS (3)

*Weather in Beijing*

- 2) Produce the frequencies of all values of combined wind direction (cbwd) (from most frequent to least frequent) (2)
- 3) What is the most frequently occurring wind direction? (1)
- 4) Calculate the correlation coefficient between temperature (TEMP) and pressure (PRES) (3)
- 5) Is the correlation positive or negative? What is its strength (very weak/weak/moderate/strong/very strong)? (2)

*How does air pollution vary over months?*

- 6) Compute the descriptive statistics of pm2_5 by month (3)
- 7) On average, in which month is the pollution level highest, and in which month is it lowest? (2)

*Relationship between air pollution and weather*

- 8) Build a linear regression model where pm2_5 is a dependent variable, and month, DEWP, TEMP, PRES, cbwd, Iws, Is and Ir are independent variables (8)
- 9) What is the R-squared of this model? Based on the R-squared, does the model fit the data well? (4)
- 10) Create a new variable (high_pm2_5) that takes value 1 if pm2_5 is greater than 150 ug/m^3 and value 0 otherwise (3)

- 11) Develop a logistic regression model where high_pm2_5 is a dependent variable, and month, DEWP, TEMP, PRES, cbwd, Iws, Is and Ir are independent variables (8)
- 12) What is the AUC (c statistic) of this model? Based on the AUC, does the model separate high and low pollution levels well? (4)

*The worst smog in Beijing*

- 13) Create a new data set (avg_air_pollution) and calculate an average pollution level (avg_pm2_5) for each month of each year. Select the variables year, month and avg_pm2_5 (10)
- 14) In which of the 60 analysed months was the average pollution level (avg_pm2_5) highest? This was the worst smog Beijing has experienced for over 50 years (2)
- 15) Based on the new data set, create a pdf report (highest_avg_air_pollution.pdf) that contains the list of those months in which the average pollution level (avg_pm2_5) was greater than 100 ug/m^3. Use the Moonflower style and add a title ("Months With Highest Average Air Pollution") (5)

1 REPLY

Since this sounds like homework, and you wouldn't learn much if provided "a template" (note: in SAS, "template" refers to a specific procedure, PROC TEMPLATE, and NOT to general-purpose code), I will only name some example procedures to get you started:

@declanjohn wrote:

Need a coding template to perform the below tasks in SAS. Thanks in advance.

1) Import the data from air_pollution.csv to SAS (3)

Proc Import
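As a starting point, a minimal Proc Import sketch might look like the following. The file path and the output data set name (work.air_pollution) are placeholders I made up, so adjust them to your environment.

```sas
/* Sketch only: the path and the data set name are assumptions, not from the assignment */
proc import datafile="/path/to/air_pollution.csv"   /* hypothetical location of the CSV */
            out=work.air_pollution                   /* hypothetical output data set */
            dbms=csv
            replace;                                 /* overwrite the data set if it already exists */
    getnames=yes;       /* take variable names from the first row */
    guessingrows=max;   /* scan every row before deciding each column's type */
run;
```

Check the log and run Proc Contents afterwards to confirm that the numeric columns were actually read as numeric.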

Weather in Beijing

- 2) Produce the frequencies of all values of combined wind direction (cbwd) (from most frequent to least frequent) (2)

Proc Freq

- 3) What is the most frequently occurring wind direction? (1)
- 4) Calculate the correlation coefficient between temperature (TEMP) and pressure (PRES) (3)

Proc Corr

- 5) Is the correlation positive or negative? What is its strength (very weak/weak/moderate/strong/very strong)? (2)
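A rough sketch of the two procedures named above, assuming the imported data set is called work.air_pollution (a name I chose for illustration). ORDER=FREQ lists the levels from most to least frequent, which also answers the "most frequent direction" question by reading the first row.

```sas
/* Assumes a data set work.air_pollution with variables cbwd, TEMP and PRES */
proc freq data=work.air_pollution order=freq;   /* order=freq sorts by descending count */
    tables cbwd;
run;

proc corr data=work.air_pollution pearson;      /* Pearson correlation coefficient */
    var TEMP PRES;
run;
```

The sign of the coefficient gives the direction. The strength labels (very weak to very strong) depend on whatever cut-offs your course uses, so apply those to the value reported.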

How does air pollution vary over months?

- 6) Compute the descriptive statistics of pm2_5 by month (3)

Proc Means with a BY or CLASS statement.

- 7) On average, in which month is the pollution level highest, and in which month is it lowest? (2)
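One possible shape for this, again assuming a data set named work.air_pollution (my placeholder). The CLASS statement avoids having to sort first, which a BY statement would require.

```sas
/* Assumes work.air_pollution contains numeric pm2_5 and a month variable */
proc means data=work.air_pollution n mean std min median max;
    class month;    /* one row of statistics per month, no prior sort needed */
    var pm2_5;
run;
```

Comparing the Mean column across the twelve rows shows the months with the highest and lowest average.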

Relationship between air pollution and weather

- 8) Build a linear regression model where pm2_5 is a dependent variable, and month, DEWP, TEMP, PRES, cbwd, Iws, Is and Ir are independent variables (8)

Proc Reg, Proc GLM, or other regression procedures.

- 9) What is the R-squared of this model? Based on the R-squared, does the model fit the data well? (4)
- 10) Create a new variable (high_pm2_5) that takes value 1 if pm2_5 is greater than 150 ug/m^3 and value 0 otherwise (3)

Data step

- 11) Develop a logistic regression model where high_pm2_5 is a dependent variable, and month, DEWP, TEMP, PRES, cbwd, Iws, Is and Ir are independent variables (8)

Proc Logistic

- 12) What is the AUC (c statistic) of this model? Based on the AUC, does the model separate high and low pollution levels well? (4)
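The three steps above could be sketched roughly as follows. Several things here are my assumptions rather than part of the assignment: the data set names (work.air_pollution, work.air_pollution2), treating month and cbwd as categorical via CLASS, and leaving high_pm2_5 missing when pm2_5 is missing. Proc GLM is used instead of Proc Reg only because Proc Reg has no CLASS statement.

```sas
/* Linear regression; month and cbwd are treated as categorical (an assumption) */
proc glm data=work.air_pollution;
    class month cbwd;
    model pm2_5 = month DEWP TEMP PRES cbwd Iws Is Ir / solution;   /* solution prints parameter estimates */
run;
quit;

/* Flag for high pollution */
data work.air_pollution2;              /* hypothetical name for the derived data set */
    set work.air_pollution;
    if missing(pm2_5) then high_pm2_5 = .;   /* assumption: keep missing values missing */
    else high_pm2_5 = (pm2_5 > 150);         /* a comparison evaluates to 1 (true) or 0 (false) */
run;

/* Logistic regression modelling the probability that high_pm2_5 = 1 */
proc logistic data=work.air_pollution2;
    class month cbwd / param=ref;
    model high_pm2_5(event='1') = month DEWP TEMP PRES cbwd Iws Is Ir;
run;
```

Proc GLM reports R-Square in its fit statistics table, and Proc Logistic reports the c statistic in the "Association of Predicted Probabilities and Observed Responses" table. Without the missing-value guard, a missing pm2_5 would compare as less than 150 and be coded 0, which is probably not what you want.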

The worst smog in Beijing

- 13) Create a new data set (avg_air_pollution) and calculate an average pollution level (avg_pm2_5) for each month of each year. Select the variables year, month and avg_pm2_5 (10)

Proc Means or Proc Summary with a CLASS or BY statement.

- 14) In which of the 60 analysed months was the average pollution level (avg_pm2_5) highest? This was the worst smog Beijing has experienced for over 50 years (2)
- 15) Based on the new data set, create a pdf report (highest_avg_air_pollution.pdf) that contains the list of those months in which the average pollution level (avg_pm2_5) was greater than 100 ug/m^3. Use the Moonflower style and add a title ("Months With Highest Average Air Pollution") (5)

Proc Print with a WHERE statement.
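A sketch of how these pieces might fit together. The input data set name (work.air_pollution), the WORK library for the output, and the PDF path are placeholders of mine. The ODS PDF wrapper is my addition to get the pdf file and the Moonflower style. The sort step is optional and only there so the highest month appears first.

```sas
/* Monthly averages per year; nway keeps only the year*month combinations */
proc means data=work.air_pollution noprint nway;
    class year month;
    var pm2_5;
    output out=work.avg_air_pollution(keep=year month avg_pm2_5)
           mean=avg_pm2_5;
run;

/* Optional: highest average first, so the top row answers question 14 */
proc sort data=work.avg_air_pollution;
    by descending avg_pm2_5;
run;

/* PDF report in the Moonflower style; the file path is hypothetical */
ods pdf file="/path/to/highest_avg_air_pollution.pdf" style=Moonflower;
title "Months With Highest Average Air Pollution";

proc print data=work.avg_air_pollution noobs;
    where avg_pm2_5 > 100;
run;

title;            /* clear the title so it does not carry over */
ods pdf close;
```

The KEEP= data set option drops the automatic _TYPE_ and _FREQ_ variables, leaving only year, month and avg_pm2_5 as the task asks.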
