03-31-2018 06:38 PM - edited 03-31-2018 06:42 PM
Hey guys. New here and I have no clue if I'm asking this in the right place but I need help. Bear with me as I try to explain this.
So I'm working on a research project to analyze and rank the most contaminated areas in the state of Georgia. I managed to make a list of all the counties in Georgia that contain 1 or more toxic sites by compiling the data found in Georgia's EPA Hazardous Site Inventory. From this, I made a Top 5 list with which I will need help. So my goal is to analyze and rank which pollutants are most commonly found in the Top 5 county list. I would like to make a Top 10 list of pollutants from this analysis. So to do that, I would have to go through each individual site for all 5 counties and make a list of pollutants. For example, Site A for Cobb County might have 12 pollutants, while Site B has only 5. So Cobb will have 17 total and then I'd move on to the next county. However each pollutant in each site is further categorized by soil or groundwater. So it can be found in soil, groundwater, or both.
Link to Hazardous Site Inventory: http://epd.georgia.gov/sites/epd.georgia.gov/files/2017HSI/10145.pdf
My professor introduced me to someone in the Stats Department who was more knowledgeable in SAS. He was able to draw out a quick sketch of what the table should probably look like. Then he further went on to say that SAS would be the perfect program to crank out this kind of table and using proc freq is the way to go. He said that SAS has the ability to create both of these tables in the sketch. I'm just confused on what the code should be for both of them. I know proc.freq is used for the second one but I think it would need some tweaking that I'm not familiar with. For the code to make the first table to include all the variables, I have no clue on what the code would be.
Edit: Thought it would be more clear to see, but Pol = Pollutant. S and W means Soil and Water (Groundwater), respectively. For S and W, he used 0 and 1 to indicate whether the pollutant was found in soil or groundwater, just to be clear. Kinda like how 0 and 1 is used to indicate Gender.
So I'm assuming I'll have to make an excel spreadsheet containing the sites, counties, pollutants, and then upload this into the SAS library to create the tables, correct?
Sorry if this sounds all over the place. I'm just really confused on how to get started on this. Any help would be REALLY appreciated.
03-31-2018 10:03 PM
Seems like you have been advised to do a 4-way table with Proc freq.
This The FREQ Procedure will be useful to you.
You can use import wizards from the File menu in SAS if you are not confident to write code depending on how your input data file is like (.xls or .csv or .txt or stata or spss or other formats) to read your data into SAS. However, csv is preferred because it will generate the code for you which you can reuse after loading the data into SAS.
04-01-2018 12:15 AM
Sounds like your top table could be compiled with something like this:
proc sql; create table polCounts as select County, Site, Pollutant, sum(matrix="Soil") as nbSoil, sum(matrix="Groundwater") as nbWater from myData group by County, Site, Pollutant; quit;
where myTable would be your input data table with columns County, Site, Pollutant, and Matrix ("Soil" or "Groundwater")
replace sum(matrix="Soil") as nbSoil with sum(matrix="Soil") > 0 as nbSoil if you only want zeros and ones.