I'm brand new to SAS and in my Data Mining course I have to provide the below. I am able to do numbers 1 & 2 with no problem but when it comes to number 3 I'm stumped. None of our text or materials gives any clue other than "The point counts for grid cells can be calculated using a conditional statement such as an IF-THEN statement." I'm not looking for the exact answer but I've exhausted just about every other resource so I'm hoping to find help here. The point counts for grid cells has to match the image in the Word doc I included. Any help is appreciated!
Complete the following tasks:
Thanks, Kristi
@KristiW wrote:
@PaigeMiller The word doc contains an image which this forum would not originally let me post, that is why it is in a word doc. I was able to include it in this reply though, please see below. What we write has to match this grid output, I have gotten some of this but the totals are off an any row or column that is all zero's is not being included and I'm not sure how to code to make sure that they appear even though there are only zero's.
"Only zeroes" means that you have to understand what a 0 in proc tabulate means for the N statistic. 0 means that there are no values of that combination of variables. If an entire "row" is 0 means that there were no values of the row heading in the data, i.e MISSING. So the row heading does not appear in the output as it doesn't appear in the input. Luckily, if I understand what you want the solution is also a better way to approach than adding a bunch of if/then else code and creating new variables. Groups created by formats will be honored by Proc Tabulate and other reporting and analysis procedures and generally will for graphs as well. So you can create a custom format that displays the values from the X and Y variables as desired. Plus the Proc Tabulate, and a few other procedures, will allow the PRELOADFMT option to use all of the format values to create the report headings.
Proc format; value bin 0 -< 1 = '1' 1 -< 2 = '2' 2 -< 3 = '3' 3 -< 4 = '4' 4 -< 5 = '5' 5 -< 6 = '6' 6 -< 7.5= '7' ; proc tabulate data=work.import; class x y/preloadfmt; format x y bin. ; table y, xbin *n= ' ' / printmiss misstext='0' ; run;
In this case the two variables shown appear to be using the same range of values. If your Y needed a different range than X uses then create two formats and associate each with the variable that uses it.
Do note that this is not using the set with any added variables.
I don't know what "point counts for grid cells" means. It may be explained in the attachment, but I refuse to download attachments from public forums such as this one. If you could paste the relevant text (not the whole thing unless it is all relevant) from your Word document into your reply, maybe I can assist.
@PaigeMiller The word doc contains an image which this forum would not originally let me post, that is why it is in a word doc. I was able to include it in this reply though, please see below. What we write has to match this grid output, I have gotten some of this but the totals are off an any row or column that is all zero's is not being included and I'm not sure how to code to make sure that they appear even though there are only zero's.
Here is what I have written so far, I'm sure it is not right but I'm brand new at this so struggling through. Your help is appreciated!
data uniform_bins;
set work.import;
if x >=0 and x<1 then xbin=1;
if x >=1 and x<2 then xbin=2;
if x >=2 and x<3 then xbin=3;
if x >=3 and x<4 then xbin=4;
if x >=4 and x<5 then xbin=5;
if x >=5 and x<6 then xbin=6;
if x >=6 and x<7.5 then xbin=7;
if y >=0 and y<1 then ybin=1;
if y >=1 and y<2 then ybin=2;
if y >=2 and y<3 then ybin=3;
if y >=3 and y<4 then ybin=4;
if y >=4 and y<5 then ybin=5;
if y >=5 and y<6 then ybin=6;
if y >=6 and y<7.5 then ybin=7;
run;
proc tabulate data=uniform_bins;
title "Simulation - Uniform Point Grid";
class ybin;
class xbin;
table ybin, xbin *n= ' '/ printmiss misstext='0';
run;
Here is what I have written so far, I'm sure it is not right but I'm brand new at this so struggling through.
Please explain what is wrong with this code, as I don't have your data, so I can't test it. If there is an error in the log, show us the ENTIRE log for this code. If the output is wrong, please show us the incorrect output and explain what is wrong.
@KristiW wrote:
@PaigeMiller The word doc contains an image which this forum would not originally let me post, that is why it is in a word doc. I was able to include it in this reply though, please see below. What we write has to match this grid output, I have gotten some of this but the totals are off an any row or column that is all zero's is not being included and I'm not sure how to code to make sure that they appear even though there are only zero's.
"Only zeroes" means that you have to understand what a 0 in proc tabulate means for the N statistic. 0 means that there are no values of that combination of variables. If an entire "row" is 0 means that there were no values of the row heading in the data, i.e MISSING. So the row heading does not appear in the output as it doesn't appear in the input. Luckily, if I understand what you want the solution is also a better way to approach than adding a bunch of if/then else code and creating new variables. Groups created by formats will be honored by Proc Tabulate and other reporting and analysis procedures and generally will for graphs as well. So you can create a custom format that displays the values from the X and Y variables as desired. Plus the Proc Tabulate, and a few other procedures, will allow the PRELOADFMT option to use all of the format values to create the report headings.
Proc format; value bin 0 -< 1 = '1' 1 -< 2 = '2' 2 -< 3 = '3' 3 -< 4 = '4' 4 -< 5 = '5' 5 -< 6 = '6' 6 -< 7.5= '7' ; proc tabulate data=work.import; class x y/preloadfmt; format x y bin. ; table y, xbin *n= ' ' / printmiss misstext='0' ; run;
In this case the two variables shown appear to be using the same range of values. If your Y needed a different range than X uses then create two formats and associate each with the variable that uses it.
Do note that this is not using the set with any added variables.
@ballardw Thank you so much! This appears to get me the closest to what is in my text book. The values are different but after reviewing the data set I believe they are correct so apparently the data set we have is a bit different from what they used in the text book. I appreciate your share of knowledge, hopefully I can pay that forward once I'm more proficient in SAS myself. Have a wonderful New Years!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.