BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
KristiW
Calcite | Level 5

I'm brand new to SAS and in my Data Mining course I have to provide the below.  I am able to do numbers 1 & 2 with no problem but when it comes to number 3 I'm stumped.  None of our text or materials gives any clue other than "The point counts for grid cells can be calculated using a conditional statement such as an IF-THEN statement."  I'm not looking for the exact answer but I've exhausted just about every other resource so I'm hoping to find help here. The point counts for grid cells has to match the image in the Word doc I included.  Any help is appreciated!

 

Complete the following tasks:

  1. Upload the dataset file simulation-uniform.csv from the Module 6 to SAS Studio. The first 200 observations in the dataset were generated from a uniform distribution over a circle centered at (2,3) of radius 2, and the next 100 observations were generated from a uniform distribution over a circle centered at (6,3) of radius 1.
  2. By writing appropriate SAS code, produce a scatterplot of the data. Your plot should look similar to the one shown in Figure 8.10 (page 646) in Introduction to Data Mining.
  3. By writing appropriate SAS code, generate the point counts for grid cells. It should have 49 cells using a 7-by-7 grid. Refer to Table 8.2 (page 646) in Introduction to Data Mining.

Thanks, Kristi

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@KristiW wrote:

@PaigeMiller  The word doc contains an image which this forum would not originally let me post, that is why it is in a word doc.  I was able to include it in this reply though, please see below.  What we write has to match this grid output, I have gotten some of this but the totals are off an any row or column that is all zero's is not being included and I'm not sure how to code to make sure that they appear even though there are only zero's.


"Only zeroes" means that you have to understand what a 0 in proc tabulate means for the N statistic. 0 means that there are no values of that combination of variables. If an entire "row" is 0 means that there were no values of the row heading in the data, i.e MISSING. So the row heading does not appear in the output as it doesn't appear in the input. Luckily, if I understand what you want the solution is also a better way to approach than adding a bunch of if/then else code and creating new variables. Groups created by formats will be honored by Proc Tabulate and other reporting and analysis procedures and generally will for graphs as well. So you can create a custom format that displays the values from the X and Y variables as desired. Plus the Proc Tabulate, and a few other procedures, will allow the PRELOADFMT option to use all of the format values to create the report headings.

Proc format;
value bin
0 -< 1 = '1'
1 -< 2 = '2'
2 -< 3 = '3'
3 -< 4 = '4'
4 -< 5 = '5'
5 -< 6 = '6'
6 -< 7.5= '7'
;

proc tabulate data=work.import;
   class x y/preloadfmt;
   format x y bin.   ;
   table y, 
        xbin *n= ' '
        / printmiss misstext='0'
   ;
run;

In this case the two variables shown appear to be using the same range of values. If your Y needed a different range than X uses then create two formats and associate each with the variable that uses it.

Do note that this is not using the set with any added variables.

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

I don't know what "point counts for grid cells" means. It may be explained in the attachment, but I refuse to download attachments from public forums such as this one. If you could paste the relevant text (not the whole thing unless it is all relevant) from your Word document into your reply, maybe I can assist.

--
Paige Miller
KristiW
Calcite | Level 5

@PaigeMiller  The word doc contains an image which this forum would not originally let me post, that is why it is in a word doc.  I was able to include it in this reply though, please see below.  What we write has to match this grid output, I have gotten some of this but the totals are off an any row or column that is all zero's is not being included and I'm not sure how to code to make sure that they appear even though there are only zero's.

8.2 Point counts for grid cells.png

Here is what I have written so far, I'm sure it is not right but I'm brand new at this so struggling through.  Your help is appreciated!

 

data uniform_bins;
set work.import;

if x >=0 and x<1 then xbin=1;
if x >=1 and x<2 then xbin=2;
if x >=2 and x<3 then xbin=3;
if x >=3 and x<4 then xbin=4;
if x >=4 and x<5 then xbin=5;
if x >=5 and x<6 then xbin=6;
if x >=6 and x<7.5 then xbin=7;
if y >=0 and y<1 then ybin=1;
if y >=1 and y<2 then ybin=2;
if y >=2 and y<3 then ybin=3;
if y >=3 and y<4 then ybin=4;
if y >=4 and y<5 then ybin=5;
if y >=5 and y<6 then ybin=6;
if y >=6 and y<7.5 then ybin=7;
run;

proc tabulate data=uniform_bins;
title "Simulation - Uniform Point Grid";
class ybin;
class xbin;
table ybin, xbin *n= ' '/ printmiss misstext='0';
run;

PaigeMiller
Diamond | Level 26

Here is what I have written so far, I'm sure it is not right but I'm brand new at this so struggling through.

 

Please explain what is wrong with this code, as I don't have your data, so I can't test it. If there is an error in the log, show us the ENTIRE log for this code. If the output is wrong, please show us the incorrect output and explain what is wrong.

--
Paige Miller
ballardw
Super User

@KristiW wrote:

@PaigeMiller  The word doc contains an image which this forum would not originally let me post, that is why it is in a word doc.  I was able to include it in this reply though, please see below.  What we write has to match this grid output, I have gotten some of this but the totals are off an any row or column that is all zero's is not being included and I'm not sure how to code to make sure that they appear even though there are only zero's.


"Only zeroes" means that you have to understand what a 0 in proc tabulate means for the N statistic. 0 means that there are no values of that combination of variables. If an entire "row" is 0 means that there were no values of the row heading in the data, i.e MISSING. So the row heading does not appear in the output as it doesn't appear in the input. Luckily, if I understand what you want the solution is also a better way to approach than adding a bunch of if/then else code and creating new variables. Groups created by formats will be honored by Proc Tabulate and other reporting and analysis procedures and generally will for graphs as well. So you can create a custom format that displays the values from the X and Y variables as desired. Plus the Proc Tabulate, and a few other procedures, will allow the PRELOADFMT option to use all of the format values to create the report headings.

Proc format;
value bin
0 -< 1 = '1'
1 -< 2 = '2'
2 -< 3 = '3'
3 -< 4 = '4'
4 -< 5 = '5'
5 -< 6 = '6'
6 -< 7.5= '7'
;

proc tabulate data=work.import;
   class x y/preloadfmt;
   format x y bin.   ;
   table y, 
        xbin *n= ' '
        / printmiss misstext='0'
   ;
run;

In this case the two variables shown appear to be using the same range of values. If your Y needed a different range than X uses then create two formats and associate each with the variable that uses it.

Do note that this is not using the set with any added variables.

KristiW
Calcite | Level 5

@ballardw Thank you so much!  This appears to get me the closest to what is in my text book.  The values are different but after reviewing the data set I believe they are correct so apparently the data set we have is a bit different from what they used in the text book.  I appreciate your share of knowledge, hopefully I can pay that forward once I'm more proficient in SAS myself.  Have a wonderful New Years! 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1387 views
  • 1 like
  • 3 in conversation