art297
Opal | Level 21

It has been a while since I've seen anyone post a weekend programming challenge.  And, since the World Series ended early, I thought some of you might like an interesting challenge.

SAS used to distribute two macros that, together, can be used to conduct decision tree and CHAID (Chi-Square Automatic Interaction Detection) types of analyses.  The macros are still available on a number of sites; see, e.g., http://support.sas.com/kb/25/035.html (for the xmacro) and http://www.psych.yorku.ca/friendly/lab/files/macros/treedisc.sas (for the treedisc macro).

The treedisc macro requires SAS/IML to run and, if one desires to print the decision tree, it also requires SAS/OR. However, one can see the results without actually printing the tree, thus SAS/OR isn’t essential for the task.

However, since most sites don't license IML, the challenge is to come up with base SAS (and possibly SAS/STAT) code that can adequately replace the call to IML in the treedisc macro.


8 REPLIES
jaredp
Quartz | Level 8

Since I don't have SAS at home, working on this challenge would mean working on the weekend.  ;)

art297
Opal | Level 21

Since I haven't had any responses yet, the challenge has now become an anytime challenge.

Peter_C
Rhodochrosite | Level 12

Art

Without the resources to run the macros provided, it is hard to imagine the objectives that the challenger must achieve.

Could you provide a before/after to explain the inputs, processes and outputs?

art297
Opal | Level 21

As I think you already know, I am always interested in discovering ways to achieve various analyses without having to purchase expensive add-ons.  Running CHAID is one of those that I think should be achievable in base SAS, but I'm not familiar with IML, so I don't know what the substitutes would be.

What I'd like to run is:

%inc "c:\xmacro.sas";
%inc "c:\treedisc.sas";

%treedisc(data=banksize, depvar=possible_target_bank,
  ordinal=total_customers, outtree=trd, options=noformat)


That wouldn't require OR, as I don't need to print the tree, but the macro does some of its work in IML.

The challenge is to provide code that accomplishes the same thing, but only using base SAS.

I don't know if one needs the data for this challenge but, if so, it is the data for which a link is provided in a paper I did for MWSUG, namely

Expert Panel Solution MWSUG 2013-Tabachneck - sasCommunity
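
For readers unfamiliar with CHAID, its basic building block is a chi-square test of each candidate predictor against the dependent variable. The following is a minimal sketch of that single step in Base SAS, assuming the banksize data set and the variable names from the %treedisc call above; chisq_stats is a hypothetical output name. It is not the treedisc macro's algorithm: it does not merge predictor categories, adjust p-values, or split recursively.

```sas
/* Sketch only: one chi-square test of association for one candidate predictor. */
/* Assumes data set BANKSIZE with variables TOTAL_CUSTOMERS and                 */
/* POSSIBLE_TARGET_BANK, as named in the %treedisc call above.                  */
/* CHISQ_STATS is a hypothetical name for the captured ODS table.               */
ods output ChiSq=chisq_stats;

proc freq data=banksize;
   tables total_customers*possible_target_bank / chisq;
run;

/* Inspect the captured statistics (chi-square value, DF, p-value). */
proc print data=chisq_stats;
run;
```

Repeating this for every candidate predictor and comparing the p-values would give a rough idea of which variable a CHAID-style procedure might split on first.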

FriedEgg
SAS Employee

It's not CHAID, but it is ID3/C4.5

proc format;
   value specname
      1='SETOSA    '
      2='VERSICOLOR'
      3='VIRGINICA ';
   value specchar
      1='S'
      2='O'
      3='V';
run;

data iris;
   title 'Fisher (1936) Iris Data';
   input sepallen sepalwid petallen petalwid species @@;
   format species specname.;
   label sepallen='Sepal Length in mm.'
         sepalwid='Sepal Width  in mm.'
         petallen='Petal Length in mm.'
         petalwid='Petal Width  in mm.';

   cards;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;

proc export data=iris outfile='/tmp/iris.csv' dbms=csv replace; run;

proc groovy;
add classpath='/tmp/weka.jar'; *you need to get weka first...;
submit;
import weka.core.Instances
import weka.core.converters.CSVLoader
import weka.classifiers.trees.J48

loader = new CSVLoader()
loader.setSource(new File('/tmp/iris.csv'))
data = loader.getDataSet()
data.setClassIndex(4)

j48 = new J48()
j48.buildClassifier(data)
println j48
endsubmit;
run;

J48 pruned tree
------------------

petalwid <= 6: SETOSA (50.0)
petalwid > 6
|   petalwid <= 17
|   |   petallen <= 49: VERSICOLOR (48.0/1.0)
|   |   petallen > 49
|   |   |   petalwid <= 15: VIRGINICA (3.0)
|   |   |   petalwid > 15: VERSICOLOR (3.0/1.0)
|   petalwid > 17: VIRGINICA (46.0/1.0)

Number of Leaves  : 5

Size of the tree : 9

art297
Opal | Level 21

Much appreciated!  I was starting to think that no one was going to accept the challenge.

I can't test your code at the moment, but I definitely will, and I'll compare the result with the one I obtained from the treedisc macro.

If I'm satisfied with the result, and no one else has provided a better alternative, I'll change your helpful rating to a correct answer.

FriedEgg
SAS Employee

I have had your post in the back of my mind since you originally posted.  Figured it was about time I posted something...  That being said, I consider my answer a clear cheat on the intention of the challenge.

FloydNevseta
Pyrite | Level 9

This is excellent! FYI... the Fisher iris data is a sample data set in sashelp.iris.
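
As a rough sketch of that suggestion, the DATA step with inline data could be replaced by a copy of sashelp.iris before the export. This assumes the variable names in sashelp.iris are SepalLength, SepalWidth, PetalLength, PetalWidth and Species (they differ from the names used in the code above), and it uses a RETAIN statement before SET only to put Species last, so that data.setClassIndex(4) in the Groovy step would still point at the class column.

```sas
/* Sketch only: use the sample data set instead of typing in the data.      */
/* Assumes SASHELP.IRIS has variables SepalLength, SepalWidth, PetalLength, */
/* PetalWidth and Species. RETAIN before SET fixes the column order so      */
/* that Species is the fifth column (index 4) in the exported CSV.          */
data iris;
   retain SepalLength SepalWidth PetalLength PetalWidth Species;
   set sashelp.iris;
run;

proc export data=iris outfile='/tmp/iris.csv' dbms=csv replace;
run;
```

With this version the tree printed by J48 would refer to the sashelp.iris variable names rather than petalwid and petallen.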
