
Weekend Challenge

PROC Star
Posts: 7,480

Weekend Challenge

It has been a while since I've seen anyone post a weekend programming challenge.  And, since the World Series ended early, I thought some of you might like an interesting challenge.

SAS used to distribute two macros that, together, can be used to conduct decision tree and CHAID (Chi-Square Automatic Interaction Detection) types of analyses.  The macros are still available on a number of sites; see, e.g., http://support.sas.com/kb/25/035.html (for the xmacro) and

http://www.psych.yorku.ca/friendly/lab/files/macros/treedisc.sas (for the treedisc macro).

The treedisc macro requires SAS/IML to run and, if one desires to print the decision tree, it also requires SAS/OR. However, one can see the results without actually printing the tree, thus SAS/OR isn’t essential for the task.

However, since most sites don't license IML, the challenge is to come up with base SAS (and possibly SAS/STAT) code that can adequately replace the call to IML in the treedisc macro.


Contributor
Posts: 71

Re: Weekend Challenge

Since I don't have SAS at home, working on this challenge would mean working on the weekend.  ;-)

PROC Star
Posts: 7,480

Re: Weekend Challenge

Since I haven't received any responses yet, the challenge has now become an anytime challenge.

Valued Guide
Posts: 2,177

Re: Weekend Challenge

Art,

Without the resources to run the macros provided, it is hard to imagine the objectives that the challenger must achieve.

Could you provide a before/after to explain the inputs, processes and outputs?

PROC Star
Posts: 7,480

Re: Weekend Challenge

As I think you already know, I am always interested in discovering ways to achieve various analyses without having to purchase expensive add-ons.  Running CHAID is one of those that I think should be achievable in base SAS, but I'm not familiar with IML, so I don't know what the substitutes would be.

What I'd like to run is:

%inc "c:\xmacro.sas";
%inc "c:\treedisc.sas";

%treedisc(data=banksize, depvar=possible_target_bank,
          ordinal=total_customers, outtree=trd, options=noformat)


That wouldn't require OR, as I don't need to print the tree, but the macro does some of its work in IML.

The challenge is to provide code that accomplishes the same thing, but only using base SAS.
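
As a possible starting point on the base SAS side: the core of a CHAID step is a chi-square test of each candidate predictor against the dependent variable, and PROC FREQ (part of Base SAS) can produce that test. The sketch below runs just one such test and is not a replacement for treedisc. It assumes the banksize data set and the variable names from the call above, and that total_customers is numeric. The five-group binning and the names banksize_binned, customers_bin and split_chisq are hypothetical choices made for illustration.

```sas
/* Bin the ordinal predictor into 5 roughly equal-sized groups.          */
/* groups=5, banksize_binned and customers_bin are assumed names/values. */
proc rank data=banksize groups=5 out=banksize_binned;
   var total_customers;
   ranks customers_bin;
run;

/* Chi-square test of the binned predictor against the dependent variable. */
/* The ODS OUTPUT statement captures the test results as a data set.       */
ods output ChiSq=split_chisq;
proc freq data=banksize_binned;
   tables customers_bin*possible_target_bank / chisq;
run;

/* Inspect the captured statistics. A CHAID-style search would compare   */
/* the Pearson chi-square p-value across predictors and across merged    */
/* categories before choosing a split.                                   */
proc print data=split_chisq noobs;
run;
```

A full replacement would still need the category-merging and recursive splitting logic that treedisc implements in IML; this only shows that the test statistic itself is available without it.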

I don't know if one needs the data for this challenge but, if so, it is the data for which a link is provided in a paper I did for MWSUG, namely

Expert Panel Solution MWSUG 2013-Tabachneck - sasCommunity

Trusted Advisor
Posts: 1,301

Re: Weekend Challenge

It's not CHAID, but it is ID3/C4.5 (Weka's J48 classifier, called from PROC GROOVY):

proc format;
   value specname
      1='SETOSA    '
      2='VERSICOLOR'
      3='VIRGINICA ';
   value specchar
      1='S'
      2='O'
      3='V';
run;

data iris;
   title 'Fisher (1936) Iris Data';
   input sepallen sepalwid petallen petalwid species @@;
   format species specname.;
   label sepallen='Sepal Length in mm.'
         sepalwid='Sepal Width  in mm.'
         petallen='Petal Length in mm.'
         petalwid='Petal Width  in mm.';
   cards;

50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3

63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2

59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2

65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3

68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3

77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3

49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2

64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3

55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1

49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1

67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1

77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2

50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1

61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1

61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1

51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1

51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1

46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1

50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3

57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1

71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3

49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1

49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1

66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1

44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2

47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2

74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1

56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3

49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1

56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2

51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3

54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3

61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3

68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1

45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1

55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1

51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2

63 33 60 25 3 53 37 15 02 1

;

proc export data=iris outfile='/tmp/iris.csv' dbms=csv replace; run;

proc groovy;
add classpath='/tmp/weka.jar'; *you need to get weka first...;
submit;
import weka.core.Instances
import weka.core.converters.CSVLoader
import weka.classifiers.trees.J48

// load the exported CSV into a Weka Instances object
loader = new CSVLoader()
loader.setSource(new File('/tmp/iris.csv'))
data = loader.getDataSet()

// species is the fifth column (zero-based index 4)
data.setClassIndex(4)

// fit the C4.5 tree and print it
j48 = new J48()
j48.buildClassifier(data)
println j48
endsubmit;
run;

J48 pruned tree
------------------

petalwid <= 6: SETOSA (50.0)
petalwid > 6
|   petalwid <= 17
|   |   petallen <= 49: VERSICOLOR (48.0/1.0)
|   |   petallen > 49
|   |   |   petalwid <= 15: VIRGINICA (3.0)
|   |   |   petalwid > 15: VERSICOLOR (3.0/1.0)
|   petalwid > 17: VIRGINICA (46.0/1.0)

Number of Leaves  : 5

Size of the tree : 9
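
Since the original goal was base SAS, it may be worth noting that a fitted tree like this one can be applied without Weka, because each path through the tree is just nested IF/THEN logic in a DATA step. The sketch below is a hand translation of the printed tree, assuming the iris data set created above. The names iris_scored and predicted are hypothetical.

```sas
/* Apply the printed J48 tree to the iris data with plain DATA step logic. */
/* iris_scored and predicted are illustrative names, not from the thread.  */
data iris_scored;
   set iris;
   length predicted $10;
   if petalwid <= 6 then predicted = 'SETOSA';
   else if petalwid <= 17 then do;
      if petallen <= 49 then predicted = 'VERSICOLOR';
      else if petalwid <= 15 then predicted = 'VIRGINICA';
      else predicted = 'VERSICOLOR';
   end;
   else predicted = 'VIRGINICA';
run;

/* Cross-tabulate actual species against the tree's prediction. */
proc freq data=iris_scored;
   tables species*predicted / norow nocol nopercent;
run;
```

The counts after the slash in the tree's leaves (for example 48.0/1.0) are the training cases that reach the leaf and the number of those it misclassifies, so the cross-tabulation gives a quick check that the translation matches the printed tree.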

PROC Star
Posts: 7,480

Re: Weekend Challenge

Much appreciated!  I was starting to think that no one was going to accept the challenge.

I can't test your code at the moment, but I definitely will, and I'll compare the result with the one I obtained from the treedisc macro.

If I'm satisfied with the result, and no one else has provided a better alternative, I'll change your helpful rating to a correct answer.

Trusted Advisor
Posts: 1,301

Re: Weekend Challenge

I have had your post in the back of my mind since you originally posted it.  I figured it was about time I posted something...  That being said, I consider my answer a clear cheat on the intention of the challenge.

Frequent Contributor
Posts: 101

Re: Weekend Challenge

This is excellent! FYI... the Fisher iris data is a sample data set in sashelp.iris.
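
Following that suggestion, the DATA step with the typed-in values could be skipped by exporting the shipped sample directly. This is a minimal sketch, and it assumes the same /tmp/iris.csv path used earlier in the thread. The variable names and their order in sashelp.iris differ from the hand-built iris data set (as far as I recall, Species comes first), so the class index passed to setClassIndex in the Groovy code would probably need to change from 4 to 0. The PROC CONTENTS step is there to confirm the actual order before relying on that.

```sas
/* List the variables of the shipped sample in their stored order. */
proc contents data=sashelp.iris varnum;
run;

/* Export the sample data instead of building it in a DATA step. */
proc export data=sashelp.iris outfile='/tmp/iris.csv' dbms=csv replace;
run;
```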
