HPSplit creates a rules.txt file. Below is part of one:
*------------------------------------------------------------*
NODE = 596
*------------------------------------------------------------*
(i_22004 >= 33.324833)
AND (i_Day IS 5)
AND MISSING(i_21106) OR (i_21106 >= -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 1
PREDICTED 1 = 0.8333( 5/6)
PREDICTED 2 = 0.1667( 1/6)
*------------------------------------------------------------*
NODE = 242
*------------------------------------------------------------*
MISSING(i_21705) OR (i_21705 < -6.6666667)
AND (i_21106 < -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 2
PREDICTED 1 = 0.2857( 2/7)
PREDICTED 2 = 0.7143( 5/7)
I'm wondering how to import this whole file into a database, whether SAS or something else.
"NODE" should start a 'record' or 'observation.'
Suggestions greatly appreciated.
Nicholas Kormanik
The following code should give you a start.
/* create hpsplit txt file */
filename hpsplit temp;
data _null_;
  file hpsplit;
  infile datalines truncover;
  input;
  put _infile_;
datalines4;
*------------------------------------------------------------*
NODE = 596
*------------------------------------------------------------*
(i_22004 >= 33.324833)
AND (i_Day IS 5)
AND MISSING(i_21106) OR (i_21106 >= -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 1
PREDICTED 1 = 0.8333( 5/6)
PREDICTED 2 = 0.1667( 1/6)
*------------------------------------------------------------*
NODE = 242
*------------------------------------------------------------*
MISSING(i_21705) OR (i_21705 < -6.6666667)
AND (i_21106 < -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 2
PREDICTED 1 = 0.2857( 2/7)
PREDICTED 2 = 0.7143( 5/7)
;;;;
/* read hpsplit text file into SAS table */
data parsed;
  length node_id 8 key $32 value $1500;
  infile hpsplit truncover;
  input str $255.;
  retain node_id key value;
  /* a "NODE = n" line starts a new record */
  if find(substrn(str,1,6),"NODE =",'i') then
  do;
    node_id=input(scan(strip(str),-1,'= '),best32.);
    call missing(key,value);
  end;
  else
  if not missing(node_id) then
  do;
    /* skip the separator lines */
    if find(str,'*-----','t')=1 then return;
    /* first PREDICTED line: write out the accumulated Logic row */
    else if find(substrn(str,1,15),'PREDICTED VALUE','i')=1 then output;
    /* every PREDICTED line becomes its own key/value row */
    if prxmatch('/^PREDICTED (\d+|VA)/oi',str)=1 then
    do;
      key=catx('_',scan(str,1,' '),scan(str,2,' '));
      value=strip(scan(strip(str),-1,'is=','i'));
      output;
    end;
    /* anything else is a condition: append it to the Logic text */
    else
    do;
      key='Logic';
      value=catx(' ',value, str);
    end;
  end;
run;
/* one row per node, one column per key */
proc transpose data=parsed out=want(drop=_:);
  by node_id notsorted;
  id key;
  var value;
run;
1. What's the expected output?
2. These OR clauses look like they should be enclosed in parentheses, shouldn't they?
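To see why the parentheses matter: in SAS, as in most languages, AND is evaluated before OR, so reading the rule one line at a time and parsing it literally can give different answers. Here is a minimal sketch in Python, which has the same AND-before-OR precedence. The truth values are made up for illustration, and it assumes each line of the rule is meant to be a single condition.

```python
# Hypothetical truth values for one observation (not taken from real data).
first_ok = False     # stands for (i_22004 >= 33.324833)
is_missing = False   # stands for MISSING(i_21106)
in_range = True      # stands for (i_21106 >= -0.23923333)

# Literal parse: AND binds tighter than OR,
# so this is (first_ok and is_missing) or in_range.
literal = first_ok and is_missing or in_range

# Line-by-line reading: each rule line is one parenthesized condition.
intended = first_ok and (is_missing or in_range)

print(literal, intended)
```

With these values the literal parse accepts the observation and the line-by-line reading rejects it, so the rule text cannot safely be pasted into a WHERE clause without adding the parentheses.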
The idea is to compare the "nodes" HPSplit comes up with, which is hard to do with the output as given.
A 'record' or 'observation' would run from one "NODE" line up to, but not including, the next "NODE" line, and so on down to the end of the file.
Each "AND" can probably separate fields.
And each "PREDICTED" can separate the final fields.
Basically, I'd mostly like a suggestion as to which program to import this into. Faced with such a challenge, what would you folks use?
Not, Sorry, Boss, can't be done....
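If "otherwise" is on the table, the record layout described above (NODE starts a record, AND separates conditions, PREDICTED lines close it) can be sketched in a few lines of Python. This is only one possible approach, not the way HPSplit output is normally consumed. The function name `parse_rules`, the dictionary keys, and the SQLite table `nodes` are all made up for this illustration, and the parser assumes the file always looks like the excerpt posted here.

```python
import re
import sqlite3

# Two nodes copied from the rules.txt excerpt in this thread.
SAMPLE = """\
*------------------------------------------------------------*
NODE = 596
*------------------------------------------------------------*
(i_22004 >= 33.324833)
AND (i_Day IS 5)
AND MISSING(i_21106) OR (i_21106 >= -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 1
PREDICTED 1 = 0.8333( 5/6)
PREDICTED 2 = 0.1667( 1/6)
*------------------------------------------------------------*
NODE = 242
*------------------------------------------------------------*
MISSING(i_21705) OR (i_21705 < -6.6666667)
AND (i_21106 < -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 2
PREDICTED 1 = 0.2857( 2/7)
PREDICTED 2 = 0.7143( 5/7)
"""


def parse_rules(text):
    """Return one dict per NODE block (hypothetical helper)."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # blank line or separator line
        node = re.match(r"NODE\s*=\s*(\d+)", line)
        if node:
            # A NODE line starts a new record.
            current = {"node": int(node.group(1)), "conditions": [],
                       "predicted_value": None, "probabilities": {}}
            records.append(current)
        elif current is None:
            continue  # ignore anything before the first NODE line
        elif line.startswith("PREDICTED VALUE IS"):
            current["predicted_value"] = line.rsplit(" ", 1)[-1]
        elif line.startswith("PREDICTED"):
            prob = re.match(r"PREDICTED\s+(\S+)\s*=\s*([\d.]+)", line)
            if prob:
                current["probabilities"][prob.group(1)] = float(prob.group(2))
        else:
            # A condition line; drop the leading AND used as a separator.
            current["conditions"].append(re.sub(r"^AND\s+", "", line))
    return records


records = parse_rules(SAMPLE)

# Load one row per node into an in-memory SQLite table (made-up schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nodes (node INTEGER, logic TEXT, predicted TEXT)")
con.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(r["node"], " AND ".join(r["conditions"]), r["predicted_value"])
     for r in records],
)
for row in con.execute("SELECT node, predicted FROM nodes"):
    print(row)
```

Keeping the conditions as a list per node, rather than one long string, makes it easier to line nodes up against each other later.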
> Probably the "AND" can separate fields. And "PREDICTED" can separate the final fields.
Like this???
data _null_;
  file "%sysfunc(pathname(WORK))\t.txt" ;
  input STR $80.;
  put STR;
cards;
*------------------------------------------------------------*
NODE = 596
*------------------------------------------------------------*
(i_22004 >= 33.324833)
AND (i_Day IS 5)
AND MISSING(i_21106) OR (i_21106 >= -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 1
PREDICTED 1 = 0.8333( 5/6)
PREDICTED 2 = 0.1667( 1/6)
*------------------------------------------------------------*
NODE = 242
*------------------------------------------------------------*
MISSING(i_21705) OR (i_21705 < -6.6666667)
AND (i_21106 < -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 2
PREDICTED 1 = 0.2857( 2/7)
PREDICTED 2 = 0.7143( 5/7)
run;
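data WANT;
  infile "%sysfunc(pathname(WORK))\t.txt" end=EOF;
  length NODE $4;
  /* $60 is wide enough for the longest condition in the sample */
  array AND[10] $60;
  array PRE[10] $60;
  retain NODE AND: PRE:;
  drop PRENO ANDNO;
  input ;
  if _infile_ =: 'NODE' then do;
    /* a new NODE line closes the previous record */
    if AND1 ne ' ' then output;
    call missing(of AND:, of PRE:, PRENO, ANDNO);
    NODE=compress(_INFILE_,,'dk');
  end;
  else if _infile_ =: 'PRE' then do;
    PRENO+1;
    PRE[PRENO]=_infile_;
  end;
  /* anything that is not a separator line is a condition */
  else if not (_infile_ =: '*') then do;
    ANDNO+1;
    AND[ANDNO]=_infile_;
  end;
  /* write the last record only after its final line has been stored */
  if EOF and AND1 ne ' ' then output;
run;
NODE | AND1 | AND2 | AND3 | AND4 | AND5 | AND6 | AND7 | AND8 | AND9 | AND10 | PRE1 | PRE2 | PRE3 | PRE4 | PRE5 | PRE6 | PRE7 | PRE8 | PRE9 | PRE10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
596 | (i_22004 >= 33.324833) | AND (i_Day IS 5) | AND MISSING(i_21106) OR (i_21106 >= -0.23923333) | AND MISSING(i_21305) OR (i_21305 < -66.6667) | AND (i_21605 < -25.562867) | AND MISSING(i_20102) OR (i_20102 IS 1) | AND MISSING(i_20103) OR (i_20103 IS 0) |  |  |  | PREDICTED VALUE IS 1 | PREDICTED 1 = 0.8333( 5/6) | PREDICTED 2 = 0.1667( 1/6) |  |  |  |  |  |  |  |
242 | MISSING(i_21705) OR (i_21705 < -6.6666667) | AND (i_21106 < -0.23923333) | AND MISSING(i_21305) OR (i_21305 < -66.6667) | AND (i_21605 < -25.562867) | AND MISSING(i_20102) OR (i_20102 IS 1) | AND MISSING(i_20103) OR (i_20103 IS 0) |  |  |  |  | PREDICTED VALUE IS 2 | PREDICTED 1 = 0.2857( 2/7) | PREDICTED 2 = 0.7143( 5/7) |  |  |  |  |  |  |  |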
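Once each node is a single record, the comparison the original question is after becomes a matter of lining up condition lists. Here is a rough Python sketch, with the conditions of the two sample nodes typed in by hand and the leading AND removed. The variable names are made up, and comparing the conditions as exact strings is an assumption that only holds if HPSplit always prints a given split the same way.

```python
# Conditions of the two sample nodes, leading "AND " removed.
node_596 = [
    "(i_22004 >= 33.324833)",
    "(i_Day IS 5)",
    "MISSING(i_21106) OR (i_21106 >= -0.23923333)",
    "MISSING(i_21305) OR (i_21305 < -66.6667)",
    "(i_21605 < -25.562867)",
    "MISSING(i_20102) OR (i_20102 IS 1)",
    "MISSING(i_20103) OR (i_20103 IS 0)",
]
node_242 = [
    "MISSING(i_21705) OR (i_21705 < -6.6666667)",
    "(i_21106 < -0.23923333)",
    "MISSING(i_21305) OR (i_21305 < -66.6667)",
    "(i_21605 < -25.562867)",
    "MISSING(i_20102) OR (i_20102 IS 1)",
    "MISSING(i_20103) OR (i_20103 IS 0)",
]

# Exact-string comparison; list order is kept for readable output.
shared = [c for c in node_596 if c in node_242]
only_596 = [c for c in node_596 if c not in node_242]
only_242 = [c for c in node_242 if c not in node_596]

print("shared:", shared)
print("only in 596:", only_596)
print("only in 242:", only_242)
```

For these two nodes the last four conditions are shared, so the nodes differ only in their first few splits.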