NKormanik
Barite | Level 11

HPSplit creates a rules.txt file.  Below is part of one:

 

 

*------------------------------------------------------------*
NODE = 596
*------------------------------------------------------------*
(i_22004 >= 33.324833)
AND (i_Day IS 5)
AND MISSING(i_21106) OR (i_21106 >= -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 1
PREDICTED 1 = 0.8333( 5/6)
PREDICTED 2 = 0.1667( 1/6)
*------------------------------------------------------------*
NODE = 242
*------------------------------------------------------------*
MISSING(i_21705) OR (i_21705 < -6.6666667)
AND (i_21106 < -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 2
PREDICTED 1 = 0.2857( 2/7)
PREDICTED 2 = 0.7143( 5/7)

 

 

I'm wondering how to import this whole file into a database, SAS or otherwise.

 

"NODE" should start a 'record' or 'observation.'

 

Suggestions greatly appreciated.

 

Nicholas Kormanik

 

 

ACCEPTED SOLUTION
Patrick
Opal | Level 21

@NKormanik 

The following code should give you a start.

/* create hpsplit txt file */
filename hpsplit temp;
data _null_;
  file hpsplit;
  infile datalines truncover;
  input;
  put _infile_;
  datalines4;
*------------------------------------------------------------*
NODE = 596
*------------------------------------------------------------*
(i_22004 >= 33.324833)
AND (i_Day IS 5)
AND MISSING(i_21106) OR (i_21106 >= -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 1
PREDICTED 1 = 0.8333( 5/6)
PREDICTED 2 = 0.1667( 1/6)
*------------------------------------------------------------*
NODE = 242
*------------------------------------------------------------*
MISSING(i_21705) OR (i_21705 < -6.6666667)
AND (i_21106 < -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 2
PREDICTED 1 = 0.2857( 2/7)
PREDICTED 2 = 0.7143( 5/7)
 
;;;;

/* read hpsplit text file into SAS table */
data parsed;
  length node_id 8 key $32 value $1500;
  infile hpsplit truncover;
  input str $255.;
  retain node_id key value;

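  /* a "NODE = n" line starts a new record: keep the id, reset key and value */
  if find(substrn(str,1,6),"NODE =",'i') then 
    do;
      node_id=input(scan(strip(str),-1,'= '),best32.);
      call missing(key,value);
    end;
  else
  if not missing(node_id) then
    do;
      /* skip the separator lines */
      if find(str,'*-----','t')=1 then return;
      /* first PREDICTED line: write out the Logic text collected so far */
      else if find(substrn(str,1,15),'PREDICTED VALUE','i')=1 then output;
      /* every PREDICTED line becomes its own key/value row */
      if prxmatch('/^PREDICTED (\d+|VA)/oi',str)=1 then
        do;
          key=catx('_',scan(str,1,' '),scan(str,2,' '));
          value=strip(scan(strip(str),-1,'is=','i'));
          output;
        end;
      /* anything else is a condition line: append it to the Logic text */
      else
        do;
          key='Logic';
          value=catx(' ',value, str);
        end;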
    end;
run;

proc transpose data=parsed out=want(drop=_:);
  by node_id notsorted;
  id key;
  var value;
run;
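
For loading into a database, note that the PREDICTED_* columns coming out of the transpose are still character strings such as 0.8333( 5/6). A possible follow-up step splits them into numbers. It is sketched here on a one-row stand-in for WANT; the data set want_demo and the new column names are made up for illustration.

```sas
/* Stand-in for one row of WANT, using values from node 596 above */
data want_demo;
  length node_id 8 PREDICTED_VALUE PREDICTED_1 PREDICTED_2 $1500;
  node_id = 596;
  PREDICTED_VALUE = '1';
  PREDICTED_1 = '0.8333( 5/6)';
  PREDICTED_2 = '0.1667( 1/6)';
run;

/* Split "probability( count/total)" into numeric columns */
data want_typed;
  set want_demo;
  predicted_class = input(PREDICTED_VALUE, best32.);
  p_1   = input(scan(PREDICTED_1, 1, '('),   best32.);  /* 0.8333 */
  n_1   = input(scan(PREDICTED_1, 2, '(/'),  best32.);  /* 5 */
  p_2   = input(scan(PREDICTED_2, 1, '('),   best32.);  /* 0.1667 */
  n_2   = input(scan(PREDICTED_2, 2, '(/'),  best32.);  /* 1 */
  n_obs = input(scan(PREDICTED_1, 3, '(/)'), best32.);  /* 6 */
  keep node_id predicted_class p_1 n_1 p_2 n_2 n_obs;
run;
```

Replacing want_demo with want in the SET statement should apply the same split to the real table, assuming every node has exactly the two target levels shown in the sample.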


4 REPLIES
ChrisNZ
Tourmaline | Level 20

1. What's the expected output?

2. These OR clauses look like they should be enclosed in parentheses, shouldn't they?
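
On point 2: SAS evaluates AND before OR, so the rules text, pasted as-is into an IF or WHERE, would not treat each MISSING(...) OR (...) pair as a single condition. Whether the pairs are meant to be read as one condition is an assumption here; the rules file itself does not say. A small sketch with made-up values (only the variable names and cutoffs are taken from node 596) where the two readings disagree:

```sas
data _null_;
  /* hypothetical values, chosen so the two readings differ */
  i_22004 = 0;      /* fails   i_22004 >= 33.324833  */
  i_21305 = -100;   /* passes  i_21305 < -66.6667    */
  i_21605 = -30;    /* passes  i_21605 < -25.562867  */

  /* as written: (a AND MISSING(b)) OR ((b < ...) AND (c < ...)) */
  as_written = (i_22004 >= 33.324833)
               and missing(i_21305) or (i_21305 < -66.6667)
               and (i_21605 < -25.562867);

  /* grouped: a AND (MISSING(b) OR (b < ...)) AND (c < ...) */
  as_grouped = (i_22004 >= 33.324833)
               and (missing(i_21305) or (i_21305 < -66.6667))
               and (i_21605 < -25.562867);

  put as_written= as_grouped=;
run;
```

The log shows as_written=1 as_grouped=0, so an import that is later turned back into filter code would need to add the parentheses.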

NKormanik
Barite | Level 11

The idea is to compare the "nodes" HPSplit comes up with.  That's hard to do with the output as given.

 

A 'record' or 'observation' would run from one "NODE" line up to, but not including, the next "NODE" line, and so on down to the end of the file.

 

Probably the "AND" can separate fields.

 

And "PREDICTED" can separate the final fields.

 

Mostly I'd like a suggestion as to which program to import this into.  Faced with such a challenge, what would you folks use?

 

Anything but "Sorry, Boss, can't be done...."
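
One way to read "AND separates fields" is a long layout: one row per node and condition, which avoids guessing how many conditions a node can have. A minimal sketch in a SAS DATA step, run on a shortened copy of node 596; the data set name rules_long and the columns seq and clause are made up for illustration, and the PREDICTED lines are simply skipped here.

```sas
data rules_long;
  length node_id 8 seq 8 clause $200;
  retain node_id seq;
  infile datalines truncover;
  input str $200.;

  /* separator or blank line: nothing to keep */
  if str =: '*' or missing(str) then delete;

  /* "NODE = n" starts a new record and restarts the condition counter */
  if str =: 'NODE' then do;
    node_id = input(scan(str, -1, '= '), best32.);
    seq = 0;
    delete;
  end;

  /* predictions would go to a separate table; skipped in this sketch */
  if str =: 'PREDICTED' then delete;

  /* condition line: number it and strip the leading AND */
  seq + 1;
  clause = prxchange('s/^AND\s+//', 1, strip(str));
  drop str;
  datalines;
*------------------------------------------------------------*
NODE = 596
*------------------------------------------------------------*
(i_22004 >= 33.324833)
AND (i_Day IS 5)
AND MISSING(i_21106) OR (i_21106 >= -0.23923333)
PREDICTED VALUE IS 1
;
run;
```

This gives three rows for node 596 (seq 1 to 3), which makes it easy to compare nodes by joining on clause.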

 

 


ChrisNZ
Tourmaline | Level 20

> Probably the "AND" can separate fields. And "PREDICTED" can separate the final fields.

 

Like this???

data _null_;
  file "%sysfunc(pathname(WORK))\t.txt" ;
  input STR $80.;
  put STR;
cards;
*------------------------------------------------------------*
NODE = 596
*------------------------------------------------------------*
(i_22004 >= 33.324833)
AND (i_Day IS 5)
AND MISSING(i_21106) OR (i_21106 >= -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 1
PREDICTED 1 = 0.8333( 5/6)
PREDICTED 2 = 0.1667( 1/6)
*------------------------------------------------------------*
NODE = 242
*------------------------------------------------------------*
MISSING(i_21705) OR (i_21705 < -6.6666667)
AND (i_21106 < -0.23923333)
AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND (i_21605 < -25.562867)
AND MISSING(i_20102) OR (i_20102 IS 1)
AND MISSING(i_20103) OR (i_20103 IS 0)
PREDICTED VALUE IS 2
PREDICTED 1 = 0.2857( 2/7)
PREDICTED 2 = 0.7143( 5/7)
run;
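data WANT;
  infile "%sysfunc(pathname(WORK))\t.txt" end=EOF;
  length NODE $4; 
  array AND[10] $60;   /* wide enough for the longest condition line */
  array PRE[10] $60;
  retain NODE AND: PRE:;
  input ;
  if _infile_ =: '*' then return;
  if _infile_ =: 'NODE' then do;
    if AND1 ne ' ' then output;   /* write the previous node */
    call missing(of AND:, of PRE:, PRENO, ANDNO);
    NODE=compress(_INFILE_,,'dk');
  end;
  else if _infile_ =: 'PRE' then do;
    PRENO+1;
    PRE[PRENO]=_infile_;
  end;
  else do;
    ANDNO+1;
    AND[ANDNO]=_infile_;
  end;
  /* write the last node only after its final line has been stored */
  if EOF and AND1 ne ' ' then output;
run;

Result, one row per node (AND8-AND10 and PRE4-PRE10 are blank for both):

NODE=596
AND1=(i_22004 >= 33.324833)
AND2=AND (i_Day IS 5)
AND3=AND MISSING(i_21106) OR (i_21106 >= -0.23923333)
AND4=AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND5=AND (i_21605 < -25.562867)
AND6=AND MISSING(i_20102) OR (i_20102 IS 1)
AND7=AND MISSING(i_20103) OR (i_20103 IS 0)
PRE1=PREDICTED VALUE IS 1
PRE2=PREDICTED 1 = 0.8333( 5/6)
PRE3=PREDICTED 2 = 0.1667( 1/6)

NODE=242
AND1=MISSING(i_21705) OR (i_21705 < -6.6666667)
AND2=AND (i_21106 < -0.23923333)
AND3=AND MISSING(i_21305) OR (i_21305 < -66.6667)
AND4=AND (i_21605 < -25.562867)
AND5=AND MISSING(i_20102) OR (i_20102 IS 1)
AND6=AND MISSING(i_20103) OR (i_20103 IS 0)
PRE1=PREDICTED VALUE IS 2
PRE2=PREDICTED 1 = 0.2857( 2/7)
PRE3=PREDICTED 2 = 0.7143( 5/7)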

 

 

 
