Zachary

I am having some trouble running the Code Node in Enterprise Miner to bucket a continuous variable:

proc format;
  value F_TOTALRESERVES_B 0 -< 300 = 1
                          300 -< 1500 = 2
                          1500 -< 10000 = 3
                          10000 - high = 4;
run;

data SRS_TEMPCOMMENTS100000;
  set SRS_TEMPCOMMENTS100000;
  TOTALRESERVES_B = put(TOTALRESERVES, F_TOTALRESERVES_B.);
run;

I think the PROC FORMAT step runs, but the DATA step that follows does not. I want to group my TOTALRESERVES variable into four buckets. Perhaps I cannot reference the table SRS_TEMPCOMMENTS100000 the way I did? Also, in the past I have run this in Enterprise Guide; I would like to keep everything in Enterprise Miner if possible.

Thank you.
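A quick way to confirm that the format itself is not the problem is to run it outside Enterprise Miner against a few boundary values. This is a minimal sketch: the WORK.BUCKET_CHECK table and its values are hypothetical and not from the thread; inside a SAS Code node the rows would come from the imported training data instead of DATALINES.

```sas
/* Format from the question: [0,300)=1, [300,1500)=2, [1500,10000)=3, [10000,high]=4 */
proc format;
  value F_TOTALRESERVES_B
        0 -< 300   = 1
      300 -< 1500  = 2
     1500 -< 10000 = 3
    10000 - high   = 4;
run;

/* Hypothetical boundary values, not taken from the thread */
data work.bucket_check;
  input TOTALRESERVES @@;
  /* PUT returns the formatted label as a character value */
  TOTALRESERVES_B = put(TOTALRESERVES, F_TOTALRESERVES_B.);
  datalines;
0 299.99 300 1499.5 1500 9999 10000 250000
;
run;

proc print data=work.bucket_check noobs;
run;
```

With -< each range includes its lower bound and excludes its upper bound, so 300 lands in bucket 2 and 10000 in bucket 4.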

Accepted Solution
rayIII
SAS Employee

Hi, Zachary.

Assuming that srs_tempcomments100000 is your training dataset, you can do this in a SAS Code node by using the &em_import_data and &em_export_train macro variables.

Note: I used the home equity data as my training set and DEBTINC as the variable to bin, but I think you'll get the idea.

proc format;
  value F_TOTALRESERVES_B
     0 -< 10 = 1
    10 -< 20 = 2
    20 -< 30 = 3
    30 - high = 4;
run;

data &em_export_train;
  set &em_import_data;
  DE_bucketed = put(debtinc, F_TOTALRESERVES_B.);
run;

%EM_METACHANGE(
  NAME = DE_bucketed,
  LEVEL = ORDINAL
);

You should now be able to connect your SAS Code node to, say, a Decision Tree node, which will use the bucketed variable as an ordinal input.

If you prefer, you can use a Metadata node instead of using the %em_metachange macro.
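PUT always returns a character value, so DE_bucketed above is a character variable. If a numeric bucket is preferred, the same cut points can be written as IF/THEN logic. This is a sketch of an alternative, not what was posted here: it uses Zachary's TOTALRESERVES cut points, WORK.RESERVES and its values are hypothetical, and in a SAS Code node the DATA and SET statements would name &em_export_train and &em_import_data instead.

```sas
/* Hypothetical sample values standing in for the training data */
data work.reserves;
  input TOTALRESERVES @@;
  datalines;
. -5 0 299.99 300 1499.5 1500 9999 10000 250000
;
run;

/* Numeric ordinal bucket with the same cut points as F_TOTALRESERVES_B */
data work.reserves_bucketed;
  set work.reserves;
  /* Missing and negative values fall outside every range, so leave them missing */
  if missing(TOTALRESERVES) or TOTALRESERVES < 0 then TOTALRESERVES_B = .;
  else if TOTALRESERVES < 300   then TOTALRESERVES_B = 1;
  else if TOTALRESERVES < 1500  then TOTALRESERVES_B = 2;
  else if TOTALRESERVES < 10000 then TOTALRESERVES_B = 3;
  else                               TOTALRESERVES_B = 4;
run;
```

Handling missing and negative values explicitly avoids relying on how the format displays a value that matches none of its ranges.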

Hope this helps.

Ray


Zachary

Thank you so much for the answer. I was sort of barking up the right tree but did not have the %EM_METACHANGE part.

I will be using this with a Text Rule Builder node. My question: after I run the SAS Code node, I want to see my new variable, but it is not there. Will it appear by the time I get past the intervening nodes (Text Parsing, Text Filter) and reach the Text Rule Builder node? I am a little worried because I do not see it in my variable list.

rayIII
SAS Employee

You're welcome, Zachary.

The bucketed variable is definitely 'exported' and should be visible in subsequent nodes.

That said, I just tried connecting the SAS Code node to the Text Parsing node (which, as you mention, must precede the Text Rule Builder (TRB) node). It worked, but only after I created a character variable ("my_long_text") with a role of TEXT.

[Image: Text Parsing node variables]

Until I did that, none of the variables showed up, and the node failed because its data requirements were not met.

Zachary

Thank you again Ray.

Unfortunately I am now a little more confused:

My SAS Code node is between my original data file and my Text Parsing node. Below is a picture of the original variables I have:

[Image: original variables]

And the following is what my path looks like:

[Image: process flow diagram]

The variables that reside within my Text Parsing node are:

[Image: variables in the Text Parsing node]

So it seems all it brought over was the text field. But the following shows the variables that appear in the Text Filter node:

[Image: variables in the Text Filter node]

So it does have the new ordinal variable that I created with your guidance:

proc format;
  value F_TOTALRESERVES_B
        0 -< 300   = 1
      300 -< 1500  = 2
     1500 -< 10000 = 3
    10000 - high   = 4;
run;

data &em_export_train;
  set &em_import_data;
  TOTALRESERVES_B = put(TOTALRESERVES, F_TOTALRESERVES_B.);
run;

%EM_METACHANGE(
  NAME = TOTALRESERVES_B,
  LEVEL = ORDINAL,
  ROLE = TARGET
);

I made one addition: besides LEVEL = ORDINAL, I also set ROLE = TARGET. Could that be eliminating the other variables?

Overall I think it would be best to bring over all of the variables. But I am also concerned that when I first ran the Text Rule Builder node, it had not converged after one hour. For background: I have about 100,000 text documents, the three Train parameters in the Text Rule Builder node were set to Very Low, and the Minimum Number of Documents in the Text Filter node was set to 20. Next time I think I will set the Train parameters to Medium and raise the minimum number of documents to 200. Any other suggestions are welcome. I am enjoying learning all of this.

Thank you again.

rayIII
SAS Employee

Hi, Zachary. Things might have gotten out of sync on my end. When I tried this example with the sampsio.news data, I saw the same thing you did in the Text Parsing and Text Filter nodes.

Getting Started with SAS(R) Text Miner 12.1

If you are able to run the TRB node, I think you are good to go regarding variable roles. I hadn't used it before, but my understanding is that the TRB node just wants a target and a text variable and ignores inputs. (The inputs are not actually dropped, though; you should still see them when you look at the data exported from the TRB node.)

If you are still having troubles with convergence, try checking with the SAS Text Mining community. The text mining experts there will be better able to help you.

Good luck.

Ray

Zachary

Thank you very much. I got it to converge after a few hours last night. Thank you for everything.
