Usmclee2003
Fluorite | Level 6

Do you think using Text Miner in Enterprise Miner would be a better route to go?

art297
Opal | Level 21

I doubt that Text Miner would be your answer but, of course, that all depends on what the question is.

 

The file you posted contains records for two facilities, with about 20 records for each facility. How many facilities are in your actual data, and what is your ultimate task/goal with the data?

 

The data file, itself, is a classic example of numerous things that can go wrong when using Excel as a data entry tool. The workbook could have been designed better but, unfortunately, the data has already been entered.

 

For someone just reading this thread for the first time, here is a brief summary of the file being analyzed. There are about 20 records for each facility and all of the data reside in a field called comments on each of the 40 records. There are about 40 variables contained in each comment. Each begins with the variable name (with some naming inconsistencies throughout the file), then either a tab character, space, hyphen, parenthesis or dash, then the data, possibly followed by one or more tab characters, and with a line feed character separating each variable. Of course, there might even be more inconsistencies in the actual data file.
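The layout described above can be sketched as a small parsing step. This is only a minimal illustration, not the code posted earlier in the thread: the dataset names have and want, the sample lines, and the variable lengths are all assumptions, and it treats the last token on each line as the data, which would fail for values that contain spaces or a leading minus sign.

```sas
/* Hypothetical sample: one comment holding three name/value lines */
data have;
  length facility $20 comments $200;
  facility = 'Well A';
  comments = catx('0A'x,
                  cat('Stages', '09'x, '12'),
                  'Acid 15% HCl - 500',
                  cat('Total Sand (lbs)', '09'x, '250000', '09'x));
run;

data want (keep=facility item value);
  set have;
  length line $200 item $60 value $40;
  _i = 1;
  /* each variable sits on its own line-feed-delimited line */
  do while (scan(comments, _i, '0A'x) ne '');
    call missing(item, value);
    line = scan(comments, _i, '0A'x);
    /* locate the last token; tab, space, hyphen and parentheses separate */
    call scan(line, -1, _pos, _len, '09'x || ' -()');
    if _pos > 1 then do;
      value = substr(line, _pos, _len);
      /* the name is everything before the value, minus tabs and a trailing hyphen */
      item = prxchange('s/[\s\-]+$//', 1,
                       strip(translate(substr(line, 1, _pos - 1), ' ', '09'x)));
    end;
    output;
    _i = _i + 1;
  end;
run;
```

Lines with no recognizable value come out with item and value blank, which makes the remaining inconsistencies easy to list and correct by hand.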

 

I've suggested code that addresses most, but not all, of those issues. I, personally, would run the code, see if it could be enhanced to correct even more of the inconsistencies, identify where the remaining inconsistencies exist, manually correct them, and then re-run the code to continue with whatever analysis is needed.

 

HTH,

Art, CEO, AnalystFinder.com

Usmclee2003
Fluorite | Level 6

Art,

 

   This is a small sample of a bigger dataset that contains over 1100 wells/facilities.  The data was entered in the database that way; it looks like it was transferred from somewhere else by copy and paste.  Going forward that has been corrected, but we still have a ton of historical data that needs to be cleaned.  I am going to continue writing out the code as before, but I need help with the error where Acid 15% Hcl is being overwritten by the last line, which searches for Acid.  I didn't know if you had a suggestion for fixing that.  Also, if anyone wants to look at the second program, which converts to numeric, and suggest any shortcuts, that would be much appreciated as well.

 

 

Thanks

 

art297
Opal | Level 21

Normally, at this point, I'd offer my services (for pay) as a consultant (even though I'm retired from doing consulting).

 

However, I'll try one more time to help you solve this by yourself.

 

Can you provide a list of the names of the variables that you want? The variable names in the two sets of records you offered as a sample differ substantially and, since I don't know what they should be, identifying them is almost impossible.

 

A list of the ones you need to extract (not the way they are presented in the data) would greatly simplify the process.

 

Art, CEO, AnalystFinder.com

 

Usmclee2003
Fluorite | Level 6

Art,

 

  Thanks.  The variables desired are in the second program I posted the code for.  It takes all the name variations, combines them into one, and converts to numeric.  I am forging along with your original code but just ran into that one issue with '15% Hcl Acid' and 'Acid'.  The error occurs in the first set of code, where I list out all the variations of 15%Hcl and place them in the 15% HCL Acid column.  When I list the variation that is just 'Acid', it grabs the wrong line, I believe 'Acid Volume rate' or something like that.  If you run the code without that variation it runs appropriately.  It's only when I include that line that it messes up the previous rows that were assigned properly.  No worries if you don't have time.  I'll work with what I have and try to create some workarounds.  Thanks for all your help.  Don't think that it is not much appreciated, because it very much is.
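One possible way to keep a bare 'Acid' test from catching longer names such as 'Acid Volume Rate' is to compare the whole name rather than search for a substring. The sketch below is an assumption about how that could look, not the program from this thread: the sample lines, the column label Acid_15pct_HCl, and the list of name variations are all hypothetical.

```sas
data demo;
  length line $60 name $40 column $20;
  /* hypothetical lines: three name variations plus one look-alike */
  do line = 'Acid 15% HCl 500', '15%Hcl 750', 'Acid Volume Rate 40', 'Acid 300';
    call missing(name, column);
    /* the name is everything before the last blank-delimited token */
    call scan(line, -1, _pos, _len, ' ');
    if _pos > 1 then name = upcase(strip(substr(line, 1, _pos - 1)));
    /* exact comparison: 'ACID VOLUME RATE' is not equal to 'ACID' */
    if name in ('ACID 15% HCL', '15%HCL', '15% HCL ACID', 'ACID')
      then column = 'Acid_15pct_HCl';
    else column = 'Unmatched';
    output;
  end;
  keep line name column;
run;
```

Because the comparison is on the full name, 'Acid Volume Rate 40' falls through to Unmatched instead of landing in the acid column.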

Usmclee2003
Fluorite | Level 6
Correction: the variables are in the first program I posted. I have listed the variations and combined them into one column.
art297
Opal | Level 21

Try adding the lines to your code that I added to my last version. That is, just after:

do while (scan(comments,_i,'0A'x) ne '');

 

add:

call missing(item);
call missing(value);

 

If you don't initialize your variables, they will carry over from the previous info.
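To see the carry-over in isolation, here is a small, self-contained sketch. The sample comment and the two name tests are made up for illustration; only the do while line and the two call missing statements come from the suggestion above.

```sas
data demo;
  length comments $100 line $50 item $20 value $20;
  /* hypothetical comment: the second line matches none of the tests below */
  comments = catx('0A'x, 'Acid 15% HCl 500', 'Unrecognized 7', 'Stages 12');
  _i = 1;
  do while (scan(comments,_i,'0A'x) ne '');
    /* reset both variables before each line is examined */
    call missing(item);
    call missing(value);
    line = scan(comments, _i, '0A'x);
    if find(line, 'Acid', 'i') = 1 then item = 'Acid';
    else if find(line, 'Stages', 'i') = 1 then item = 'Stages';
    if not missing(item) then value = scan(line, -1, ' ');
    output;
    _i = _i + 1;
  end;
  keep line item value;
run;
```

With the two call missing statements in place, the unrecognized line is written with blank item and value. If you comment them out, item still holds 'Acid' from the first line when the second line is processed, so that row is mislabeled.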

 

Art, CEO, AnalystFinder.com
