Hi all,
I am new to SAS (7.1 Enterprise Guide, 64 bit) and have some unstructed data that was originally stored in a database and looks like this
When I use enterpris guide and bring the data in from the library it shows up like this:
When i expand the column it turns into this
My desired out put is
I wont go into what ive tried since i am a beginner and it was really basic code. I have attaced a excel file with example data so everyone can work it on there machine if desired. This is my first post so if Violated etiquette please let me know. Always trying to get better.
Thanks
If it turns out that you only want/need the results as you showed them in your example, adding a code node with the following should suffice (assuming, that is, your file is called work.have and you want to end up with a file called work.want):
data want;
set have;
array var(39) $80.;
_i=1;
do while (scan(comments,_i,'0A'x) ne '');
var(_i)=scan(comments,_i,'0A'x);
_i+1;
end;
run;
Art, CEO, AnalystFinder.com
Your comment column contains all of the desired data fields, separated by Line Feeds ('0A'x or '0D'x depending upon your computer's operating system).
Instead of var1,var2, etc. I'd think you'd rather end up with variables like date, start_time, etc.
Which do you prefer?
Art, CEO, AnalystFinder.com
Yes you are correct that would be the ideal situation. I was hoping to get any help and didnt want to ask for too much
What also may be very much appreciated is to have the output look like
If it turns out that you only want/need the results as you showed them in your example, adding a code node with the following should suffice (assuming, that is, your file is called work.have and you want to end up with a file called work.want):
data want;
set have;
array var(39) $80.;
_i=1;
do while (scan(comments,_i,'0A'x) ne '');
var(_i)=scan(comments,_i,'0A'x);
_i+1;
end;
run;
Art, CEO, AnalystFinder.com
I used the code below and got one error - Array subscript out of range at line 28 column1
data work.comments;
set work.testdata;
array var(39) $80.;
_i=1;
do while (scan(comments,_i,'0A'x) ne '');
var(_i)=scan(comments,_i,'0A'x);
_i+1;
end;
run;
Log
The SAS System 10:13 Thursday, January 5, 2017
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program (6)';
4 %LET _CLIENTPROJECTPATH='O:\permanent\PTIM\DataScience\Advanced Analytics Team\Lee
4 ! Projects\TextStringExtractionTest1.egp';
5 %LET _CLIENTPROJECTNAME='TextStringExtractionTest1.egp';
6 %LET _SASPROGRAMFILE=;
7
8 ODS _ALL_ CLOSE;
9 OPTIONS DEV=ACTIVEX;
10 GOPTIONS XPIXELS=0 YPIXELS=0;
11 FILENAME EGSR TEMP;
12 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
13 STYLE=HtmlBlue
14 STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")
15 NOGTITLE
16 NOGFOOTNOTE
17 GPATH=&sasworklocation
18 ENCODING=UTF8
19 options(rolap="on")
20 ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
21
22 GOPTIONS ACCESSIBLE;
23 data work.comments;
24 set work.testdata;
25 array var(39)
NOTE: The array var has the same name as a SAS-supplied or user-defined function. Parentheses following this name are treated as
array references and not function references.
25 ! $80.;
26 _i=1;
27 do while (scan(comments,_i,'0A'x) ne '');
28 var(_i)=scan(comments,_i,'0A'x);
29 _i+1;
30 end;
31 run;
ERROR: Array subscript out of range at line 28 column 1.
API_IPO_UWI=4201334737 MOC_UWI=420133473700 SUB_ASSET=PLEASANTON ASSET=EAGLE FORD WELL_NAME=74 RANCH SPECIAL UNIT EAST 2H
MOC_WELL_NAME=74 RANCH SPECIAL UNIT EAST 2H
COMMENTS=Stage Number 1Date 13-Sep-13 Start Time 10:43 End Time 13:28 Pump Time 2:45 hr:min
Initial Wellhead Pressure 4,452 psi Max Treating Pressure 9,493 psi Avg Treating Pressure 8,686 psi Max Slurry Rat
e 54.8 bpm Avg Slurry Rate 50.1 bpm Avg HHP 10,662 hp 15% HCl Acid 6,010 gal 143 bblFR-66 Water 48,790 gal 1,
162 bbl25# Waterfrac (19 cP) 11,866 gal 283 bbl25# Hybor G (19 cP) 350,691 gal 8,350 bblClean Volume 417,357 gal 9,937 bblS
lurry Volume 424,979 gal 10,119 bblPulsed Slurry Volume 21,215 gal 505 bblMax Proppant Conc. 3.07 lb/gal BH Max Proppant C
onc. 3.07 lb/gal 30/50 Premium White 87,620 lb 876.20 sack20/40 Premium White 80,640 lb 806.40 sackSandwedge ABC Plus Mass
Coated 87,620 lb 876.20 sackSandwedge ABC Max Mass Coated 80,640 lb 806.40 sackProppant Mass Pumped 168,260 lb 1,682.60 sackPr
oppant Mass Counted 167,185 lb 1,671.85 sackBreakdown Pressure N/A psi Rate @ Breakdown N/A bpm Ball on Seat N/A bbls N/
A galsAcid Pressure Drop 1,018 psi STP Before Acid 9301 psi STP After Acid 8283 psi Final ISIP 6,228 psi
Final FG 0.994 psi/ft 5 Min. Shut-In 5,030 psi 240 psi/minPad Volume 124,099 gal 2,955 bblXL Pad Volume 112,233 gal 2,
672 bblClean Volume for PLF 259,992 gal 6,190 bblProppant Pumped 99 % STAGE_FRAC_TYPE= INTERVAL_NUM=1
var1=Stage Number 1 var2=Date 13-Sep-13 var3= var4=Start Time 10:43 var5=End Time 13:28
var6=Pump Time 2:45 hr:min var7=Initial Wellhead Pressure 4,452 psi var8=Max Treating Pressure 9,493 psi
var9=Avg Treating Pressure 8,686 psi var10=Max Slurry Rate 54.8 bpm var11=Avg Slurry Rate 50.1 bpm
var12=Avg HHP 10,662 hp var13=15% HCl Acid 6,010 gal 143 bbl var14=FR-66 Water 48,790 gal 1,162 bbl
var15=25# Waterfrac (19 cP) 11,866 gal 283 bbl var16=25# Hybor G (19 cP) 350,691 gal 8,350 bbl
var17=Clean Volume 417,357 gal 9,937 bbl var18=Slurry Volume 424,979 gal 10,119 bbl
var19=Pulsed Slurry Volume 21,215 gal 505 bbl var20=Max Proppant Conc. 3.07 lb/gal
2 The SAS System 10:13 Thursday, January 5, 2017
var21=BH Max Proppant Conc. 3.07 lb/gal var22=30/50 Premium White 87,620 lb 876.20 sack
var23=20/40 Premium White 80,640 lb 806.40 sack var24=Sandwedge ABC Plus Mass Coated 87,620 lb 876.20 sack
var25=Sandwedge ABC Max Mass Coated 80,640 lb 806.40 sack var26=Proppant Mass Pumped 168,260 lb 1,682.60 sack
var27=Proppant Mass Counted 167,185 lb 1,671.85 sack var28=Breakdown Pressure N/A psi var29=Rate @ Breakdown N/A bpm
var30=Ball on Seat N/A bbls N/A gals var31=Acid Pressure Drop 1,018 psi var32=STP Before Acid 9301 psi
var33=STP After Acid 8283 psi var34=Final ISIP 6,228 psi var35=Final FG 0.994 psi/ft
var36=5 Min. Shut-In 5,030 psi 240 psi/min var37=Pad Volume 124,099 gal 2,955 bbl var38=XL Pad Volume 112,233 gal 2,672 bbl
var39=Clean Volume for PLF 259,992 gal 6,190 bbl _i=40 _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.TESTDATA.
WARNING: The data set WORK.COMMENTS may be incomplete. When this step was stopped there were 0 observations and 49 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 1084.21k
OS Memory 20904.00k
Timestamp 01/05/2017 05:45:15 PM
Step Count 46 Switch Count 126
Page Faults 0
Page Reclaims 445
Page Swaps 0
Voluntary Context Switches 508
Involuntary Context Switches 2
Block Input Operations 0
Block Output Operations 280
32
33 GOPTIONS NOACCESSIBLE;
34 %LET _CLIENTTASKLABEL=;
35 %LET _CLIENTPROJECTPATH=;
36 %LET _CLIENTPROJECTNAME=;
37 %LET _SASPROGRAMFILE=;
38
39 ;*';*";*/;quit;run;
40 ODS _ALL_ CLOSE;
41
42
43 QUIT; RUN;
44
Then your actual workbook contains more than 39 fields in the comment section. Increase the statement '
array var(39) $80.;
to something like:
array var(45) $80.;
Art, CEO, AnalystFinder.com
Art 297 that did the trick, This is an awesome forum, thanks so much, i worked on this for over 8 hours today, I cant believe This worked out so well. Thanks again. The quick responses are very much appreciated.
I don't have time to write the code for you but, if you want your data in the other format we discussed, here is a start to the code:
data want;
set have;
length _var $80;
_i=1;
do while (scan(comments,_i,'0A'x) ne '');
_var=strip(translate(strip(scan(comments,_i,'0A'x)),'','09'x));
if substr(_var,1,12) eq 'Stage Number' then Stage_Number=substr(_var,13);
else if substr(_var,1,4) eq 'Date' then Date=substr(_var,5);
else if substr(_var,1,10) eq 'Start Time' then Start_Time=substr(_var,11);
else if substr(_var,1,8) eq 'End Time' then End_Time=substr(_var,9);
else if substr(_var,1,9) eq 'Pump Time' then Pump_Time=substr(_var,10);
else if substr(_var,1,25) eq 'Initial Wellhead Pressure' then Initial_Wellhead_Pressure=substr(_var,26);
else if substr(_var,1,21) eq 'Max Treating Pressure' then Max_Treating_Pressure=substr(_var,22);
else if substr(_var,1,21) eq 'Avg Treating Pressure' then Avg_Treating_Pressure=substr(_var,22);
else if substr(_var,1,15) eq 'Max Slurry Rate' then Max_Slurry_Rate=substr(_var,16);
else if substr(_var,1,15) eq 'Avg Slurry Rate' then Avg_Slurry_Rate=substr(_var,16);
else if substr(_var,1,7) eq 'Avg HHP' then Avg_HHP=substr(_var,8);
else if substr(_var,1,12) eq '15% HCl Acid' then Fifteen_Percent_HCl_Acid=substr(_var,13);
else if substr(_var,1,11) eq 'FR-66 Water' then FR_66_Water=substr(_var,12);
else if substr(_var,1,21) eq '25# Waterfrac (19 cP)' then _25_Pound_Waterfrac_19_cP=substr(_var,22);
_i+1;
end;
run;
HTH,
Art, CEO, AnalystFinder.com
Art I continued on with your code and ran into an issue i could not resolve.
else if substr(_var,1,12) eq '15% HCl Acid' then Acid_15_Perc=substr(_var,13);
else if substr(_var,1,7) eq '15% HCl' then Acid_15_Perc=substr(_var,8);
else if substr(_var,1,11) eq 'Acid (bbls)' then Acid_15_Perc=substr(_var,13);
else if substr(_var,1,10) eq 'Acid(bbls)' then Acid_15_Perc=substr(_var,11);
else if substr(_var,1,5) eq 'Acid:' then Acid_15_Perc=substr(_var,6);
else if substr(_var,1,4) eq 'Acid' then Acid_15_Perc=substr(_var,5);
The last line takes over for the two prevoius lines because they contain the word Acid in them. Thus it cuts the data strings off at 5 for all 3. What would you suggest. Thanks
The statements are encountered from top to bottom and, within each record, stop as soon as a condition is met. Thus the lower conditions don't override conditions that preceded them.
My guess is that your spelling/typing is off on some of the entries. e.g., are the characters in the text "HCl" or "HC1" or "HCL"?
Art, CEO, AnalystFinder.com
Art it looks like it is grabbing the pressure drop line and overwriting the value desired in the 15% HCL Acid.
The reason i added Acid is because on some of the other entries the 15% HCL Acid is just listed as Acid
Post all of your code and your data file (or at least a subset of it .. including at least two sets of records).
THis is the code i am using. I just continued with what you posted earlier
data work.comments;
set WORK.TestData;
length _var $80;
_i=1;
do while (scan(comments,_i,'0A'x) ne '');
_var=strip(translate(strip(scan(comments,_i,'0A'x)),'','09'x));
if substr(_var,1,12) eq 'Stage Number' then Stage_Number=substr(_var,13);
else if substr(_var,1,10) eq 'Stage Name' then Stage_Number=substr(_var,12);
else if substr(_var,1,5) eq 'Stage' then Stage_Number=substr(_var,6);
else if substr(_var,1,19) eq 'Adams-Tipton 2H Stg' then Stage_Number=substr(_var,20);
else if substr(_var,1,19) eq 'Adams Tipton 3H Stg' then Stage_Number=substr(_var,20);
else if substr(_var,1,4) eq 'Date' then Date=substr(_var,5);
else if substr(_var,1,10) eq 'Start Time' then Start_Time=substr(_var,11);
else if substr(_var,1,8) eq 'End Time' then End_Time=substr(_var,9);
else if substr(_var,1,14) eq 'Open WHP (psi)' then Initial_Wellhead_Pressure=substr(_var,16);
else if substr(_var,1,8) eq 'Open WHP' then Initial_Wellhead_Pressure=substr(_var,10);
else if substr(_var,1,17) eq 'Wellhead Pressure' then Initial_Wellhead_Pressure=substr(_var,19);
else if substr(_var,1,9) eq 'Pump Time' then Pump_Time=substr(_var,10);
else if substr(_var,1,25) eq 'Initial Wellhead Pressure' then Initial_Wellhead_Pressure=substr(_var,26);
else if substr(_var,1,21) eq 'Max Treating Pressure' then Max_Treating_Pressure=substr(_var,22);
else if substr(_var,1,18) eq 'Max Pressure (psi)' then Max_Treating_Pressure=substr(_var,20);
else if substr(_var,1,12) eq 'Max Pressure' then Max_Treating_Pressure=substr(_var,14);
else if substr(_var,1,7) eq 'Max PSI' then Max_Treating_Pressure=substr(_var,9);
else if substr(_var,1,21) eq 'Avg Treating Pressure' then Avg_Treating_Pressure=substr(_var,22);
else if substr(_var,1,18) eq 'Avg Pressure (psi)' then Avg_Treating_Pressure=substr(_var,20);
else if substr(_var,1,12) eq 'Avg Pressure' then Avg_Treating_Pressure=substr(_var,14);
else if substr(_var,1,7) eq 'Avg PSI' then Avg_Treating_Pressure=substr(_var,9);
else if substr(_var,1,15) eq 'Max Slurry Rate' then Max_Slurry_Rate=substr(_var,16);
else if substr(_var,1,14) eq 'Max Rate (bpm)' then Max_Slurry_Rate=substr(_var,16);
else if substr(_var,1,8) eq 'Max Rate' then Max_Slurry_Rate=substr(_var,10);
else if substr(_var,1,15) eq 'Avg Slurry Rate' then Avg_Slurry_Rate=substr(_var,16);
else if substr(_var,1,14) eq 'Avg Rate (bpm)' then Avg_Slurry_Rate=substr(_var,16);
else if substr(_var,1,8) eq 'Avg Rate' then Avg_Slurry_Rate=substr(_var,10);
else if substr(_var,1,7) eq 'Avg HHP' then Avg_HHP=substr(_var,8);
else if substr(_var,1,12) eq '15% HCl Acid' then Acid_15_Perc=substr(_var,13);
else if substr(_var,1,7) eq '15% HCl' then Acid_15_Perc=substr(_var,8);
else if substr(_var,1,11) eq 'Acid (bbls)' then Acid_15_Perc=substr(_var,13);
else if substr(_var,1,10) eq 'Acid(bbls)' then Acid_15_Perc=substr(_var,11);
else if substr(_var,1,5) eq 'Acid:' then Acid_15_Perc=substr(_var,6);
*else if substr(_var,1,4) eq 'Acid' then Acid_15_Perc=substr(_var,5);
else if substr(_var,1,11) eq 'FR-66 Water' then FR_66_Water=substr(_var,12);
else if substr(_var,1,23) eq '20# Waterfrac G (14 cP)' then _20_Pound_Waterfrac_G_14_cP=substr(_var,24);
else if substr(_var,1,21) eq '25# Waterfrac (19 cP)' then _25_Pound_Waterfrac_19_cP=substr(_var,22);
else if substr(_var,1,15) eq '25# WaterFrac G' then _25_Pound_Waterfrac_19_cP=substr(_var,16);
*else if substr(_var,1,15) eq '25# Waterfrac G' then _25_Pound_Waterfrac_19_cP=substr(_var,16);
else if substr(_var,1,23) eq '25# Waterfrac G (19 cP)' then _25_Pound_Waterfrac_19_cP=substr(_var,24);
else if substr(_var,1,19) eq '25# Hybor G (19 cP)' then _25_Pound_Hybor_G=substr(_var,20);
else if substr(_var,1,11) eq '25# Hybor G' then _25_Pound_Hybor_G=substr(_var,12);
*else if substr(_var,1,12) eq 'Clean Volume' then Clean_Volume=substr(_var,13);
else if substr(_var,1,13) eq 'Slurry Volume' then Slurry_Volume=substr(_var,14);
else if substr(_var,1,10) eq 'Slurry Vol' then Slurry_Volume=substr(_var,11);
else if substr(_var,1,18) eq 'Max Proppant Conc.' then Max_Proppant_Conc=substr(_var,19);
else if substr(_var,1,26) eq 'Max Proppant Concentration' then Max_Proppant_Conc=substr(_var,28);
else if substr(_var,1,13) eq 'Max Prop Conc' then Max_Proppant_Conc=substr(_var,15);
else if substr(_var,1,21) eq 'BH Max Proppant Conc.' then BH_Max_Proppant_Conc=substr(_var,22);
else if substr(_var,1,19) eq '30/50 Premium White' then _30_50_Premium_White=substr(_var,20);
else if substr(_var,1,11) eq '30/50 (lbs)' then _30_50_Premium_White=substr(_var,13);
else if substr(_var,1,10) eq '30/50(lbs)' then _30_50_Premium_White=substr(_var,11);
*else if substr(_var,1,6) eq '30/50 ' then _30_50_Premium_White=substr(_var,7);
else if substr(_var,1,13) eq '30/50 Premium' then _30_50_Premium_White=substr(_var,14);
else if substr(_var,1,10) eq '30/50 Sand' then _30_50_Premium_White=substr(_var,11);
else if substr(_var,1,12) eq '30/50 White' then _30_50_Premium_White=substr(_var,13);
else if substr(_var,1,11) eq '30/50 White' then _30_50_Premium_White=substr(_var,12);
else if substr(_var,1,11) eq '30/50 white' then _30_50_Premium_White=substr(_var,12);
else if substr(_var,1,10) eq '30/50 CRC' then _30_50_CRC=substr(_var,11);
else if substr(_var,1,16) eq '30/50 CRC Resin' then _30_50_CRC=substr(_var,17);
else if substr(_var,1,9) eq '30/50 CRC' then _30_50_CRC=substr(_var,10,18);
else if substr(_var,1,5) eq '30/70' then _30_70_White=substr(_var,6);
else if substr(_var,1,11) eq '30/70 White' then _30_70_White=substr(_var,12);
else if substr(_var,1,5) eq '30/35 CRC CRC Resin' then _30_35_CRC_CRC_Resin=substr(_var,6);
else if substr(_var,1,17) eq '20/40 White (lbs)' then _20_40_Premium_White=substr(_var,19);
else if substr(_var,1,12) eq '20/40 White' then _20_40_Premium_White=substr(_var,13);
else if substr(_var,1,19) eq '20/40 Premium White' then _20_40_Premium_White=substr(_var,20);
else if substr(_var,1,11) eq '20/40 White' then _20_40_Premium_White=substr(_var,12);
else if substr(_var,1,11) eq '20/40 white' then _20_40_Premium_White=substr(_var,12);
else if substr(_var,1,19) eq '20/40 Premium White' then _20_40_Premium_White=substr(_var,20);
else if substr(_var,1,11) eq '20/40(lbs)' then _20_40_Premium_White=substr(_var,12);
else if substr(_var,1,7) eq '20/40 ' then _20_40_Premium_White=substr(_var,8);
else if substr(_var,1,10) eq '20/40 Sand' then _20_40_Premium_White=substr(_var,12);
else if substr(_var,1,10) eq '20/40 CRC' then _20_40_CRC=substr(_var,11);
else if substr(_var,1,15) eq '20/40 CRC (lbs)' then _20_40_CRC=substr(_var,17);
else if substr(_var,1,16) eq '20/40 CRC Resin' then _20_40_CRC=substr(_var,17);
else if substr(_var,1,15) eq '20/40 CRC Resin' then _20_40_CRC=substr(_var,16);
else if substr(_var,1,9) eq '20/40 SLC' then _20_40_CRC=substr(_var,10);
else if substr(_var,1,3) eq 'CRC' then _20_40_CRC=substr(_var,5);
else if substr(_var,1,8) eq '100 Mesh' then _100_MESH=substr(_var,9);
else if substr(_var,1,8) eq '100 mesh' then _100_MESH=substr(_var,9);
else if substr(_var,1,30) eq 'Sandwedge ABC Plus Mass Coated' then Sandwedge_ABC_Plus_Mass_Coated=substr(_var,31);
else if substr(_var,1,29) eq 'Sandwedge ABC Max Mass Coated' then Sandwedge_ABC_Max_Mass_Coated=substr(_var,30);
else if substr(_var,1,20) eq 'Proppant Mass Pumped' then Proppant_Mass_Pumped=substr(_var,21);
else if substr(_var,1,20) eq 'Total Proppant (lbs)' then Proppant_Mass_Pumped=substr(_var,22);
else if substr(_var,1,19) eq 'Total Proppant(lbs)' then Proppant_Mass_Pumped=substr(_var,20);
else if substr(_var,1,10) eq 'Total Prop' then Proppant_Mass_Pumped=substr(_var,12);
else if substr(_var,1,21) eq 'Proppant Mass Counted' then Proppant_Mass_Counted=substr(_var,22);
else if substr(_var,1,18) eq 'Breakdown Pressure' then Breakdown_Pressure=substr(_var,19);
else if substr(_var,1,14) eq 'Breakdown(psi)' then Breakdown_Pressure=substr(_var,15);
else if substr(_var,1,16) eq 'Rate @ Breakdown' then Rate_At_Breakdown=substr(_var,17);
else if substr(_var,1,12) eq 'Ball on Seat' then Ball_on_Seat=substr(_var,13);
else if substr(_var,1,18) eq 'Acid Pressure Drop' then Acid_Pressure_Drop=substr(_var,19);
else if substr(_var,1,15) eq 'STP Before Acid' then STP_Before_Acid=substr(_var,16);
else if substr(_var,1,14) eq 'STP After Acid' then STP_After_Acid=substr(_var,15);
else if substr(_var,1,10) eq 'Final ISIP' then ISIP=substr(_var,11);
else if substr(_var,1,11) eq 'Finial ISIP' then ISIP=substr(_var,13);
else if substr(_var,1,10) eq 'ISIP (psi)' then ISIP=substr(_var,12);
else if substr(_var,1,5) eq 'ISIP:' then ISIP=substr(_var,6);
else if substr(_var,1,4) eq 'ISIP' then ISIP=substr(_var,5);
else if substr(_var,1,8) eq 'Final FG' then Final_FG=substr(_var,9);
else if substr(_var,1,13) eq 'Frac Gradient' then Final_FG=substr(_var,15);
else if substr(_var,1,9) eq 'Frac Grad' then Final_FG=substr(_var,10);
else if substr(_var,1,14) eq '5 Min. Shut-In' then _5_Min_Shut_In=substr(_var,15);
else if substr(_var,1,13) eq '5 min shut in' then _5_Min_Shut_In=substr(_var,15);
else if substr(_var,1,16) eq '5 Minute Shut-In' then _5_Min_Shut_In=substr(_var,17);
else if substr(_var,1,16) eq '5 Minute Shut In' then _5_Min_Shut_In=substr(_var,17);
else if substr(_var,1,6) eq '5 Min.' then _5_Min_Shut_In=substr(_var,8);
else if substr(_var,1,5) eq '5 min' then _5_Min_Shut_In=substr(_var,7);
else if substr(_var,1,13) eq '5 Min Shut-In' then _5_Min_Shut_In=substr(_var,14);
else if substr(_var,1,10) eq 'Pad Volume' then Pad_Volume=substr(_var,11);
else if substr(_var,1,10) eq 'Pad (bbls)' then Pad_Volume=substr(_var,12);
else if substr(_var,1,9) eq 'Pad(bbls)' then Pad_Volume=substr(_var,11);
else if substr(_var,1,4) eq 'Pad:' then Pad_Volume=substr(_var,5);
else if substr(_var,1,3) eq 'PAD' then Pad_Volume=substr(_var,4);
else if substr(_var,1,13) eq 'XL Pad Volume' then XL_Pad_Volume=substr(_var,14);
else if substr(_var,1,20) eq 'Clean Volume for PLF' then Clean_Volume_for_PLF=substr(_var,21);
else if substr(_var,1,18) eq 'Total Clean Volume' then Clean_Volume_for_PLF=substr(_var,20);
else if substr(_var,1,35) eq 'Total Clean Fluid to Recover (bbls)' then Clean_Volume_for_PLF=substr(_var,37);
else if substr(_var,1,22) eq 'Total Clean to Recover' then Clean_Volume_for_PLF=substr(_var,24);
else if substr(_var,1,11) eq 'Clean Fluid' then Clean_Volume_for_PLF=substr(_var,12);
else if substr(_var,1,15) eq 'Proppant Pumped' then Proppant_Pumped=substr(_var,16);
else if substr(_var,1,17) eq 'Proppant % Pumped' then Proppant_Pumped=substr(_var,18);
else if substr(_var,1,15) eq 'Proppant Placed' then Proppant_Pumped=substr(_var,16);
else if substr(_var,1,15) eq 'based on design' then Proppant_Pumped=substr(_var,17);
else if substr(_var,1,6) eq '% sand' then Proppant_Pumped=substr(_var,7);
_i+1;
end;
run;
I then use a second program to remove alpha characters and combine and convert into numeric
data work.StimInt_Comments;
set work.comments;
Wellhead_Pressure = input (compress(Initial_Wellhead_Pressure, ',psi(-) '), 4.);
drop Initial_Wellhead_Pressure;
rename Wellhead_Pressure=Initial_Wellhead_Pressure;
Max_TreatingPressure= input (compress(Max_Treating_Pressure, ',psi(-) '), 4.);
drop Max_Treating_Pressure;
rename Max_TreatingPressure=Max_Treating_Pressure;
Avg_TreatingPressure= input (compress(Avg_Treating_Pressure, ',psi(-) '), 4.);
drop Avg_Treating_Pressure;
rename Avg_TreatingPressure=Avg_Treating_Pressure;
If length(compress(Max_Slurry_Rate,', bpm psi hp ()+- ')) le 4 then
Max_SlurryRate_bpm= input (compress(Max_Slurry_Rate,', bpm psi hp ()+- '), 2.1); else
Max_SlurryRate_bpm= input (compress(Max_Slurry_Rate,', bpm psi hp ()+- '), 4.1);
drop Max_Slurry_Rate;
*Max_SlurryRate_bpm_length=length (compress(Max_Slurry_Rate,', bpm psi hp ()+- '));
If length(compress(Avg_Slurry_Rate,', bpm psi hp ()+- ')) le 3 then
Avg_SlurryRate_bpm= input (compress(Avg_Slurry_Rate,', bpm psi hp ()+- '), 2.);else
Avg_SlurryRate_bpm= input (compress(Avg_Slurry_Rate,', bpm psi hp ()+- '), 4.);
*drop Avg_Slurry_Rate;
Avg_Slurry_Rate_length=length (compress(Avg_Slurry_Rate,', bpm psi hp ()+- '));
AvgHHP= input(compress(Avg_HHP,',+-', 'a'), 8.);
drop Avg_HHP;
rename AvgHHP=Avg_HHP;
Max_Proppant_Conc_lb_gal= input(compress(Max_Proppant_Conc,'/,+-', 'a'), 10.2);
drop Max_Proppant_Conc;
If length(_30_50_Premium_White) le 8 then
_30_50_PremiumWhite_lbs= input(compress(Left(_30_50_Premium_White),',+-', 'a'), 4.); else
_30_50_PremiumWhite_lbs= input(compress(Left(_30_50_Premium_White),',+-', 'a'), 6.);
drop _30_50_Premium_White;
_20_40_Premium_White_lb= input(compress(Left(_20_40_Premium_White),',+-', 'a'), 6.);
drop _20_40_Premium_White;
*rename _20_40_PremiumWhite=_20_40_Premium_White;
Breakdown_Pressure_psi= input(compress(Left(Breakdown_Pressure),',+-', 'a'), 5.);
drop Breakdown_Pressure;
If length(compress(Left(ISIP),',+-', 'a')) le 5 then
ISIP_psi= input(compress(Left(ISIP),',+-', 'a'), 4.);else
ISIP_psi= input(compress(Left(ISIP),',+-', 'a'), 5.);
*drop ISIP;
ISIP_psi_length=length(compress(Left(ISIP),',+-', 'a'));
If length(compress(Left(_5_Min_Shut_In),',+- ', 'a')) le 5 then
_5_Min_Shut_In_psi= input(compress(Left(_5_Min_Shut_In),',+- ', 'a'), 4.);else
_5_Min_Shut_In_psi= input(compress(Left(_5_Min_Shut_In),',+- ', 'a'), 5.);
*drop _5_Min_Shut_In;
_5_Min_Shut_In_length=length(compress(Left(_5_Min_Shut_In),',+- ', 'a'));
Rate_At_Breakdown_bpm= input(compress(Left(Rate_At_Breakdown),',+-', 'a'), 4.);
drop Rate_At_Breakdown;
STP_Before_Acid_psi= input(compress(Left(STP_Before_Acid),',+-', 'a'), 4.);
drop STP_Before_Acid;
STP_Before_Acid_psi= input(compress(Left(STP_Before_Acid),',+-', 'a'), 4.);
drop STP_Before_Acid;
STP_After_Acid_psi= input(compress(Left(STP_After_Acid),',+-', 'a'), 4.);
drop STP_After_Acid;
Acid_Pressure_Drop_psi= input(compress(Left(Acid_Pressure_Drop),',+-', 'a'), 4.);
drop Acid_Pressure_Drop;
If length(compress(Proppant_Mass_Pumped,', bpm psi hp ()+- ')) le 7 then
Proppant_Mass_Pumped_lb= input(compress(Left(Proppant_Mass_Pumped),',+-', 'a'), 6.);else
Proppant_Mass_Pumped_lb= input(compress(Left(Proppant_Mass_Pumped),',+-', 'a'), 7.);
*drop Proppant_Mass_Pumped;
Proppant_Mass_Pumped_lb_length=length (compress(Proppant_Mass_Pumped,', bpm psi hp ()+- '));
Proppant_Mass_Counted_lb= input(compress(Left(Proppant_Mass_Counted),',+-', 'a'), 7.);
drop Proppant_Mass_Counted;
IF index(Pad_Volume, 'gal') ne 0 then
Pad_Volume_Bbls = input(compress(Left(substr(Pad_Volume, index(Pad_Volume, 'gal'))),',+-', 'a'), 5.); else
If length(compress(Left(Pad_Volume),',+- ', 'a'))le 4 then
Pad_Volume_Bbls= input(compress(Left(Pad_Volume),',+- ', 'a'), 3.);else
Pad_Volume_Bbls= input(compress(Left(Pad_Volume),',+-', 'a'), 4.);
drop Pad_Volume;
IF index(XL_Pad_Volume, 'gal') ne 0 then
XL_Pad_Volume_Bbls = input(compress(Left(substr(XL_Pad_Volume, index(XL_Pad_Volume, 'gal'))),',+-', 'a'), 5.); else
If length(compress(Left(Pad_Volume),',+- ', 'a'))le 4 then
XL_Pad_Volume_Bbls= input(compress(Left(XL_Pad_Volume),',+- ', 'a'), 3.);else
XL_Pad_Volume_Bbls= input(compress(Left(XL_Pad_Volume),',+-', 'a'), 4.);
drop XL_Pad_Volume;
XL_Pad_Volume_Bbls_length = length(compress(Left(XL_Pad_Volume),',+- ', 'a'));
IF index(Clean_Volume_for_PLF, 'gal') ne 0 then
Clean_Volume_for_PLF_Bbls = input(compress(Left(substr(Clean_Volume_for_PLF, index(Clean_Volume_for_PLF, 'gal'))),',+-', 'a'), 5.); else
Clean_Volume_for_PLF_Bbls= input(compress(Left(Clean_Volume_for_PLF),',+-', 'a'), 4.);
*drop Clean_Volume_for_PLF;
Clean_Volume_for_PLF_length = length(compress(Left(Clean_Volume_for_PLF),',+- ', 'a'));
IF index(Acid_15_Perc, 'gal') ne 0 then
Acid_15_Perc_Bbl = input(compress(Left(substr(Acid_15_Perc, index(Acid_15_Perc, 'gal'))),',+-', 'a'), 3.); else
if length (compress(Left(Acid_15_Perc),',+- ', 'a')) le 3 then
Acid_15_Perc_Bbl= input(compress(Left(Acid_15_Perc),',+-', 'a'), 2.); else
Acid_15_Perc_Bbl= input(compress(Left(Acid_15_Perc),',+-', 'a'), 3.);
drop Acid_15_Perc;
*Acid_15_Perc_length = length(compress(Left(Acid_15_Perc),',+- ', 'a'));
IF index(Slurry_Volume, 'gal') ne 0 then
Slurry_Volume_Bbl = input(compress(Left(substr(Slurry_Volume, index(Slurry_Volume, 'gal'))),',+-', 'a'), 6.); else
if length (compress(Left(Slurry_Volume),',+- ', 'a')) le 5 then
Slurry_Volume_Bbl = input(compress(Left(Slurry_Volume),',+- ', 'a'), 4.);else
Slurry_Volume_Bbl = input(compress(Left(Slurry_Volume),',+-', 'a'), 6.);
*drop Slurry_Volume;
Slurry_Volume_length = length(compress(Left(Slurry_Volume),',+- ', 'a'));
IF index(FR_66_Water, 'gal') ne 0 then
FR_66_Water_Bbls = input(compress(Left(substr(FR_66_Water, index(FR_66_Water, 'gal'))),',+-', 'a'), 4.);else
FR_66_Water_Bbls = input(compress(Left(FR_66_Water),',+-', 'a'), 4.);
drop FR_66_Water;
IF index(_25_Pound_Hybor_G, 'gal') ne 0 then
_25_Pound_Hybor_G_Bbls = input(compress(Left(substr(_25_Pound_Hybor_G, index(_25_Pound_Hybor_G, 'gal'))),',+-', 'a'), 5.);else
_25_Pound_Hybor_G_Bbls = input(compress(Left(_25_Pound_Hybor_G),',+-', 'a'), 5.);
drop _25_Pound_Hybor_G;
IF index(_25_Pound_Waterfrac_19_cP, 'gal') ne 0 then
_25_Pound_Waterfrac_19_cP_Bbls = input(compress(Left(substr(_25_Pound_Waterfrac_19_cP, index(_25_Pound_Waterfrac_19_cP, 'gal'))),',+-', 'a'), 5.);else
_25_Pound_Waterfrac_19_cP_Bbls = input(compress(Left(_25_Pound_Waterfrac_19_cP),',+-', 'a'), 5.);
drop _25_Pound_Waterfrac_19_cP;
IF index(_20_Pound_Waterfrac_G_14_cP, 'gal') ne 0 then
_20_Pound_Waterfrac_G_14_cP_Bbls = input(compress(Left(substr(_20_Pound_Waterfrac_G_14_cP, index(_20_Pound_Waterfrac_G_14_cP, 'gal'))),',+-', 'a'), 5.);
else
_20_Pound_Waterfrac_G_14_cP_Bbls = input(compress(Left(_20_Pound_Waterfrac_G_14_cP),',+-', 'a'), 5.);
drop _20_Pound_Waterfrac_G_14_cP;
if length (compress(Left(Final_FG),',+- ', 'a')) le 9 then
Final_FG_psi_ft= input(compress(Left(Final_FG),',+- ', 'a'), 4.3);else
Final_FG_psi_ft= input(compress(Left(Final_FG),',+- ', 'a'), 5.3);
drop Final_FG;
Final_FG_length=length (compress(Final_FG,', bpm psi hp ()+- '));
BH_Max_Proppant_Conc_lb_gal= input(compress(Left(BH_Max_Proppant_Conc),',+- ', 'a'), 4.3);
drop BH_Max_Proppant_Conc;
Ball_on_Seat_Bbls= input(compress(Left(Ball_on_Seat),',+- ', 'a'), 3.);
drop Ball_on_Seat;
IF index(_30_50_CRC, 'E') ne 0 then
_30_50_CRC_lbs = input(compress(Left(substr(_30_50_CRC,11)),',+-', 'a'), 5.);
else
_30_50_CRC_lbs= input(compress(Left(substr(_30_50_CRC,1,8)),',+-', 'a'), 8.);
drop _30_50_CRC;
_20_40_CRC_lbs= input(compress(Left(_20_40_CRC),',+- ', 'a'), 5.);
drop _20_40_CRC;
run;
I have attached the data set that i was working with
Now I remember when you initially posted your question AND that you did include an example dataset that included two sets of records.
The two sets are different! API_IPO_UWI='4201334737' has items and values separated by tab characters for all records except for iternval_num=5. On the other hand, API_IPO_UWI='4225533069' DOESN'T have the items and values separated by tab characters.
The following would only work for the records that have the items and values separated by tab characters and deletes all of the other records:
proc import datafile="c:\art\work_testdata.xlsx" out=have replace dbms=excel;
getnames=yes;
mixed=yes;
sheet='TESTDATA';
usedate=yes;
scantime=yes;
run;
data need;
set have;
length var $80;
if API_IPO_UWI='4201334737' and interval_num ne 5;
_i=1;
do while (scan(comments,_i,'0A'x) ne '');
var=scan(comments,_i,'0A'x);
item=scan(var,1,'09'x);
value=scan(var,2,'09'x);
output;
_i+1;
end;
run;
proc transpose data=need out=want;
by interval_num;
var value;
id item;
run;
Even with the above, there are a number of differences between the first record and the other records in the set.
Art, CEO, AnalystFinder.com
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.
Find more tutorials on the SAS Users YouTube channel.