BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
laneylaners
Obsidian | Level 7

I have the following dataset:

 

tech_cmnt
AIR HOSE UNCOUPLED BETWEEN CARS TTGX 159920 AND TTGX604561. RECOUPLED TOGETHER.
There was a gash in the trainline between cars TTGX993160 AND TTGX854014. HOSE REPLACED.  SGT/OMA.
AIR HOSE SEPERATION BETWEEN CARS XTTX137370 AND TTHX92394. AIR HOSES COUPLED BACK TOGETHER AND TRAIN IS BACK MOVING AT THIS TIME. CAD/OMA.
AIR HOSE ON A-END OF CAR TTGX 158384 WAS NOT TIGHT IN THE SCREW ON FITTING. AIR HOSE TIGHTENED AND TRAIN IS BACK ON THE MOVE AT THIS TIME. CAD/OMA.
 WRONG END DRAWBAR ON TTGX604269 ON A-END. MOP AND CARMAN ON SITE AT THIS TIME.  TTGX604269 AND THE 4 CARS BEHIND IT WERE SETOUT AT NOLTE%2C TX.
BROKEN KNUCKLE ON TTGX986512 ON THE B-END. REKNUCKLED IN TRAIN. TRAIN BACK ON THE MOVE WITH NO OTHER ISSUES AT THIS TIME. CAD/OMA.

 

 

I'm trying to extract the 6 digit numbers plus the 4 initials that preceed them to have the following result:

 

sashelp2.JPG

 

As you can see, some of the numbers have no space between them and the characters before it, and some have one space.  I'm struggling with how to do this, and if there are multiple occurrences of what I'm looking for, how to extract the second occurence and output that if it's not the same as the first (i.e. car1 and car2 results). 

 

I've tried scan, substr, anydigit (<- thought that and then substr might be the one, but i cannot figure out how to get car2 to populate).

 

This list is not limited either to car1 or car2 having the 4x character initials.  These are all of the possibilities:  ABOX
ATTX
BTTX
CBOX
CTTX
DTTX
ETTX
FBOX
FTTX
GNDX
GNTX
GONX
GTTX
HTTX
ITTX
JTTX
LTTX
MTTX
OTTX
PTTX
QTTX
RBOX
RTTX
STTX
TBOX
TFWX
TOAX
TOBX
TOCX
TOFX
TPDX
TTAX
TTCX
TTDX
TTEX
TTFX
TTGX
TTHX
TTIX
TTJX
TTKX
TTLX
TTMX
TTPX
TTRX
TTUX
TTWX
TTXX
TTYX
TTZX
VTTX
XTTX

 

Thanks for your insight.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
You are almost get in there.



data have;
infile cards truncover;
input a $200.;
cards;
AIR HOSE UNCOUPLED BETWEEN CARS TTGX 159920 AND TTGX604561. RECOUPLED TOGETHER.
There was a gash in the trainline between cars TTGX993160 AND TTGX854014. HOSE REPLACED.  SGT/OMA.
AIR HOSE SEPERATION BETWEEN CARS XTTX137370 AND TTHX92394. AIR HOSES COUPLED BACK TOGETHER AND TRAIN IS BACK MOVING AT THIS TIME. CAD/OMA.
AIR HOSE ON A-END OF CAR TTGX 158384 WAS NOT TIGHT IN THE SCREW ON FITTING. AIR HOSE TIGHTENED AND TRAIN IS BACK ON THE MOVE AT THIS TIME. CAD/OMA.
 WRONG END DRAWBAR ON TTGX604269 ON A-END. MOP AND CARMAN ON SITE AT THIS TIME.  TTGX604269 AND THE 4 CARS BEHIND IT WERE SETOUT AT NOLTE%2C TX.
BROKEN KNUCKLE ON TTGX986512 ON THE B-END. REKNUCKLED IN TRAIN. TRAIN BACK ON THE MOVE WITH NO OTHER ISSUES AT THIS TIME. CAD/OMA.
;
run;
data want;
set have;
n+1;
pid=prxparse('/\b[A-Z]{4}\s*\d+\b/io');
s=1;
e=length(a);
call prxnext(pid,s,e,a,p,l);
do while(p>0);
  found=substr(a,p,l);
  output;
  call prxnext(pid,s,e,a,p,l);
end;
drop s e p l pid;
run;
proc transpose data=want out=final_want prefix=car;
by n a;
var found;
run;

View solution in original post

5 REPLIES 5
Reeza
Super User

Post data as text, not as pictures please, otherwise we have to type out your data manually. 

 

You're going to need PRX 

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002288677.htm

 

laneylaners
Obsidian | Level 7

Reeza,

 

This was suggested to me as a potential solution to a similar problem:

 

Run;
data want;
set have;
pid=prxparse('/ETTX|TTGX|TTPX|DTTX/o');
s=1;
e=length(a);
call prxnext(pid,s,e,a,p,l);
do while(p>0);
  found=substr(a,p,l);
  output;
  call prxnext(pid,s,e,a,p,l);
end;
drop s e p l pid;
run;

 

I tried that, but I was getting errors when I added in the full list of 4-character initials in the pid=prxparse('/ETTX|TTGX|TTPX|DTTX/o') part of the code.  It said there were too many errors generated when trying it.

ballardw
Super User

Here's something I won't claim is elegant but seems to work for the example data:

data codes;
   input code $ 1-4;
datalines;
ABOX
ATTX
BTTX
CBOX
CTTX
DTTX
ETTX
FBOX
FTTX
GNDX
GNTX
GONX
GTTX
HTTX
ITTX
JTTX
LTTX
MTTX
OTTX
PTTX
QTTX
RBOX
RTTX
STTX
TBOX
TFWX
TOAX
TOBX
TOCX
TOFX
TPDX
TTAX
TTCX
TTDX
TTEX
TTFX
TTGX
TTHX
TTIX
TTJX
TTKX
TTLX
TTMX
TTPX
TTRX
TTUX
TTWX
TTXX
TTYX
TTZX
VTTX
XTTX
;
run;

proc sql noprint;
   select quote(code) into : codelist separated by ','
   from codes;
quit;

data want;
   set have;
   array c (&codecount) $ 4 _temporary_ (&codelist);
   length temp car1 car2 $ 12;
   rc=prxparse("/\d+/");
   do i= 1 to dim(c);
      pos = index(tech_cmnt,c[i]);
      if pos>0  then do;
         temp=substr(tech_cmnt,pos,12);
         /* trim every thing after last digit in temp*/
         call prxsubstr(rc,temp,p,l);
         temp = substr(temp,1,p+l-1);
      end;
      if missing(car1) and not missing(temp) then car1=temp;
      else if temp ne car1 then car2=temp;
      If find(tech_cmnt,c[i],pos+12)>0 then do;
         temp=substr(tech_cmnt,find(tech_cmnt,c[i],pos+12),12);
         call prxsubstr(rc,temp,p,l);
         temp = substr(temp,1,p+l-1);
         car2=temp;
      end;
   end;
   drop i rc p l pos temp ;
run;

One minor advantage is you could build the list of codes as a Select distinct variable type query from another set.

 

Might want to add a bit to check on the maximum length of the codes to replace the 4 in the array definition.

The PRX stuff in this one is only being used to clean up characters after the last digit. It may have an issue substring the phrase if the very last bit of the phrase is a car and is one of the shorter ones as written

Ksharp
Super User
You are almost get in there.



data have;
infile cards truncover;
input a $200.;
cards;
AIR HOSE UNCOUPLED BETWEEN CARS TTGX 159920 AND TTGX604561. RECOUPLED TOGETHER.
There was a gash in the trainline between cars TTGX993160 AND TTGX854014. HOSE REPLACED.  SGT/OMA.
AIR HOSE SEPERATION BETWEEN CARS XTTX137370 AND TTHX92394. AIR HOSES COUPLED BACK TOGETHER AND TRAIN IS BACK MOVING AT THIS TIME. CAD/OMA.
AIR HOSE ON A-END OF CAR TTGX 158384 WAS NOT TIGHT IN THE SCREW ON FITTING. AIR HOSE TIGHTENED AND TRAIN IS BACK ON THE MOVE AT THIS TIME. CAD/OMA.
 WRONG END DRAWBAR ON TTGX604269 ON A-END. MOP AND CARMAN ON SITE AT THIS TIME.  TTGX604269 AND THE 4 CARS BEHIND IT WERE SETOUT AT NOLTE%2C TX.
BROKEN KNUCKLE ON TTGX986512 ON THE B-END. REKNUCKLED IN TRAIN. TRAIN BACK ON THE MOVE WITH NO OTHER ISSUES AT THIS TIME. CAD/OMA.
;
run;
data want;
set have;
n+1;
pid=prxparse('/\b[A-Z]{4}\s*\d+\b/io');
s=1;
e=length(a);
call prxnext(pid,s,e,a,p,l);
do while(p>0);
  found=substr(a,p,l);
  output;
  call prxnext(pid,s,e,a,p,l);
end;
drop s e p l pid;
run;
proc transpose data=want out=final_want prefix=car;
by n a;
var found;
run;

laneylaners
Obsidian | Level 7

Ahhh thank you so much, this worked great!  Now I need to learn how to use those functions to truly understand.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 986 views
  • 1 like
  • 4 in conversation