DATA Step, Macro, Functions and more

Generating first and second string output of criteria

Accepted Solution Solved
Reply
Contributor
Posts: 39
Accepted Solution

Generating first and second string output of criteria

[ Edited ]

I have the following dataset:

 

tech_cmnt
AIR HOSE UNCOUPLED BETWEEN CARS TTGX 159920 AND TTGX604561. RECOUPLED TOGETHER.
There was a gash in the trainline between cars TTGX993160 AND TTGX854014. HOSE REPLACED.  SGT/OMA.
AIR HOSE SEPERATION BETWEEN CARS XTTX137370 AND TTHX92394. AIR HOSES COUPLED BACK TOGETHER AND TRAIN IS BACK MOVING AT THIS TIME. CAD/OMA.
AIR HOSE ON A-END OF CAR TTGX 158384 WAS NOT TIGHT IN THE SCREW ON FITTING. AIR HOSE TIGHTENED AND TRAIN IS BACK ON THE MOVE AT THIS TIME. CAD/OMA.
 WRONG END DRAWBAR ON TTGX604269 ON A-END. MOP AND CARMAN ON SITE AT THIS TIME.  TTGX604269 AND THE 4 CARS BEHIND IT WERE SETOUT AT NOLTE%2C TX.
BROKEN KNUCKLE ON TTGX986512 ON THE B-END. REKNUCKLED IN TRAIN. TRAIN BACK ON THE MOVE WITH NO OTHER ISSUES AT THIS TIME. CAD/OMA.

 

 

I'm trying to extract the 6 digit numbers plus the 4 initials that preceed them to have the following result:

 

sashelp2.JPG

 

As you can see, some of the numbers have no space between them and the characters before it, and some have one space.  I'm struggling with how to do this, and if there are multiple occurrences of what I'm looking for, how to extract the second occurence and output that if it's not the same as the first (i.e. car1 and car2 results). 

 

I've tried scan, substr, anydigit (<- thought that and then substr might be the one, but i cannot figure out how to get car2 to populate).

 

This list is not limited either to car1 or car2 having the 4x character initials.  These are all of the possibilities:  ABOX
ATTX
BTTX
CBOX
CTTX
DTTX
ETTX
FBOX
FTTX
GNDX
GNTX
GONX
GTTX
HTTX
ITTX
JTTX
LTTX
MTTX
OTTX
PTTX
QTTX
RBOX
RTTX
STTX
TBOX
TFWX
TOAX
TOBX
TOCX
TOFX
TPDX
TTAX
TTCX
TTDX
TTEX
TTFX
TTGX
TTHX
TTIX
TTJX
TTKX
TTLX
TTMX
TTPX
TTRX
TTUX
TTWX
TTXX
TTYX
TTZX
VTTX
XTTX

 

Thanks for your insight.


Accepted Solutions
Solution
‎05-31-2016 04:47 PM
Super User
Posts: 9,671

Re: Generating first and second string output of criteria

You are almost get in there.



data have;
infile cards truncover;
input a $200.;
cards;
AIR HOSE UNCOUPLED BETWEEN CARS TTGX 159920 AND TTGX604561. RECOUPLED TOGETHER.
There was a gash in the trainline between cars TTGX993160 AND TTGX854014. HOSE REPLACED.  SGT/OMA.
AIR HOSE SEPERATION BETWEEN CARS XTTX137370 AND TTHX92394. AIR HOSES COUPLED BACK TOGETHER AND TRAIN IS BACK MOVING AT THIS TIME. CAD/OMA.
AIR HOSE ON A-END OF CAR TTGX 158384 WAS NOT TIGHT IN THE SCREW ON FITTING. AIR HOSE TIGHTENED AND TRAIN IS BACK ON THE MOVE AT THIS TIME. CAD/OMA.
 WRONG END DRAWBAR ON TTGX604269 ON A-END. MOP AND CARMAN ON SITE AT THIS TIME.  TTGX604269 AND THE 4 CARS BEHIND IT WERE SETOUT AT NOLTE%2C TX.
BROKEN KNUCKLE ON TTGX986512 ON THE B-END. REKNUCKLED IN TRAIN. TRAIN BACK ON THE MOVE WITH NO OTHER ISSUES AT THIS TIME. CAD/OMA.
;
run;
data want;
set have;
n+1;
pid=prxparse('/\b[A-Z]{4}\s*\d+\b/io');
s=1;
e=length(a);
call prxnext(pid,s,e,a,p,l);
do while(p>0);
  found=substr(a,p,l);
  output;
  call prxnext(pid,s,e,a,p,l);
end;
drop s e p l pid;
run;
proc transpose data=want out=final_want prefix=car;
by n a;
var found;
run;

View solution in original post


All Replies
Super User
Posts: 17,766

Re: Generating first and second string output of criteria

Post data as text, not as pictures please, otherwise we have to type out your data manually. 

 

You're going to need PRX 

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002288677.htm

 

Contributor
Posts: 39

Re: Generating first and second string output of criteria

[ Edited ]

Reeza,

 

This was suggested to me as a potential solution to a similar problem:

 

Run;
data want;
set have;
pid=prxparse('/ETTX|TTGX|TTPX|DTTX/o');
s=1;
e=length(a);
call prxnext(pid,s,e,a,p,l);
do while(p>0);
  found=substr(a,p,l);
  output;
  call prxnext(pid,s,e,a,p,l);
end;
drop s e p l pid;
run;

 

I tried that, but I was getting errors when I added in the full list of 4-character initials in the pid=prxparse('/ETTX|TTGX|TTPX|DTTX/o') part of the code.  It said there were too many errors generated when trying it.

Super User
Posts: 10,474

Re: Generating first and second string output of criteria

Here's something I won't claim is elegant but seems to work for the example data:

data codes;
   input code $ 1-4;
datalines;
ABOX
ATTX
BTTX
CBOX
CTTX
DTTX
ETTX
FBOX
FTTX
GNDX
GNTX
GONX
GTTX
HTTX
ITTX
JTTX
LTTX
MTTX
OTTX
PTTX
QTTX
RBOX
RTTX
STTX
TBOX
TFWX
TOAX
TOBX
TOCX
TOFX
TPDX
TTAX
TTCX
TTDX
TTEX
TTFX
TTGX
TTHX
TTIX
TTJX
TTKX
TTLX
TTMX
TTPX
TTRX
TTUX
TTWX
TTXX
TTYX
TTZX
VTTX
XTTX
;
run;

proc sql noprint;
   select quote(code) into : codelist separated by ','
   from codes;
quit;

data want;
   set have;
   array c (&codecount) $ 4 _temporary_ (&codelist);
   length temp car1 car2 $ 12;
   rc=prxparse("/\d+/");
   do i= 1 to dim(c);
      pos = index(tech_cmnt,c[i]);
      if pos>0  then do;
         temp=substr(tech_cmnt,pos,12);
         /* trim every thing after last digit in temp*/
         call prxsubstr(rc,temp,p,l);
         temp = substr(temp,1,p+l-1);
      end;
      if missing(car1) and not missing(temp) then car1=temp;
      else if temp ne car1 then car2=temp;
      If find(tech_cmnt,c[i],pos+12)>0 then do;
         temp=substr(tech_cmnt,find(tech_cmnt,c[i],pos+12),12);
         call prxsubstr(rc,temp,p,l);
         temp = substr(temp,1,p+l-1);
         car2=temp;
      end;
   end;
   drop i rc p l pos temp ;
run;

One minor advantage is you could build the list of codes as a Select distinct variable type query from another set.

 

Might want to add a bit to check on the maximum length of the codes to replace the 4 in the array definition.

The PRX stuff in this one is only being used to clean up characters after the last digit. It may have an issue substring the phrase if the very last bit of the phrase is a car and is one of the shorter ones as written

Solution
‎05-31-2016 04:47 PM
Super User
Posts: 9,671

Re: Generating first and second string output of criteria

You are almost get in there.



data have;
infile cards truncover;
input a $200.;
cards;
AIR HOSE UNCOUPLED BETWEEN CARS TTGX 159920 AND TTGX604561. RECOUPLED TOGETHER.
There was a gash in the trainline between cars TTGX993160 AND TTGX854014. HOSE REPLACED.  SGT/OMA.
AIR HOSE SEPERATION BETWEEN CARS XTTX137370 AND TTHX92394. AIR HOSES COUPLED BACK TOGETHER AND TRAIN IS BACK MOVING AT THIS TIME. CAD/OMA.
AIR HOSE ON A-END OF CAR TTGX 158384 WAS NOT TIGHT IN THE SCREW ON FITTING. AIR HOSE TIGHTENED AND TRAIN IS BACK ON THE MOVE AT THIS TIME. CAD/OMA.
 WRONG END DRAWBAR ON TTGX604269 ON A-END. MOP AND CARMAN ON SITE AT THIS TIME.  TTGX604269 AND THE 4 CARS BEHIND IT WERE SETOUT AT NOLTE%2C TX.
BROKEN KNUCKLE ON TTGX986512 ON THE B-END. REKNUCKLED IN TRAIN. TRAIN BACK ON THE MOVE WITH NO OTHER ISSUES AT THIS TIME. CAD/OMA.
;
run;
data want;
set have;
n+1;
pid=prxparse('/\b[A-Z]{4}\s*\d+\b/io');
s=1;
e=length(a);
call prxnext(pid,s,e,a,p,l);
do while(p>0);
  found=substr(a,p,l);
  output;
  call prxnext(pid,s,e,a,p,l);
end;
drop s e p l pid;
run;
proc transpose data=want out=final_want prefix=car;
by n a;
var found;
run;

Contributor
Posts: 39

Re: Generating first and second string output of criteria

Ahhh thank you so much, this worked great!  Now I need to learn how to use those functions to truly understand.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 271 views
  • 1 like
  • 4 in conversation