BookmarkSubscribeRSS Feed
BCNAV
Quartz | Level 8

I have a series of aviation messages that come in as text. I need to be able to use a "like" type of command.

 

At present I have 

 

(case when t1.MSG_RAW_TXT contains " CYQXZZZ" then 1 else 0 end) AS GANDER_VOICE

 

However, there can be messages in the MSG_RAW_TXT variable that are misidentified.  The ones I want ALWAYS have 6 numbers before it....for example EIAAYSYX 030800 CYQXZZZ would be a valid message to set GANDER_VOICE=1 because of the 030800 in front of CYQXZZZ. This example: EIAAYSYX CYQXZZZ would not be a valid message to set GANDER_VOICE=1.

 

I am not sure how this could be done in SAS.  A Microsoft SQL programmer at work here suggested something along the lines of:

 

Like ‘[0-9][0-9][0-9] [0-9] [0-9] [0-9] CYQXZZZ’

 

This would look for any sequence in the MSG_RAW_TXT variable that has a 6 digit number (each number is from 0-9 as stated above), then a space and then the CYQXZZZ.  In this general case we would set GANDER_VOICE=1

 

Thanks for any advice

3 REPLIES 3
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @BCNAV 

 

You can extract the wanted word with a regular expression like this:


data _null_;
	MSG_RAW_TXT = 'EIAAYSYX 030800 CYQXZZZ xyz';
	b = prxchange('s/(.*\b\d{6}) (\w+)(.*)/$2/',-1,MSG_RAW_TXT);
	put b=;
run;

It might be a better approach to extract the airport to a new variable.You could do that in a data step, and then use that  in your SQL, if for some reason you want the boolean variables.

 

 

mkeintz
PROC Star

You have a sequence of space separated words in a character variable, call it TXT.   You can

  1. Set gander=0.
  2. Find the word position W of "CYQXZZZ" in TXT  (the findw function).
  3. If  W>1 you can extract word number W-1 (the scan function), call it PRIOR_WRD
  4. If
    1. PRIOR_WRD is exactly 6 characters, and
    2. non of the characters is a non-digit
    3. then gander=1
data have;
  input txt $50.;
  put txt=;
datalines;
EIAAYSYX 030800 CYQXZZZ
EIAAYSYX CYQXZZZ
run;

data want (drop=prior_wrd);
  set have;
  w=findw(txt,'CYQXZZZ',' ','E');
  gander=0;
  if w>1 then do;
    prior_wrd=scan(txt,w-1);
    if length(prior_wrd)=6 and notdigit(trim(prior_wrd))=0 then gander=1;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

Try

 

prxmatch("/\b\d{6}\sCYQXZZZ/o", t1.MSG_RAW_TXT) > 0 as GANDER_VOICE

 

Note: the "o" suffix in the pattern tells SAS that the pattern is a constant that should be compiled only once.

PG

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 413 views
  • 1 like
  • 4 in conversation