BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Multipla99
Quartz | Level 8

I would be most grateful if someone could help me to identfify empty strings when the missing function seems not capable of doing this. In my program, a text file is read, the resulting strings are divided into parts, and the missing function is applied to identify non-empty parts. However, empty parts are not always recognized and rows are erroneously kept in the data. Program is run in SAS EG 8.2. 

  

filename in "C:\Temp\Data to read.txt";

data work.data
(keep=Text);
infile in lrecl=200 dsd truncover;
input
@ 1 FirstChar $ 1.
@ 1 Text $ 200.
;
if 'A' <= firstChar <= 'Z' or firstChar in('1', '2');
if firstChar not in (' ', '-', '©');
run;

data work.Data2;
length Text Text2 $ 200 ProductCode $ 6 Comment $ 200 ;
set work.data;
if _N_ = 1
then
do;
retain produktkod_re;
regexp = "/[A-Z][A-Z]\d\d\d\d/";
produktkod_re = prxparse(regexp);
end;
/* Identify product code */
if prxmatch(produktkod_re, Text)
then
do;
which_format = prxparen(produktkod_re);
call prxposn(produktkod_re, which_format, pos, len);
/* Text2 contains everything from Text except what is before product code */
Text2 = substr(Text, pos);

ProductCode = substr(Text2, 1, 6);

/* Comment contains everything efter product code */
Comment = substr(Text2, 7);
end;

/* Exclude all lines where comment is missing */
if not missing(Comment);

run;
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

That's because your variable comment contains tabs as well and though is not missing.

Patrick_0-1619676790373.png

 

Added to the end for your code using the sample data you've posted:

  put comment= $hex10.;

Comment=0909202020

 

You could remove all non-printable characters prior to using the missing() function.

if not missing(compress(Comment,,'w'));

 ....or use prxchange() at the beginning of your code an replace all unprintable characters with a blank.

View solution in original post

6 REPLIES 6
Patrick
Opal | Level 21

That's because your variable comment contains tabs as well and though is not missing.

Patrick_0-1619676790373.png

 

Added to the end for your code using the sample data you've posted:

  put comment= $hex10.;

Comment=0909202020

 

You could remove all non-printable characters prior to using the missing() function.

if not missing(compress(Comment,,'w'));

 ....or use prxchange() at the beginning of your code an replace all unprintable characters with a blank.

Multipla99
Quartz | Level 8

Thanks for all the good information provided!

 

I choose Patrick's answer as solution as it is an exact reply to the question posed. However, Tom's solution is more elegant and it is this proposal that I'm going to implement.  

Patrick
Opal | Level 21

@Multipla99 Given your response and what you've shown us below should work as well for you.

%let regex=\b([A-Z]{2}\d{4})\b;

filename in "C:\Temp\Data to read.txt";
data want(drop=_:);
  infile in lrecl=200 expandtabs truncover;
  input _firstChar $1. @1 str $200.;
  if findc(_firstChar,'12','a')=1;

  length ProductCode $6 cmt $200;
  label cmt='Comment';

  if _n_ = 1 then
    do;
      retain _rxp_id;
      _rxp_id = prxparse("/&regex/");
    end;

  /* Identify product code */
  _pos=prxmatch(_rxp_id, str);
  if _pos>0 then
    do;
      ProductCode=prxposn(_rxp_id,1,str);
      cmt=substrn(str,_pos+lengthn(ProductCode));
      if not missing(cmt) then output;
    end;
run;
Multipla99
Quartz | Level 8

Thank you, Patrick! Very neat and elegante. Hut ab!

ballardw
Super User

Even without @Patrick's code it is possible to tell that you likely have other than space characters at the end of the second line.

 

Opening that text file in Notepad, going to the end of the first line (cursor in first column and press End) and then doing the same to the next line shows a lot more space after the visible text and the cursor. Hitting the left arrow key to "back up one space" jumps a bunch of space for one character which is the typical behavior when TAB characters are present.

Tom
Super User Tom
Super User

Try adding the EXPANDTABS option to your INFILE statement so that SAS will automatically convert the tab characters into the number of spaces needed to move to the next tab stop.  The second IF statement is not needed as non of those characters could make it past the first IF statement.  Also don't add a space into the middle of you informats.  SAS will interpret that as two tokens.  The first is the bare $ of the INPUT statement that tells it to define the variable has character if the type has not already been determined.  The second is the informat.  SAS will silently to add the missing $ to your informat since it now knows you want the variable to be character because of the other part of your input statement.

data work.data;
  infile in lrecl=200 expandtabs truncover;
  input 
    @1 FirstChar $1.
    @1 Text $200.
  ;
  if 'A' <= firstChar <= 'Z' or firstChar in('1', '2');
run;

You can see that SAS is silently change the type of the informat (or format) went the conversion means it is not found.

985   data test;
986     length x $9;
987     input x date9. ;
                ------
                485
NOTE 485-185: Informat $DATE was not found or could not be loaded.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1610 views
  • 12 likes
  • 4 in conversation