DATA Step, Macro, Functions and more

How do I verify plain text width or delimiter quantity?

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

How do I verify plain text width or delimiter quantity?

Hi everyone.

 

I'm kind of new to SAS programming, I'm using EG 5.1 but have access to 7.1 and I'm having a little problem with a requirement.

 

I'm importing a number of plain text files (txt, csv, etc.) into sas datasets, but I'm trying to validate those files before:

 

- First, I'm checking the existence of those files. DONE

- Second, I'm counting the number of lines in order to assure the observations number is the expected. DONE

- Third, I want to verifiy that the columns the dataset will have makes sense with the predefined layout of each table. In process

 

So far I think the best, and fastest way to do the third point, is importing the first line of each file and count either width (for fixed-width files) or number of delimiters, depending on each file.

 

The thing is I'm not sure on how to do this, or if there's a better way of validating what i want. Think you can help me here?

 

Thanks in advance!


Accepted Solutions
Solution
‎11-10-2016 06:47 PM
New Contributor
Posts: 3

Re: How do I verify plain text width or delimiter quantity?

Posted in reply to MilesWhite

So far, I've done this:

 

FILENAME _file_ "&input.&file" ;
data _null_;
   length text $32767;
   retain text '';
   infile _file_ dlmstr='//' end=last flowover lrecl=32767; 
   input;
   text=cats(text,_infile_);
   if _n_=1 then call symput('_file_',text);
   if _n_=1 then call symput('length',length(text));
run;

 

With a few more tweaking and adding an excel with the expected layout, I think I can solve my problem.

View solution in original post


All Replies
Super User
Posts: 11,343

Re: How do I verify plain text width or delimiter quantity?

Posted in reply to MilesWhite

Are you planning on using proc import to bring those files into SAS or a datastep?

 

If you use the layout information provided to write data steps to read the data that data step could have additional items to provide diagnostics.

 

Length of a line may not work for a diagnostic if the file source does not "pad" the end of the data with blanks to the defined length. I have dealt with a number of fixed column files that the last field was not padded.

 

Likewise a simple count of delimiters may not work, especially if the delimiter is a comma, as the delimeter may occur imbedded with values. CSV usually has quotes around text that contains commas such as " This Company Name, Inc.".

A data step with well defined Informat statements will often be very helpful for diagnosing some common data problems such as characters in numeric fields or invalid or poorly formatted dates.

New Contributor
Posts: 3

Re: How do I verify plain text width or delimiter quantity?

[ Edited ]

Thanks for the answer.

 

I'd prefer not to use a proc import, since as far as I see, it won't be able to detect changes on the files.

 

Thanks for letting me see the problems of doing the validation counting saparators or width, but as the files I'm supposed to import are created by other area, I don´t have total control over the layout, and I want to prevent an uninformed change on it, that's why my best bet was counting.

 

Also, the idea is avoiding spend time on importing de data if somthing doesn't make sense and reporting the change to the responsible area on order to dertermine if it was a layout change (solve it by adding the new layout to the code) or a transference error (solve it by re-downloading the file).

 

So far my report is something like this:

 

FileLinesLast month's observationsObservations  diff
flie_1100010010.10%
file_21500230034.78%
file_35050

0.00%

 

And I'm trying to get something like this:

 

FileLinesLast month's observationsObservations  diffApparent columnsLast month columns
flie_1100010010.10%1011
file_21500230034.78%55
file_350500.00%2020
Solution
‎11-10-2016 06:47 PM
New Contributor
Posts: 3

Re: How do I verify plain text width or delimiter quantity?

Posted in reply to MilesWhite

So far, I've done this:

 

FILENAME _file_ "&input.&file" ;
data _null_;
   length text $32767;
   retain text '';
   infile _file_ dlmstr='//' end=last flowover lrecl=32767; 
   input;
   text=cats(text,_infile_);
   if _n_=1 then call symput('_file_',text);
   if _n_=1 then call symput('length',length(text));
run;

 

With a few more tweaking and adding an excel with the expected layout, I think I can solve my problem.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 205 views
  • 0 likes
  • 2 in conversation