BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MilesWhite
Fluorite | Level 6

Hi everyone.

 

I'm kind of new to SAS programming, I'm using EG 5.1 but have access to 7.1 and I'm having a little problem with a requirement.

 

I'm importing a number of plain text files (txt, csv, etc.) into sas datasets, but I'm trying to validate those files before:

 

- First, I'm checking the existence of those files. DONE

- Second, I'm counting the number of lines in order to assure the observations number is the expected. DONE

- Third, I want to verifiy that the columns the dataset will have makes sense with the predefined layout of each table. In process

 

So far I think the best, and fastest way to do the third point, is importing the first line of each file and count either width (for fixed-width files) or number of delimiters, depending on each file.

 

The thing is I'm not sure on how to do this, or if there's a better way of validating what i want. Think you can help me here?

 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
MilesWhite
Fluorite | Level 6

So far, I've done this:

 

FILENAME _file_ "&input.&file" ;
data _null_;
   length text $32767;
   retain text '';
   infile _file_ dlmstr='//' end=last flowover lrecl=32767; 
   input;
   text=cats(text,_infile_);
   if _n_=1 then call symput('_file_',text);
   if _n_=1 then call symput('length',length(text));
run;

 

With a few more tweaking and adding an excel with the expected layout, I think I can solve my problem.

View solution in original post

3 REPLIES 3
ballardw
Super User

Are you planning on using proc import to bring those files into SAS or a datastep?

 

If you use the layout information provided to write data steps to read the data that data step could have additional items to provide diagnostics.

 

Length of a line may not work for a diagnostic if the file source does not "pad" the end of the data with blanks to the defined length. I have dealt with a number of fixed column files that the last field was not padded.

 

Likewise a simple count of delimiters may not work, especially if the delimiter is a comma, as the delimeter may occur imbedded with values. CSV usually has quotes around text that contains commas such as " This Company Name, Inc.".

A data step with well defined Informat statements will often be very helpful for diagnosing some common data problems such as characters in numeric fields or invalid or poorly formatted dates.

MilesWhite
Fluorite | Level 6

Thanks for the answer.

 

I'd prefer not to use a proc import, since as far as I see, it won't be able to detect changes on the files.

 

Thanks for letting me see the problems of doing the validation counting saparators or width, but as the files I'm supposed to import are created by other area, I don´t have total control over the layout, and I want to prevent an uninformed change on it, that's why my best bet was counting.

 

Also, the idea is avoiding spend time on importing de data if somthing doesn't make sense and reporting the change to the responsible area on order to dertermine if it was a layout change (solve it by adding the new layout to the code) or a transference error (solve it by re-downloading the file).

 

So far my report is something like this:

 

FileLinesLast month's observationsObservations  diff
flie_1100010010.10%
file_21500230034.78%
file_35050

0.00%

 

And I'm trying to get something like this:

 

FileLinesLast month's observationsObservations  diffApparent columnsLast month columns
flie_1100010010.10%1011
file_21500230034.78%55
file_350500.00%2020
MilesWhite
Fluorite | Level 6

So far, I've done this:

 

FILENAME _file_ "&input.&file" ;
data _null_;
   length text $32767;
   retain text '';
   infile _file_ dlmstr='//' end=last flowover lrecl=32767; 
   input;
   text=cats(text,_infile_);
   if _n_=1 then call symput('_file_',text);
   if _n_=1 then call symput('length',length(text));
run;

 

With a few more tweaking and adding an excel with the expected layout, I think I can solve my problem.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 821 views
  • 0 likes
  • 2 in conversation