BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
romartins
Fluorite | Level 6

Hello,

I have an FTP where csv files are made available monthly and I need to import these csv files with a ";" as a separator, from FTP by SAS code.

 

The problem is that I need to check if these files were generated in a standardized way. Some necessary standards:
- File encoded as utf-8 so as not to lose the encoding of special characters;
- Have standardized file nomenclature and extension;
- Follow standardization in the fields, that is, order and nomenclature of variables. Always checking a naming pattern without accents or special characters, without spaces and with up to 32 characters. Through a pre-defined order and nomenclature.

 

So I have a few questions:
1 - How do I do these checks and import files via SAS code from FTP?
2 - Is there any way to generate the file in a standard way through Prepare Data in SAS VIYA and upload it to FTP or directly to a SASData folder?

 

Note: I need to generate a table that serves as control. To store the file name, extraction date, if there is any observation (default failure) and a flag informing whether the file has already been read or not. This table will be used to send emails periodically informing if any file has been made available outside of the standard and requires correction.

 

Fictional examples attached.
1 - ARQUIVO_2024_01.csv: File OK;
2 - ARQUIVO_2024_02.csv: File error;
3 - PADRAO_LAYOUT.csv: Columns standard.

 

Tested code to import file:

/*Import file from FTP*/
filename arq_raw FTP "&arquivo." user=&user. pass=&pass. host=&host. cd=&path. debug PASSIVE encoding="utf-8";
	
proc import datafile=arq_raw
	out=WORK.TMP_DADOS_FTP
	dbms=csv replace;
	delimiter=";";
	guessingrows=MAX;
run;

/*Releases connection to FTP*/
filename arq_raw clear;

 

I appreciate the colaboration of everyone.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

For the first problem make sure you are using a SAS session that is running with ENCODING='UTF-8'.  You can use the GETOPTION() function to check the setting of the ENCODING option.  You can add the ENCODING= option to the INFILE statement.

 

For the second problem get the filenames as data.   The FTP filename engine supports the ls command for FTP to get a directory listing. So you can use it to read the filenames into a dataset..  Then it is easy to check and make sure the filenames follow the pattern.

 

The final problem is solved by NOT using PROC IMPORT.  Instead since you seem to know exactly what variables should be in the file just write your own data step to read the file.  You can add a step before that to just read the header line and make sure that the file has the right columns.  It could even be part of the same data step.

For example two of your files seem to have the same columns.  So for those the step to read them might look like this:

data tmp_dados_ftp;
  infile arq_raw dsd dlm=';' truncover ;
  if _n_=1 then do;
     input ;
    if _infile_ ne 'ID;DATA;REGIAO;NOME' then do;
      put 'ERROR: The header line does not match.' / _infile_ ;
      stop;
    end;
  end;
  length ID DATA 8 REGIAO $10 NOME $10;
  format DATA date9.;
  informat DATA mmddyy. ;
  input ID -- NOME ;
run;

These files actually demonstrate one of the issues with using PROC IMPORT to guess how to read the CSV file.  The second column seems to be a date variable.  But the strings that are present could be interpreted as either MDY or DMY order (there are not day values larger than 12).  So PROC IMPORT might not guess to use the right one.

View solution in original post

5 REPLIES 5
Kurt_Bremser
Super User
Never use PROC IMPORT for CSV files. Always write the data step yourself.
This way you have full control over column names and attributes; for a check, you can read the header line in a separate step and check its contents against a reference.
romartins
Fluorite | Level 6

Thanks for the information Kurt_Bremser, I didn't know that the import process wasn't ideal for csv.

I ended up using it because I was trying to verify the file contents. Where situations may occur where the file is destandardized, then the definition of the columns would itself encounter standardization problems.

 

Interesting, how do I do this check when I have a very long column reference, with more than 60 columns?

Tom
Super User Tom
Super User

Just read the header line.  This code will make a dataset with one observation per header value from the first line of the CSV file.

data headers;
  infile 'myfile.csv' dsd obs=1 lrecl=1000000;
  varnum+1;
  input header :$256. @@;
run;

And then compare it to what you expected to find.

 

If you only expect to have actual variable names in the headers then use length of 32 (maximum length of name in SAS) instead of 256 (maximum length of a variable label in SAS).

Tom
Super User Tom
Super User

For the first problem make sure you are using a SAS session that is running with ENCODING='UTF-8'.  You can use the GETOPTION() function to check the setting of the ENCODING option.  You can add the ENCODING= option to the INFILE statement.

 

For the second problem get the filenames as data.   The FTP filename engine supports the ls command for FTP to get a directory listing. So you can use it to read the filenames into a dataset..  Then it is easy to check and make sure the filenames follow the pattern.

 

The final problem is solved by NOT using PROC IMPORT.  Instead since you seem to know exactly what variables should be in the file just write your own data step to read the file.  You can add a step before that to just read the header line and make sure that the file has the right columns.  It could even be part of the same data step.

For example two of your files seem to have the same columns.  So for those the step to read them might look like this:

data tmp_dados_ftp;
  infile arq_raw dsd dlm=';' truncover ;
  if _n_=1 then do;
     input ;
    if _infile_ ne 'ID;DATA;REGIAO;NOME' then do;
      put 'ERROR: The header line does not match.' / _infile_ ;
      stop;
    end;
  end;
  length ID DATA 8 REGIAO $10 NOME $10;
  format DATA date9.;
  informat DATA mmddyy. ;
  input ID -- NOME ;
run;

These files actually demonstrate one of the issues with using PROC IMPORT to guess how to read the CSV file.  The second column seems to be a date variable.  But the strings that are present could be interpreted as either MDY or DMY order (there are not day values larger than 12).  So PROC IMPORT might not guess to use the right one.

romartins
Fluorite | Level 6

Thanks for the information TOM, it was very useful and helped me a lot.

 

Regarding the first issue, I checked the session and it is "UTF-8". Is there any way to check if the FTP file was generated as UTF-8 and display a message otherwise?

Because it happens that you have files that are not UTF-8 in FTP, and when importing them the special characters are "�", this will fail by itself in the first line comparison step. This is to ensure that it only reads UTF-8 files into the stream and avoids the "�" character.


The second problem, thank you for the documentation provided, I believe I'm on the right track, I'm using a data step to list the files on the FTP and perform a check of the file name and extension with the help of regex to check a pre-established pattern.

 

Third point, thanks for the example, I didn't know about this option in the data step to validate the first line of the header.
A question about this point, when I have a very large string, when trying to perform the comparison I have the following return in the log:

NOTE: The currently processed quoted string is now longer than 262 bytes. You may have unbalanced quotes.

How can I avoid this?

 

Thanks in advance!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1247 views
  • 3 likes
  • 3 in conversation