Solved: Re: Import the files with variable length of observations

SASdevAnneMarie · Posted 04-10-2022 03:59 PM

Hello Experts,

I would like to import multiple files. My code is:

%macro Mes_fichiers;

	data PDF_IN (keep=Mes_fichiers);
		length Mes_fichiers $256;
		fich=filename('fich',"&A.");
		did=dopen('fich');
		nb_fich=dnum(did);

		do i=1 TO nb_fich;
			Mes_fichiers=dread(did,i);
			output;
		end;

		rc=dclose(did);
	run;


	DATA _null_;
		call symputx ('nb', nobs);
		SET PDF_IN nobs=nobs;
	run;

	%do i=1 %to &nb.;

		data _NULL_;
			set PDF_IN(obs=&i);
			CALL SYMPUTX(COMPRESS('Mes_fichiers'),Mes_fichiers);
		run;

		data table_&i.;
			infile "&A.\&Mes_fichiers." dsd dlm="" missover firstobs=2;
			informat A $10.;
			informat B $28.;
			informat C $9.;
			informat D $8.;
			format A $10.;
			format B $28.;
			format C $9.;
			format D $8.;
			input A $1-10 B $12-39 C $40-48 D $49-56;
		run;

	%end;
%mend;

%Mes_fichiers;

The code works, but in some of my files (attached) the length of the observations changes.

Do you know how to import files whose observations vary in length?

Thank you !

Tom · Posted 04-10-2022 08:46 PM

What is it that you are having trouble with exactly?
Is it getting the right data step to consistently read files that look like the two examples you posted?

What values do you want to extract from the files? They seem to mainly have three values per line: DATE, BANK_NAME and BANK_ID. The text seems to be constant. Does the text actually vary? Can you highlight some examples of the different texts that can appear?

Let's take a few lines from your first example file and convert it into a temporary text file on our local SAS instance so we can try reading it.

filename example1 temp;
options parmcards=example1;
parmcards4;
2022-03-10 Loaded banks from CSV file 228 Bank from CSV file /srv/thetys-almpower/data/masterDataInterface/work/20220310-235502/Banques_20220310.csv
2022-03-10 No update required for bank LSAF5055 id= 656
2022-03-10 No update required for bank BNCA1045 id= 468
2022-03-10 No update required for bank CMAG1047 id= 577
2022-03-10 No update required for bank MEES594 id= 600
2022-03-10 No update required for bank BCME1146 id= 514
2022-03-10 No update required for bank EURO2994 id= 594
;;;;

So you could read that file pretty easily. Especially if you don't care about the text.

data example1;
  length date 8 bank $20 id 8 sourcefile $100 ;
  infile example1 truncover ;
  input date yymmdd10. @ ;
  if _n_=1 then do;
    sourcefile = scan(_infile_,-1,' ');
    delete;
  end;
  else input @'bank' bank @'id=' id ;
  retain sourcefile;
  format date yymmdd10.;
run;

Result

Obs          date      bank       id                                          sourcefile

 1     2022-02-25    LSAF5055    656    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 2     2022-02-25    BNCA1045    468    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 3     2022-02-25    CMAG1047    577    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 4     2022-02-25    MEES594     600    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 5     2022-02-25    BCME1146    514    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 6     2022-02-25    EURO2994    594    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 7     2022-02-25    CACF3078    563    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv

Or is it getting the list of files to read?

What is the rule for selecting the file names? Do you just want to read all of the files?

SASdevAnneMarie · Posted 04-11-2022 04:08 AM

Thank you for your answer, Tom.
I would like to extract the lines that contain the word Error. My code does not do this correctly.

Kurt_Bremser · Posted 04-11-2022 04:26 AM

Do not use tabs in code, replace them with a suitable number of blanks. Both SAS Studio and Enterprise Guide provide settings for this. Tabs are different on different systems and/or for different users, so the visual layout is undetermined when using tabs.
Do not use FORMAT or INFORMAT statements to define variables, use a LENGTH statement instead. In most cases, character variables do not need formats/informats.
Use a mixture of LIST and FORMATTED INPUT:

data table_&i.;
infile "&A.\&Mes_fichiers." dsd dlm="" truncover firstobs=2;
length
  A $10
  B $28
  C $9
  D $8
;
input A @12 B $28. C D $8.;
run;

Since most of the input is static, you may want to consider not reading four variables, but parsing the date, bank and id from _INFILE_.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 04-11-2022 04:32 AM

I see you have differently structured lines farther down. Read the date (it is there in every line) with yymmdd10. and the rest of the line into a long variable (formatted input, don't forget TRUNCOVER), which you then parse depending on the first word(s).
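A minimal sketch of that approach, not taken from the thread: it assumes a hypothetical file name myfile.txt and handles only the two line styles visible in the sample posted above ("Loaded banks ..." and "No update required for bank ..."). The variable kind and its values are illustrative names.

```sas
data parsed;
  infile 'myfile.txt' truncover;            /* hypothetical path */
  length line $300 kind $10 bank $20 id 8;
  input date :yymmdd10. line $300.;         /* date first, then the rest of the line */
  format date yymmdd10.;
  select (upcase(scan(line, 1, ' ')));      /* branch on the first word after the date */
    when ('LOADED') kind = 'header';
    when ('NO') do;                         /* "No update required for bank X id= N" */
      kind = 'no_update';
      bank = scan(line, 6, ' ');            /* sixth word follows "bank" */
      id   = input(scan(line, -1, ' '), ?? 32.);  /* last word follows "id=" */
    end;
    otherwise kind = 'other';               /* any style not listed above */
  end;
run;
```

Running PROC FREQ on kind afterwards would show how many lines fall into the catch-all 'other' group and still need their own WHEN branch.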

SASdevAnneMarie · Posted 04-11-2022 01:36 PM

Thank you, Kurt!
Do you mean that I must write B $300? I would like to get only the lines with the word Error. Where do I need to add truncover? Thank you for the help.

Tom · Posted 04-11-2022 01:44 PM

Reading the lines from the file is simple.

data want;
  infile 'myfile.txt' truncover ;
  input date :yymmdd. ;
  format date yymmdd10.;
  length line $300 ;
  line=_infile_;
run;

To get more help, please clarify what all of the possible line styles are and what you want to extract from them.

For example you might want to only keep the lines that have the word ERROR in them.

data want;
  infile 'myfile.txt' truncover ;
  input date :yymmdd. ;
  format date yymmdd10.;
  length line $300 ;
  line=_infile_;
  if findw(line,'error','i') ;
run;

SASdevAnneMarie · Posted 04-11-2022 01:49 PM

Thank you, Tom.
Unfortunately, I don’t know all the possible line styles that I may receive. But I need only the lines with Error.

Tom · Posted 04-11-2022 01:53 PM (accepted solution)

Once you have the text in a dataset you can easily explore it.

For example, you can skip the DATE value when reading the rest of the line into a variable:

data want;
  infile 'myfile.txt' truncover ;
  input date :yymmdd. line $300.;
  format date yymmdd10.;
  if findw(line,'error','i');
run;

Then you could use a short display format with PROC FREQ to get a look at the most common starting values of the strings.

proc freq order=freq data=want;
   tables line / list;
   format line $30.;
run;

SASdevAnneMarie · Posted 04-11-2022 04:35 PM

Thank you, Tom !
Unfortunately, the condition if findw(line,'error ','i') doesn't work: I get empty tables (although when I apply a filter in SAS EG I do find "error"). But I can apply the prxmatch function to my data later.
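For reference, a sketch of the PRXMATCH filter mentioned here, not taken from the thread. It assumes the lines were already read into a variable named line in a dataset named want, without the FINDW subsetting IF; the output dataset name errors_only is illustrative. The pattern is case-insensitive and matches "error" anywhere in the line, including inside longer strings such as "errors" or "ERROR:".

```sas
data errors_only;
  set want;                          /* assumed to hold every line, unfiltered */
  if prxmatch('/error/i', line);     /* PRXMATCH returns 0 when there is no match */
run;
```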

Kurt_Bremser · Posted 04-12-2022 02:58 AM

Read your file like this:

data testfile1;
infile "~/testfile1.txt" truncover;
input date :yymmdd10. line $300.;
format date yymmdd10.;
if index(upcase(line),"ERROR");
run;

Tested on On Demand after uploading your test file.
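To tie this back to the original question about several files, here is one possible sketch, not taken from the thread, that reads every file in the folder in a single data step instead of looping in a macro. It assumes that the macro variable A holds the folder path as in the first post, that the files end in .txt (the thread does not state the extension), and that no header line needs to be skipped. The names all_errors, fname and source are illustrative.

```sas
data all_errors;
  length fname source $256 line $300;
  /* The wildcard makes SAS read each matching file in turn;      */
  /* FILENAME= stores the name of the file currently being read.  */
  infile "&A.\*.txt" truncover filename=fname;
  input date :yymmdd10. line $300.;
  format date yymmdd10.;
  source = fname;                    /* the FILENAME= variable is not written out, so copy it */
  if index(upcase(line), "ERROR");   /* keep only lines that contain "error" in any case */
run;
```

The source column then shows which file each error line came from, which the per-file table_&i. datasets in the first post provided implicitly.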
