Hello Experts,
I would like to import multiple files. My code is:
%macro Mes_fichiers;
  /* &A. holds the directory path (defined before this macro runs) */
  /* Step 1: list the files in the directory */
  data PDF_IN (keep=Mes_fichiers);
    length Mes_fichiers $256;
    fich=filename('fich',"&A.");
    did=dopen('fich');
    nb_fich=dnum(did);
    do i=1 to nb_fich;
      Mes_fichiers=dread(did,i);
      output;
    end;
    rc=dclose(did);
  run;
  /* Step 2: store the number of files in &nb. (NOBS= is known at compile time) */
  data _null_;
    call symputx('nb', nobs);
    set PDF_IN nobs=nobs;
  run;
  /* Step 3: import each file into its own data set TABLE_1, TABLE_2, ... */
  %do i=1 %to &nb.;
    data _null_;
      set PDF_IN(firstobs=&i obs=&i);
      call symputx('Mes_fichiers', Mes_fichiers);
    run;
    data table_&i.;
      infile "&A.\&Mes_fichiers." dsd dlm="" missover firstobs=2;
      informat A $10. B $28. C $9. D $8.;
      format A $10. B $28. C $9. D $8.;
      input A $1-10 B $12-39 C $40-48 D $49-56;
    run;
  %end;
%mend;
%Mes_fichiers;
The code works, but in some of my files (attached) the record length varies.
Does anyone know how to import files whose records have a variable length?
Thank you !
What is it that you are having trouble with exactly?
Is it getting the right data step to consistently read files that look like the two examples you posted?
What values do you want to extract from the files? They seem to mainly have three values per line: DATE, BANK_NAME and BANK_ID. The text seems to be constant. Does the text actually vary? Can you highlight some examples of the different texts that can appear?
Let's take a few lines from your first example file and write them to a temporary text file in our local SAS session so we can try reading them.
filename example1 temp;
options parmcards=example1;
parmcards4;
2022-03-10 Loaded banks from CSV file 228 Bank from CSV file /srv/thetys-almpower/data/masterDataInterface/work/20220310-235502/Banques_20220310.csv
2022-03-10 No update required for bank LSAF5055 id= 656
2022-03-10 No update required for bank BNCA1045 id= 468
2022-03-10 No update required for bank CMAG1047 id= 577
2022-03-10 No update required for bank MEES594 id= 600
2022-03-10 No update required for bank BCME1146 id= 514
2022-03-10 No update required for bank EURO2994 id= 594
;;;;
You can read that file pretty easily, especially if you don't care about the text.
data example1;
  length date 8 bank $20 id 8 sourcefile $100;
  infile example1 truncover;
  input date yymmdd10. @;
  if _n_=1 then do;
    /* The first line names the source CSV file: keep its last word */
    sourcefile = scan(_infile_,-1,' ');
    delete;
  end;
  /* Other lines: read the words after 'bank' and after 'id=' */
  else input @'bank' bank @'id=' id;
  retain sourcefile;
  format date yymmdd10.;
run;
Result
Obs  date        bank      id   sourcefile
1    2022-02-25  LSAF5055  656  /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
2    2022-02-25  BNCA1045  468  /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
3    2022-02-25  CMAG1047  577  /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
4    2022-02-25  MEES594   600  /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
5    2022-02-25  BCME1146  514  /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
6    2022-02-25  EURO2994  594  /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
7    2022-02-25  CACF3078  563  /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
Or is it getting the list of files to read?
What is the rule for selecting the file names? Do you just want to read all of the files?
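If the answer is that every file in the folder should be read, a single wildcard INFILE is one possible alternative to the DOPEN/DREAD loop. The sketch below is self-contained and hedged: it writes two small hypothetical files (log_a.txt and log_b.txt) into the WORK folder, then reads both in one DATA step. FILENAME= captures the file currently being read, and EOV= flags the first line of every file after the first one, which is how the header line of each file is skipped (FIRSTOBS= only applies to the first file). The file names, header lines and DATE/LINE layout are assumptions for the demo; against real data the path would be something like "&A.\*.txt".

```sas
/* Hypothetical demo folder: the WORK library path */
%let dir = %sysfunc(pathname(work));

/* Two small demo files, each starting with a header line */
data _null_;
  file "&dir/log_a.txt";
  put 'header line of file A';
  put '2022-03-10 No update required for bank LSAF5055 id= 656';
  put '2022-03-10 No update required for bank BNCA1045 id= 468';
run;

data _null_;
  file "&dir/log_b.txt";
  put 'header line of file B';
  put '2022-02-25 No update required for bank CMAG1047 id= 577';
run;

data all_files;
  length fname source $256 line $300;
  /* One INFILE reads every matching file in turn.                   */
  /* FILENAME= holds the file currently being read; EOV= is set to 1  */
  /* on the first record of every file after the first one.           */
  infile "&dir/log_*.txt" filename=fname eov=new_file truncover;
  input @;
  /* Skip the header line of each file */
  if _n_ = 1 or new_file then do;
    new_file = 0;   /* EOV= is not reset automatically */
    delete;
  end;
  source = fname;   /* FILENAME= variables are not written to the data set */
  input date :yymmdd10. line $300.;
  format date yymmdd10.;
run;
```

Keeping the source file name in a regular variable (SOURCE) makes it easy to tell the files apart after they are stacked into one data set.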
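data table_&i.;
  /* TRUNCOVER (instead of MISSOVER) keeps the partial value when a record ends inside a column range */
  infile "&A.\&Mes_fichiers." truncover firstobs=2;
  length A $10 B $28 C $9 D $8;
  /* Column input, using the positions from the original code */
  input A $1-10 B $12-39 C $40-48 D $49-56;
run;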
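The switch from MISSOVER to TRUNCOVER is what handles records of varying length. A minimal demonstration, assuming a hypothetical two-line file written to a temporary fileref (SHORT) whose second record ends before the last column range (5-12): the same column input is run with each option.

```sas
filename short temp;

/* The second record is shorter than the column range 5-12 */
data _null_;
  file short;
  put '001 ABCDEFGH';
  put '002 ABC';
run;

/* MISSOVER: a field cut off by the end of the record becomes missing */
data missover_demo;
  infile short missover;
  input id $1-3 name $5-12;
run;

/* TRUNCOVER: the partial value 'ABC' is kept */
data truncover_demo;
  infile short truncover;
  input id $1-3 name $5-12;
run;

proc print data=missover_demo; run;
proc print data=truncover_demo; run;
```

In MISSOVER_DEMO the second NAME is missing, while in TRUNCOVER_DEMO it is ABC, which is the behavior wanted when the record length changes from file to file.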
Since most of each line is static text, you may want to consider parsing the date, bank and id from _INFILE_ instead of reading four fixed-column variables.
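A minimal sketch of that idea, assuming the "No update required for bank <bank> id= <id>" lines are the ones of interest: an empty INPUT statement loads each line into _INFILE_, and SCAN with a negative count works from the end of the line, so the bank is the third word from the end and the id is the last word. The data set name PARSED is hypothetical; the sample lines are copied from the example file in this thread.

```sas
data parsed;
  infile datalines truncover;
  length bank $20;
  input;   /* read the line into _INFILE_ without assigning any variable */
  /* Keep only the 'No update required' lines */
  if find(_infile_, 'No update required for bank') then do;
    date = input(scan(_infile_, 1, ' '), yymmdd10.);  /* first word */
    bank = scan(_infile_, -3, ' ');                   /* third word from the end */
    id   = input(scan(_infile_, -1, ' '), 8.);        /* last word */
    output;
  end;
  format date yymmdd10.;
  datalines;
2022-03-10 No update required for bank LSAF5055 id= 656
2022-03-10 No update required for bank BNCA1045 id= 468
;
run;
```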
I see you have differently structured lines farther down. Read the date (which appears on every line) with YYMMDD10., and the rest of the line into a long character variable (formatted input; don't forget TRUNCOVER), which you then parse depending on its first word(s).
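A hedged sketch of that approach, using three lines copied from the first example file and written to a temporary fileref (MIXED, a hypothetical name). The date is read with YYMMDD10., everything after it goes into REST with formatted input, and a SELECT group branches on the first word of REST. The KIND categories and the fields pulled out in each branch are illustrative assumptions; every other line style would need its own WHEN.

```sas
filename mixed temp;

/* Sample lines copied from the example file in this thread */
data _null_;
  file mixed lrecl=1000;
  put '2022-03-10 Loaded banks from CSV file 228 Bank from CSV file /srv/thetys-almpower/data/masterDataInterface/work/20220310-235502/Banques_20220310.csv';
  put '2022-03-10 No update required for bank LSAF5055 id= 656';
  put '2022-03-10 No update required for bank BNCA1045 id= 468';
run;

data mixed_parsed;
  infile mixed truncover lrecl=1000;
  length rest $300 kind $12 bank $20 sourcefile $200;
  /* The date is on every line; the rest goes into one long variable */
  input date yymmdd10. +1 rest $300.;
  format date yymmdd10.;
  /* Branch on the first word of the remainder (categories are illustrative) */
  select (scan(rest, 1, ' '));
    when ('Loaded') do;
      kind = 'LOAD';
      sourcefile = scan(rest, -1, ' ');   /* last word is the CSV path */
    end;
    when ('No') do;
      kind = 'NO_UPDATE';
      bank = scan(rest, -3, ' ');
      id   = input(scan(rest, -1, ' '), 8.);
    end;
    otherwise kind = 'OTHER';
  end;
run;
```

Keeping REST in the output makes it easy to look at the lines that fall into the OTHER branch and add WHEN clauses for them.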
Reading the lines from the file is simple.
data want;
infile 'myfile.txt' truncover ;
input date :yymmdd. ;
format date yymmdd10.;
length line $300 ;
line=_infile_;
run;
To get help with doing more, please clarify what all of the possible line styles are and what you want to extract from them.
For example you might want to only keep the lines that have the word ERROR in them.
data want;
infile 'myfile.txt' truncover ;
input date :yymmdd. ;
format date yymmdd10.;
length line $300 ;
line=_infile_;
if findw(line,'error','i') ;
run;
Once you have the text into data you can easily explore it.
For example, if you skip the DATE value when reading the line:
data want;
infile 'myfile.txt' truncover ;
input date :yymmdd. line $300.;
format date yymmdd10.;
if findw(line,'error','i');
run;
Then you could use a short display format with PROC FREQ to get a look at the most common starting values of the strings.
proc freq order=freq data=want;
tables line / list;
format line $30.;
run;
Read your file like this:
data testfile1;
infile "~/testfile1.txt" truncover;
input date :yymmdd10. line $300.;
format date yymmdd10.;
if index(upcase(line),"ERROR");
run;
Tested on On Demand after uploading your test file.