BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASdevAnneMarie
Barite | Level 11

Hello Experts,

 

I would like to import multiples files. My code is :

%macro Mes_fichiers;

	data PDF_IN (keep=Mes_fichiers);
		length Mes_fichiers $256;
		fich=filename('fich',"&A.");
		did=dopen('fich');
		nb_fich=dnum(did);

		do i=1 TO nb_fich;
			Mes_fichiers=dread(did,i);
			output;
		end;

		rc=dclose(did);
	run;


	DATA _null_;
		call symputx ('nb', nobs);
		SET PDF_IN nobs=nobs;
	run;

	%do i=1 %to &nb.;

		data _NULL_;
			set PDF_IN(obs=&i);
			CALL SYMPUTX(COMPRESS('Mes_fichiers'),Mes_fichiers);
		run;

		data table_&i.;
			infile "&A.\&Mes_fichiers." dsd dlm="" missover firstobs=2;
			informat A $10.;
			informat B $28.;
			informat C $9.;
			informat D $8.;
			format A $10.;
			format B $28.;
			format C $9.;
			format D $8.;
			input A $1-10 B $12-39 C $40-48 D $49-56;
		run;

	%end;
%mend;

%Mes_fichiers;

The code is works, but sometimes in my files (are joined) the length of observation is changing.

Do you know, please, how to import the files with variable length of observation ?

 

Thank you !

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Once you have the text into data you can easily explore it.

For example if skip the DATE value when reading the line:

data want;
  infile 'myfile.txt' truncover ;
  input date :yymmdd. line $300.;
  format date yymmdd10.;
  if findw(line,'error','i');
run;

Then you could use a short display format with PROC FREQ to get a look at the most common starting values of the strings.

proc freq order=freq data=want;
   tables line / list;
   format line $30.;
run;

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

What is it that you are having trouble with exactly?
Is it getting the right data step to consistently read files that look like the two examples you posted?

What values do you want extract from the files?  They seem to mainly have three values per line, DATE, BANK_NAME and BANK_ID.   The text seems to be constant.  Does the text actually vary?  Can you highlight some examples of the different texts that can appear?

Let's take a few lines from your first example file and convert it into a temporary text file on our local SAS instance so we can try reading it. 

filename example1 temp;
options parmcards=example1;
parmcards4;
2022-03-10 Loaded banks from CSV file 228 Bank from CSV file /srv/thetys-almpower/data/masterDataInterface/work/20220310-235502/Banques_20220310.csv
2022-03-10 No update required for bank LSAF5055 id= 656
2022-03-10 No update required for bank BNCA1045 id= 468
2022-03-10 No update required for bank CMAG1047 id= 577
2022-03-10 No update required for bank MEES594 id= 600
2022-03-10 No update required for bank BCME1146 id= 514
2022-03-10 No update required for bank EURO2994 id= 594
;;;;

So you could read that file pretty easily. Especially if you don't care about the text.

data example1;
  length date 8 bank $20 id 8 sourcefile $100 ;
  infile example1 truncover ;
  input date yymmdd10. @ ;
  if _n_=1 then do;
    sourcefile = scan(_infile_,-1,' ');
    delete;
  end;
  else input @'bank' bank @'id=' id ;
  retain sourcefile;
  format date yymmdd10.;
run;

Result

Obs          date      bank       id                                          sourcefile

 1     2022-02-25    LSAF5055    656    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 2     2022-02-25    BNCA1045    468    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 3     2022-02-25    CMAG1047    577    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 4     2022-02-25    MEES594     600    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 5     2022-02-25    BCME1146    514    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 6     2022-02-25    EURO2994    594    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv
 7     2022-02-25    CACF3078    563    /srv/thetys-almpower/data/masterDataInterface/work/20220225-235501/Banques_20220225.csv

Or is it getting the list of files to read? 

What is the rule for selecting the file names?  Do you just want to read all of the files?

SASdevAnneMarie
Barite | Level 11
Thank you for your answer, Tom.
I would like to extract the text with the word Error. Using my code it’s not correct.
Kurt_Bremser
Super User
  • Do not use tabs in code, replace them with a suitable number of blanks. Both SAS Studio and Enterprise Guide provide settings for this. Tabs are different on different systems and/or for different users, so the visual layout is undetermined when using tabs.
  • Do not use FORMAT or INFORMAT statements to define variables, use a LENGTH statement instead. In most cases, character variables do not need formats/informats.
  • Use a mixture of LIST and FORMATTED INPUT:
data table_&i.;
infile "&A.\&Mes_fichiers." dsd dlm="" truncover firstobs=2;
length
  A $10
  B $28
  C $9
  D $8
;
input A @12 B $28. C D $8.;
run;

Since most of the input is static, you may want to consider not reading four variables, but parsing the date, bank and id from _INFILE_.

Kurt_Bremser
Super User

I see you have differently structured lines farther down. Read the date (is there in every line) with yymmdd10. and the rest of the line into a long variable (formatted input, don't forget TRUNCOVER), which you then parse dependent on the first word(s).

SASdevAnneMarie
Barite | Level 11
Thank you, Kurt!
Do you mean that I must write B $300? I would like to get only the line with the word Error.Where do I need to add truncover? Thank you for the help.
Tom
Super User Tom
Super User

Reading the lines from the file is simple.

data want;
  infile 'myfile.txt' truncover ;
  input date :yymmdd. ;
  format date yymmdd10.;
  length line $300 ;
  line=_infile_;
run;

To get help with doing more please clarify what all of the possible lines styles are and what you want to extract out of them.

For example you might want to only keep the lines that have the word ERROR in them.

data want;
  infile 'myfile.txt' truncover ;
  input date :yymmdd. ;
  format date yymmdd10.;
  length line $300 ;
  line=_infile_;
  if findw(line,'error','i') ;
run;
SASdevAnneMarie
Barite | Level 11
Thank you, Tom.
Unfortunately, I don’t know all possible lines style that I can receive. But I need only the lines with Error.
Tom
Super User Tom
Super User

Once you have the text into data you can easily explore it.

For example if skip the DATE value when reading the line:

data want;
  infile 'myfile.txt' truncover ;
  input date :yymmdd. line $300.;
  format date yymmdd10.;
  if findw(line,'error','i');
run;

Then you could use a short display format with PROC FREQ to get a look at the most common starting values of the strings.

proc freq order=freq data=want;
   tables line / list;
   format line $30.;
run;
SASdevAnneMarie
Barite | Level 11
Thank you, Tom !
Unfortunately the condtion : if findw(line,'error ','i') doesn't work, I have the empty tables
(when I apply the filter of SASEG I have the "error") but I can apply prxmatch function lately on my data.
Kurt_Bremser
Super User

Read your file like this:

data testfile1;
infile "~/testfile1.txt" truncover;
input date :yymmdd10. line $300.;
format date yymmdd10.;
if index(upcase(line),"ERROR");
run;

Tested on On Demand after uploading your test file.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 4092 views
  • 2 likes
  • 3 in conversation