BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASdevAnneMarie
Barite | Level 11
Sorry,
I have another question please 🙂
I don't understand the line: call symputx ('SYSCC', 8, 'G');

Thank you very much!
SASdevAnneMarie
Barite | Level 11

Hello Jim,

I asked the new csv file, they did some correction.

My original code is working, but there is some lag from the line 515.

 

Do you know, please some option, to avoid this lag?

 

Thank you very much!


/*****Importation*****/
data EUROPERF_IMP1;
	dlm=';'||'0D0A'x;
	infile "XXXX\Marie_Eligibles_c.csv_20200910 (2).csv" dlm=dlm  DSD lrecl=32867 /*termstr=CRLF*/
	RECFM=n;

	informat	CodeIsin	$12.;
	informat	Libelle	$50.;
	informat	nature	$12.;
	informat	categAMF	$32.;
	informat	categEPF	$50.;
	informat	zoneGeo	$40.;
	informat	dureeMinReco $8.;
	informat	affectationResultat	$40.;
	informat	fiscaliteText	$50.;
	informat	deviseComptable	$3.;
	informat	dateCloture	$32.;
	informat	dateOuverture	$32.;
	informat	promoteur	$50.;
	informat	adressePromoteur	$80.;
	informat	steGestion	$70.;
	informat	adresseSteGestion	$80.;
	informat	steAdmDelegue	$80.;
	informat	steFinAdmDelegue	$80.;
	informat	depositaire	$80.;
	informat	adresseDepositaire	$80.;
	informat	commissaireCpte	$50.;
	informat	souscription	$100.;
	informat	rachat	$50.;
	informat	fraisGestion	$70.;
	informat	datePerformance	$12.;
	informat	perf12mois	best32.;
	informat	perf36mois	$5.;
	informat	perf60mois	$5.;
	informat	fraisGestionCommentaire	$256.;
	informat	objectifGestion	$256.;
	informat	souscripteurs	$256.;
	informat	orientationPlacement	$256.;
	informat	dateClotureExercice	$256.;
	informat	periodeVL	$256.;
	informat	sitePublication	$256.;
	informat	conditionSouscripRachat	$256.;
	informat	dateValidation		$12.;
	informat	minimumRisqueAction	$5.;
	informat	dateAgrementAMF		$12.;
	informat	Tfe	$20.;
	informat	perfAm1	$8.;
	informat	perfAm2	$8.;
	informat	perfAm3	$8.;
	informat	perfAm4	$8.;
	informat	perfAm5	$8.;
	informat	perfAm6	$8.;
	informat	perfAm7	$8.;
	informat	perfAm8	$8.;
	informat	UnitedureeMinReco	$7.;
	informat	perfYTD	$5.;
	informat	volat_12mois	$5.;
	informat	frequence_text	$30.;
	informat	prospectus_url	$50.;
	informat	jour_valorisation $8.;
	informat	risque	$6.;
	informat	VIDE	$4.;
	format 	CodeIsin	$12.;
	format 	Libelle	$50.;
	format 	nature	$12.;
	format 	categAMF	$32.;
	format 	categEPF	$50.;
	format 	zoneGeo	$40.;
	format 	dureeMinReco $8.;
	format 	affectationResultat	$40.;
	format 	fiscaliteText	$50.;
	format 	deviseComptable	$3.;
	format 	dateCloture	$32.;
	format 	dateOuverture	$32.;
	format 	promoteur	$50.;
	format 	adressePromoteur	$80.;
	format 	steGestion	$70.;
	format 	adresseSteGestion	$80.;
	format 	steAdmDelegue	$80.;
	format 	steFinAdmDelegue	$80.;
	format 	depositaire	$80.;
	format 	adresseDepositaire	$80.;
	format 	commissaireCpte	$50.;
	format 	souscription	$100.;
	format 	rachat	$5.;
	format 	fraisGestion	$70.;
	format 	datePerformance $12.;
	format 	perf12mois	best32.;
	format 	perf36mois	$5.;
	format 	perf60mois	$5.;
	format 	fraisGestionCommentaire	$256.;
	format 	objectifGestion	$256.;
	format 	souscripteurs	$256.;
	format 	orientationPlacement	$256.;
	format 	dateClotureExercice	$256.;
	format 	periodeVL	$256.;
	format 	sitePublication	$256.;
	format  conditionSouscripRachat	$256.;
	format 	dateValidation		$12.;
	format 	minimumRisqueAction	$5.;
	format 	dateAgrementAMF		$12.;
	format 	Tfe	$20.;
	format 	perfAm1	$8.;
	format 	perfAm2	$8.;
	format 	perfAm3	$8.;
	format 	perfAm4	$8.;
	format 	perfAm5	$8.;
	format 	perfAm6	$8.;
	format 	perfAm7	$8.;
	format 	perfAm8	$8.;
	format 	UnitedureeMinReco	$7.;
	format 	perfYTD	$5.;
	format 	volat_12mois	$5.;
	format 	frequence_text	$30.;
	format 	prospectus_url	$50.;
	format 	jour_valorisation	$8.;
	format 	risque	$6.;
	format	VIDE	$4.;
	input			
		CodeIsin	$ 	
		Libelle	$	
		nature	$	
		categAMF	$	
		categEPF	$	
		zoneGeo	$	
		dureeMinReco $ 		
		affectationResultat	$	
		fiscaliteText	$	
		deviseComptable	$	
		dateCloture	$	
		dateOuverture	$	
		promoteur	$	
		adressePromoteur	$	
		steGestion	$	
		adresseSteGestion	$	
		steAdmDelegue	$	
		steFinAdmDelegue	$	
		depositaire	$	
		adresseDepositaire	$	
		commissaireCpte	$	
		souscription	$	
		rachat	$	
		fraisGestion	$	
		datePerformance	$
		perf12mois		
		perf36mois	$	
		perf60mois $		
		fraisGestionCommentaire	$	
		objectifGestion	$	
		souscripteurs	$
		orientationPlacement	$	
		dateClotureExercice	$	
		periodeVL	$	
		sitePublication	$	
		conditionSouscripRachat	$	
		dateValidation	$	
		minimumRisqueAction	$	
		dateAgrementAMF	$	
		Tfe	$	
		perfAm1	$	
		perfAm2	$	
		perfAm3	$	
		perfAm4	$	
		perfAm5	$	
		perfAm6	$	
		perfAm7	$	
		perfAm8	$	
		UnitedureeMinReco $		
		perfYTD	$	
		volat_12mois $		
		frequence_text	$	
		prospectus_url	$	
		jour_valorisation	$	
		risque 	$	
		VIDE $;
run;


 

jimbarbour
Meteorite | Level 14

Oh, dear, Marie.  This data is worse than the last.  There are no CRLF terminators at the end of each record. 

In the first line in the data in the screen print below, there are semi-colons between elements.  These are correct.  But look at the end of the line.  There is only an LF.  This should be a CRLF.  Otherwise, there is no way to tell the difference between a value separator and the end of a record.

Fr_Data_Bad_CRLF_2020-09-11_06-58-07.jpg

This data is not good.  Let me see what I can do.

 

Jim

SASdevAnneMarie
Barite | Level 11
Thank you Jim!
I did'nt understand why with LF "here is no way to tell the difference between a value separator and the end of a record."?



Tom
Super User Tom
Super User

@SASdevAnneMarie 

You need to ask the team that is creating this file to explain to you how they create it.  What is the source? Are they using some software to create the file?  What software?  How do they create the file? Do they run a program?  Is the program always the same or to they have to modify it somehow for each run?  Are they pointing and clicking in some user interface to generate the file?  Is so what do they have in place to insure that these manual steps are done the same each time?

 

If they are using SAS to make the file ask them to show you the code.  Perhaps we can fix their export code to make it so they will give you files that are both consistent in format and compatible with being parsed by a computer program.

Tom
Super User Tom
Super User

You could fix that file by using fact the real lines with with "VIDE".

So with this code:

filename fix temp ;
data _null_;
  infile "&path\&fname" termstr=lf ;
  file fix lrecl=1000000 termstr=crlf;
  input;
  put _infile_ @;
  if scan(_infile_,-1,' ;')='VIDE' then put; 
  else put ';' @;
run;

Again you are getting very close to maximum length lines that _INFILE_ can use.  So you might want to switch to reading the file byte by byte instead.

filename fix2 temp ;
data _null_;
  infile "&path\&fname" recfm=n;
  file fix2 recfm=n;
  input ch $char1. ;
  last4 = cat(lag4(ch),lag3(ch),lag2(ch),lag1(ch));
  if ch='0A'x then do;
    if last4='VIDE' then put '0D'x ;
    else ch=';' ;
  end;
  put ch $char1.;
run;

Now you can read the file.  Note that some of the values are getting close to the maximum length string that SAS can have.  Check out the 36th column.

data check (compress=yes);
  infile fix2 termstr=crlf dsd dlm=';' truncover lrecl=1000000 firstobs=2;
  row+1;
  do col=1 to 56;
    input value :$32767. @ ;
    len=lengthn(value);
    output;
  end;
run;

proc summary data=check ;
  class col ;
  var len;
  output out=summary(drop=_freq_) n=n min=min max=max mean=mean p90=p90 p95=p95 p99=p99;
run;
col      max     p99    p95    p90       mean    min

  .    29161    3613    876     89    161.635      0
  1       12      12     12     12     12.000     12
  2       63      63     60     56     36.885      3
  3        7       5      5      5      4.271      3
  4       30      30     29     23      7.974      0
  5       62      62     50     45     31.540     10
  6       28      16     16     14     10.269      4
  7        3       2      2      2      1.117      0
...
col      max      p99     p95     p90       mean    min

 29     3212     2691    1251     657     272.19      0
 30     8116     6681    5181    4474    2662.99     92
 31     5075     4610    2738    2053    1028.66     17
 32    20116    11423    7934    6194    3037.35      0
 33      357      186     141     102      65.63      0
 34     1642     1318     695     500     221.90      0
 35     1215      764     468     358     153.63      0
 36    29161     4979    2816    2150     991.77      0
 37       10       10      10      10      10.00     10
 38        4        4       4       4       1.50      0
SASdevAnneMarie
Barite | Level 11
Thank you very much Tom,
I’ll do that with actual file.
I’ll make a request for changing the file, now I understand the problem of this file. Thank you for your advices.
Marie
SASdevAnneMarie
Barite | Level 11
I can’t ask to recreate the file:( Could we do this changes automatically please with SAS?
Thank you!
jimbarbour
Meteorite | Level 14

Yes, these changes can be done with SAS.  We would need two data steps.

 

The first Data step:

 

DATA	NULL;
	infile "XXX\Eligibles_c.csv_20200810_Original.csv"
	termstr=CRLF
	lrecl=32767
	RECFM=V
	;

	INPUT;
	_INFILE_	=	TRANWRD(_INFILE_, '0D0A'x, '^^^^');
	_INFILE_	=	TRANWRD(_INFILE_, '0A'x, ';');
	_INFILE_	=	TRANWRD(_INFILE_, '^^^^', '0D0A'x);
	file "XXX\Eligibles_c.csv_20200810_SAS_Edit.csv";
	PUT	_INFILE_;
RUN;

 

The first data step uses the original file, the file with the problems in the data.  The first Data step corrects the problems.

 

The second data step then uses the file created by the first data step, above:

 

data EUROPERF_IMP1_SAS;
	infile "XXX\Eligibles_c.csv_20200810_SAS_Edit.csv"
	termstr=CRLF
	lrecl=32767
	FIRSTOBS=2
	MISSOVER
	RECFM=V
	dlm=';'
	DSD 
	;

	/*	56 Columns	*/
	informat	CodeIsin	$12.;
	informat	Libelle	$50.;
	informat	nature	$12.;
	informat	categAMF	$32.;
	informat	categEPF	$50.;
	informat	zoneGeo	$40.;
	informat	dureeMinReco $8.;
	informat	affectationResultat	$40.;
	informat	fiscaliteText	$50.;
	informat	deviseComptable	$3.;
	informat	dateCloture	$32.;
	informat	dateOuverture	$32.;
	informat	promoteur	$50.;
	informat	adressePromoteur	$80.;
	informat	steGestion	$70.;
	informat	adresseSteGestion	$80.;
	informat	steAdmDelegue	$80.;
	informat	steFinAdmDelegue	$80.;
	informat	depositaire	$80.;
	informat	adresseDepositaire	$80.;
	informat	commissaireCpte	$50.;
	informat	souscription	$100.;
	informat	rachat	$50.;
	informat	fraisGestion	$70.;
	informat	datePerformance	$12.;
	informat	perf12mois	best32.;
	informat	perf36mois	$5.;
	informat	perf60mois	$5.;
	informat	fraisGestionCommentaire	$256.;
	informat	objectifGestion	$256.;
	informat	souscripteurs	$256.;
	informat	orientationPlacement	$256.;
	informat	dateClotureExercice	$256.;
	informat	periodeVL	$256.;
	informat	sitePublication	$256.;
	informat	conditionSouscripRachat	$256.;
	informat	dateValidation		$12.;
	informat	minimumRisqueAction	$5.;
	informat	dateAgrementAMF		$12.;
	informat	Tfe	$20.;
	informat	perfAm1	$8.;
	informat	perfAm2	$8.;
	informat	perfAm3	$8.;
	informat	perfAm4	$8.;
	informat	perfAm5	$8.;
	informat	perfAm6	$8.;
	informat	perfAm7	$8.;
	informat	perfAm8	$8.;
	informat	UnitedureeMinReco	$7.;
	informat	perfYTD	$5.;
	informat	volat_12mois	$5.;
	informat	frequence_text	$30.;
	informat	prospectus_url	$50.;
	informat	jour_valorisation $8.;
	informat	risque	$6.;
	informat	VIDE	$4.;

	/*	56 Columns	*/
	format 	CodeIsin	$12.;
	format 	Libelle	$50.;
	format 	nature	$12.;
	format 	categAMF	$32.;
	format 	categEPF	$50.;
	format 	zoneGeo	$40.;
	format 	dureeMinReco $8.;
	format 	affectationResultat	$40.;
	format 	fiscaliteText	$50.;
	format 	deviseComptable	$3.;
	format 	dateCloture	$32.;
	format 	dateOuverture	$32.;
	format 	promoteur	$50.;
	format 	adressePromoteur	$80.;
	format 	steGestion	$70.;
	format 	adresseSteGestion	$80.;
	format 	steAdmDelegue	$80.;
	format 	steFinAdmDelegue	$80.;
	format 	depositaire	$80.;
	format 	adresseDepositaire	$80.;
	format 	commissaireCpte	$50.;
	format 	souscription	$100.;
	format 	rachat	$5.;
	format 	fraisGestion	$70.;
	format 	datePerformance $12.;
	format 	perf12mois	best32.;
	format 	perf36mois	$5.;
	format 	perf60mois	$5.;
	format 	fraisGestionCommentaire	$256.;
	format 	objectifGestion	$256.;
	format 	souscripteurs	$256.;
	format 	orientationPlacement	$256.;
	format 	dateClotureExercice	$256.;
	format 	periodeVL	$256.;
	format 	sitePublication	$256.;
	format  conditionSouscripRachat	$256.;
	format 	dateValidation		$12.;
	format 	minimumRisqueAction	$5.;
	format 	dateAgrementAMF		$12.;
	format 	Tfe	$20.;
	format 	perfAm1	$8.;
	format 	perfAm2	$8.;
	format 	perfAm3	$8.;
	format 	perfAm4	$8.;
	format 	perfAm5	$8.;
	format 	perfAm6	$8.;
	format 	perfAm7	$8.;
	format 	perfAm8	$8.;
	format 	UnitedureeMinReco	$7.;
	format 	perfYTD	$5.;
	format 	volat_12mois	$5.;
	format 	frequence_text	$30.;
	format 	prospectus_url	$50.;
	format 	jour_valorisation	$8.;
	format 	risque	$6.;
	format	VIDE	$4.;

	/*	56 Columns	*/
	input			
		CodeIsin	$ 	
		Libelle	$	
		nature	$	
		categAMF	$	
		categEPF	$	
		zoneGeo	$	
		dureeMinReco $ 		
		affectationResultat	$	
		fiscaliteText	$	
		deviseComptable	$	
		dateCloture	$	
		dateOuverture	$	
		promoteur	$	
		adressePromoteur	$	
		steGestion	$	
		adresseSteGestion	$	
		steAdmDelegue	$	
		steFinAdmDelegue	$	
		depositaire	$	
		adresseDepositaire	$	
		commissaireCpte	$	
		souscription	$	
		rachat	$	
		fraisGestion	$	
		datePerformance	$
		perf12mois		
		perf36mois	$	
		perf60mois $		
		fraisGestionCommentaire	$	
		objectifGestion	$	
		souscripteurs	$
		orientationPlacement	$	
		dateClotureExercice	$	
		periodeVL	$	
		sitePublication	$	
		conditionSouscripRachat	$	
		dateValidation	$	
		minimumRisqueAction	$	
		dateAgrementAMF	$	
		Tfe	$	
		perfAm1	$	
		perfAm2	$	
		perfAm3	$	
		perfAm4	$	
		perfAm5	$	
		perfAm6	$	
		perfAm7	$	
		perfAm8	$	
		UnitedureeMinReco $		
		perfYTD	$	
		volat_12mois $		
		frequence_text	$	
		prospectus_url	$	
		jour_valorisation	$	
		risque 	$	
		VIDE $;
run;

 

Why don't you try this code and let me know if it works?

 

Jim

 

SASdevAnneMarie
Barite | Level 11
Yes, I’ll try it! Would be great if it works because I can’t change the structure of file 😞
Thank you very much!
jimbarbour
Meteorite | Level 14

@SASdevAnneMarie,

 

This is much more difficult.  At the end of each record, there should be a termination string (termstr) of '0D0A'x.  The TermStr tells SAS where to stop reading each record and to begin a new record.  When we code 

 

TERMSTR=CRLF

we are telling SAS to look for '0D0A'x at the end of each record.  CR stands for Carriage Return ('0D'x) and LF stands for Line Feed ('0A'x).  

 

Unfortunately in this latest data, there is not a CRLF at the end of each record.  Instead there is only an LF.

 

Now, here is where the problem is:  This is delimited data.  Between each data element (variable), there is a separator.  It should be a semi-colon, but in this data, frequently it is a LF.  Do you see the problem?  How does SAS tell which LF's are the end of a record and which LF's are merely separators?  There is no easy way to tell the difference.

 

However, we are a little bit lucky here.  In your data, there is the text "VIDE" at the end of each record.  We can change each VIDELF

combination to VIDECRLF.  I did this by hand using Notepad++, and I got the following results:

Fr_Data_Bad_CRLF_Partial_Manual_Fix_A_2020-09-11_06-58-07.jpg

 

These results are mostly good, but if we look more closely at the 9th record, toward the end of the record we see:

Fr_Data_Bad_CRLF_Partial_Manual_Fix_B_2020-09-11_06-58-07.jpg

 

What is the problem here?  Again, it is the LF.  There are many LF's inside the column.  In other words, there are "extra" LF's within each column. SAS has no way to decide which LF's are separators and which LF's are merely data errors.  I cannot think of a good way to fix the problems in the 9th record.

 

I might be able to figure out how to programatically correct the data problem (the missing CRLF) at the end of each record, but if the text "VIDE" is missing, then whatever program I write will not work correctly.  In addition, I cannot easily address the issues in the 9th record.

 

I think you really must go back to whomsoever created the data and tell them about these problems.  This is bad data.  They cannot expect you to constantly have to write SAS code to try to compensate for all this bad data.  They should fix the data.

 

Jim

 

SASdevAnneMarie
Barite | Level 11
Thank you Jim,
I see!
I thought that there is some solution because my code works for line with “vide” at the end. Without “vide” I had a gap.
Unfortunately I need to use this file this month:(
jimbarbour
Meteorite | Level 14

@SASdevAnneMarie,

 

I have written a new SAS program.  This new SAS program must be run before the main import program.  This new SAS program requires that each record end with "VIDELF" or the program will not run correctly.

 

The new SAS program does several things:

  1. It replaces VIDELF (incorrect) with VIDECRLF (correct).
  2. It removes LF's that are in error and replaces them with a semi colon.
  3. It removes CR's that are in error and replaces them with a semi colon.

Here is the code:

 

%LET	Cmnt					=	*;
%LET	LRECL					=	32767;
%LET	Limit					=	50000;

FILENAME	Elig_In		'XXX\Marie_Eligibles_c.csv_20200910 (2).csv';

FILENAME	Elig_Out	'XXX\Marie_Eligibles_c.csv_20200910 (2)_SAS_Pre_Edit.csv';

DATA	_NULL_;
	RETAIN	Carriage_Return		'0D'x;
	RETAIN	Line_Feed			'0A'x;
	RETAIN	Semi_Colon			';';

	%IF	%BQUOTE(&Cmnt)			=	%BQUOTE()			%THEN
		%DO;
			IF	_N_				>	&Limit				THEN
				DO;
					PUTLOG	"NOTE-  ";
					PUTLOG	"NOTE:  Stopping "	_N_=;
					STOP;
				END;
		%END;

	INFILE	Elig_In
		RECFM					=	N
		;

	FILE	Elig_Out
		LRECL					=	&LRECL
		RECFM					=	N
		;

	ARRAY	Chars	[5]	$1	_TEMPORARY_;

	INPUT	Char	$1.;

	LINK	Load_Array;
	LINK	Check_Chars;

	******;
	RETURN;
	******;

	**********;
	Load_Array:
	**********;
		Chars[5]				=	Chars[4];
		Chars[4]				=	Chars[3];
		Chars[3]				=	Chars[2];
		Chars[2]				=	Chars[1];
		Chars[1]				=	UPCASE(Char);
	******;
	RETURN;
	******;

	***********;
	Check_Chars:
	***********;
		IF	Chars[5]			=	'V'					AND
			Chars[4]			=	'I'					AND
			Chars[3]			=	'D'					AND
			Chars[2]			=	'E'					AND
			Chars[1]			=	Line_Feed			THEN
			DO;
				&Cmnt	PUTLOG	"NOTE-  *** VIDE and Line feed detected *** "					_N_= Chars[5]= Chars[4]= Chars[3]= Chars[2]= Chars[1]= Char=;
				PUT	Carriage_Return		$1.;
				PUT	Line_Feed			$1.;
			END;
		ELSE
			DO;
				IF	Char		=	Line_Feed			THEN
					DO;
						&Cmnt	PUTLOG	"NOTE-  *** Stand alone line feed detected *** "		_N_= Char=;
						PUT	Semi_Colon	$1.;
					END;
				ELSE
				IF	Char		=	Carriage_Return		THEN
					DO;
						&Cmnt	PUTLOG	"NOTE-  *** Stand alone carriage return detected *** "	_N_= Char=;
						PUT	Semi_Colon	$1.;
					END;
				ELSE
					DO;
						&Cmnt	PUTLOG	"NOTE-  Other character detected "						_N_= Char=;
						PUT	Char		$1.;
					END;
			END;
	******;
	RETURN;
	******;
RUN;

After you run the above new SAS program, you can run the import program with no other steps in between.

 

This new SAS program will not take care of record 9 which has LF's that are not separators and are not terminators.  It may be possible to remove these, but that will take extra programming, and I need to get back to my regular job.  Really, if the data is this bad, the person who creates the data should fix the data.  If the data cannot be fixed, then the company in charge or reading the data needs to hire a consultant or contractor who has the specialized programming skill needed to work with this kind of data.  I would be happy to do it, but I cannot work for free always.  Usually, I try to give advice or tips so that the person posting to the SAS Community can then continue on their own, but here I am 100% writing the entire program which is not exactly the spirit of the SAS Community as I understand it.  The SAS Community is here to assist but not to be free labor.  I hope you understand.

 

If you have any problems with the program, please let me know, and I will he happy to assist.  If the programs serve you, please mark one of my posts as the solution so that I may at least have a little recognition for my work.  I would appreciate that.

 

Jim

 

SASdevAnneMarie
Barite | Level 11
Thank you very much Jim,
I’m totally agree with you!
I must fixe the format of file. I had also the confirmation of @Tom that it’s the bad data, so I can argument my request of changing the file.
Marie

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 52 replies
  • 3577 views
  • 19 likes
  • 5 in conversation