@SASdevAnneMarie wrote:
Sorry,
I have another question please 🙂
I don't understand the line: call symputx ('SYSCC', 8, 'G');
Thank you very much!
Hello Jim,
I asked the new csv file, they did some correction.
My original code is working, but there is some lag from the line 515.
Do you know, please some option, to avoid this lag?
Thank you very much!
/*****Importation*****/
data EUROPERF_IMP1;
dlm=';'||'0D0A'x;
infile "XXXX\Marie_Eligibles_c.csv_20200910 (2).csv" dlm=dlm DSD lrecl=32867 /*termstr=CRLF*/
RECFM=n;
informat CodeIsin $12.;
informat Libelle $50.;
informat nature $12.;
informat categAMF $32.;
informat categEPF $50.;
informat zoneGeo $40.;
informat dureeMinReco $8.;
informat affectationResultat $40.;
informat fiscaliteText $50.;
informat deviseComptable $3.;
informat dateCloture $32.;
informat dateOuverture $32.;
informat promoteur $50.;
informat adressePromoteur $80.;
informat steGestion $70.;
informat adresseSteGestion $80.;
informat steAdmDelegue $80.;
informat steFinAdmDelegue $80.;
informat depositaire $80.;
informat adresseDepositaire $80.;
informat commissaireCpte $50.;
informat souscription $100.;
informat rachat $50.;
informat fraisGestion $70.;
informat datePerformance $12.;
informat perf12mois best32.;
informat perf36mois $5.;
informat perf60mois $5.;
informat fraisGestionCommentaire $256.;
informat objectifGestion $256.;
informat souscripteurs $256.;
informat orientationPlacement $256.;
informat dateClotureExercice $256.;
informat periodeVL $256.;
informat sitePublication $256.;
informat conditionSouscripRachat $256.;
informat dateValidation $12.;
informat minimumRisqueAction $5.;
informat dateAgrementAMF $12.;
informat Tfe $20.;
informat perfAm1 $8.;
informat perfAm2 $8.;
informat perfAm3 $8.;
informat perfAm4 $8.;
informat perfAm5 $8.;
informat perfAm6 $8.;
informat perfAm7 $8.;
informat perfAm8 $8.;
informat UnitedureeMinReco $7.;
informat perfYTD $5.;
informat volat_12mois $5.;
informat frequence_text $30.;
informat prospectus_url $50.;
informat jour_valorisation $8.;
informat risque $6.;
informat VIDE $4.;
format CodeIsin $12.;
format Libelle $50.;
format nature $12.;
format categAMF $32.;
format categEPF $50.;
format zoneGeo $40.;
format dureeMinReco $8.;
format affectationResultat $40.;
format fiscaliteText $50.;
format deviseComptable $3.;
format dateCloture $32.;
format dateOuverture $32.;
format promoteur $50.;
format adressePromoteur $80.;
format steGestion $70.;
format adresseSteGestion $80.;
format steAdmDelegue $80.;
format steFinAdmDelegue $80.;
format depositaire $80.;
format adresseDepositaire $80.;
format commissaireCpte $50.;
format souscription $100.;
format rachat $5.;
format fraisGestion $70.;
format datePerformance $12.;
format perf12mois best32.;
format perf36mois $5.;
format perf60mois $5.;
format fraisGestionCommentaire $256.;
format objectifGestion $256.;
format souscripteurs $256.;
format orientationPlacement $256.;
format dateClotureExercice $256.;
format periodeVL $256.;
format sitePublication $256.;
format conditionSouscripRachat $256.;
format dateValidation $12.;
format minimumRisqueAction $5.;
format dateAgrementAMF $12.;
format Tfe $20.;
format perfAm1 $8.;
format perfAm2 $8.;
format perfAm3 $8.;
format perfAm4 $8.;
format perfAm5 $8.;
format perfAm6 $8.;
format perfAm7 $8.;
format perfAm8 $8.;
format UnitedureeMinReco $7.;
format perfYTD $5.;
format volat_12mois $5.;
format frequence_text $30.;
format prospectus_url $50.;
format jour_valorisation $8.;
format risque $6.;
format VIDE $4.;
input
CodeIsin $
Libelle $
nature $
categAMF $
categEPF $
zoneGeo $
dureeMinReco $
affectationResultat $
fiscaliteText $
deviseComptable $
dateCloture $
dateOuverture $
promoteur $
adressePromoteur $
steGestion $
adresseSteGestion $
steAdmDelegue $
steFinAdmDelegue $
depositaire $
adresseDepositaire $
commissaireCpte $
souscription $
rachat $
fraisGestion $
datePerformance $
perf12mois
perf36mois $
perf60mois $
fraisGestionCommentaire $
objectifGestion $
souscripteurs $
orientationPlacement $
dateClotureExercice $
periodeVL $
sitePublication $
conditionSouscripRachat $
dateValidation $
minimumRisqueAction $
dateAgrementAMF $
Tfe $
perfAm1 $
perfAm2 $
perfAm3 $
perfAm4 $
perfAm5 $
perfAm6 $
perfAm7 $
perfAm8 $
UnitedureeMinReco $
perfYTD $
volat_12mois $
frequence_text $
prospectus_url $
jour_valorisation $
risque $
VIDE $;
run;
Oh, dear, Marie. This data is worse than the last. There are no CRLF terminators at the end of each record.
In the first line in the data in the screen print below, there are semi-colons between elements. These are correct. But look at the end of the line. There is only an LF. This should be a CRLF. Otherwise, there is no way to tell the difference between a value separator and the end of a record.
This data is not good. Let me see what I can do.
Jim
You need to ask the team that is creating this file to explain to you how they create it. What is the source? Are they using some software to create the file? What software? How do they create the file? Do they run a program? Is the program always the same or to they have to modify it somehow for each run? Are they pointing and clicking in some user interface to generate the file? Is so what do they have in place to insure that these manual steps are done the same each time?
If they are using SAS to make the file ask them to show you the code. Perhaps we can fix their export code to make it so they will give you files that are both consistent in format and compatible with being parsed by a computer program.
You could fix that file by using fact the real lines with with "VIDE".
So with this code:
filename fix temp ;
data _null_;
infile "&path\&fname" termstr=lf ;
file fix lrecl=1000000 termstr=crlf;
input;
put _infile_ @;
if scan(_infile_,-1,' ;')='VIDE' then put;
else put ';' @;
run;
Again you are getting very close to maximum length lines that _INFILE_ can use. So you might want to switch to reading the file byte by byte instead.
filename fix2 temp ;
data _null_;
infile "&path\&fname" recfm=n;
file fix2 recfm=n;
input ch $char1. ;
last4 = cat(lag4(ch),lag3(ch),lag2(ch),lag1(ch));
if ch='0A'x then do;
if last4='VIDE' then put '0D'x ;
else ch=';' ;
end;
put ch $char1.;
run;
Now you can read the file. Note that some of the values are getting close to the maximum length string that SAS can have. Check out the 36th column.
data check (compress=yes);
infile fix2 termstr=crlf dsd dlm=';' truncover lrecl=1000000 firstobs=2;
row+1;
do col=1 to 56;
input value :$32767. @ ;
len=lengthn(value);
output;
end;
run;
proc summary data=check ;
class col ;
var len;
output out=summary(drop=_freq_) n=n min=min max=max mean=mean p90=p90 p95=p95 p99=p99;
run;
col max p99 p95 p90 mean min . 29161 3613 876 89 161.635 0 1 12 12 12 12 12.000 12 2 63 63 60 56 36.885 3 3 7 5 5 5 4.271 3 4 30 30 29 23 7.974 0 5 62 62 50 45 31.540 10 6 28 16 16 14 10.269 4 7 3 2 2 2 1.117 0 ... col max p99 p95 p90 mean min 29 3212 2691 1251 657 272.19 0 30 8116 6681 5181 4474 2662.99 92 31 5075 4610 2738 2053 1028.66 17 32 20116 11423 7934 6194 3037.35 0 33 357 186 141 102 65.63 0 34 1642 1318 695 500 221.90 0 35 1215 764 468 358 153.63 0 36 29161 4979 2816 2150 991.77 0 37 10 10 10 10 10.00 10 38 4 4 4 4 1.50 0
Yes, these changes can be done with SAS. We would need two data steps.
The first Data step:
DATA NULL;
infile "XXX\Eligibles_c.csv_20200810_Original.csv"
termstr=CRLF
lrecl=32767
RECFM=V
;
INPUT;
_INFILE_ = TRANWRD(_INFILE_, '0D0A'x, '^^^^');
_INFILE_ = TRANWRD(_INFILE_, '0A'x, ';');
_INFILE_ = TRANWRD(_INFILE_, '^^^^', '0D0A'x);
file "XXX\Eligibles_c.csv_20200810_SAS_Edit.csv";
PUT _INFILE_;
RUN;
The first data step uses the original file, the file with the problems in the data. The first Data step corrects the problems.
The second data step then uses the file created by the first data step, above:
data EUROPERF_IMP1_SAS;
infile "XXX\Eligibles_c.csv_20200810_SAS_Edit.csv"
termstr=CRLF
lrecl=32767
FIRSTOBS=2
MISSOVER
RECFM=V
dlm=';'
DSD
;
/* 56 Columns */
informat CodeIsin $12.;
informat Libelle $50.;
informat nature $12.;
informat categAMF $32.;
informat categEPF $50.;
informat zoneGeo $40.;
informat dureeMinReco $8.;
informat affectationResultat $40.;
informat fiscaliteText $50.;
informat deviseComptable $3.;
informat dateCloture $32.;
informat dateOuverture $32.;
informat promoteur $50.;
informat adressePromoteur $80.;
informat steGestion $70.;
informat adresseSteGestion $80.;
informat steAdmDelegue $80.;
informat steFinAdmDelegue $80.;
informat depositaire $80.;
informat adresseDepositaire $80.;
informat commissaireCpte $50.;
informat souscription $100.;
informat rachat $50.;
informat fraisGestion $70.;
informat datePerformance $12.;
informat perf12mois best32.;
informat perf36mois $5.;
informat perf60mois $5.;
informat fraisGestionCommentaire $256.;
informat objectifGestion $256.;
informat souscripteurs $256.;
informat orientationPlacement $256.;
informat dateClotureExercice $256.;
informat periodeVL $256.;
informat sitePublication $256.;
informat conditionSouscripRachat $256.;
informat dateValidation $12.;
informat minimumRisqueAction $5.;
informat dateAgrementAMF $12.;
informat Tfe $20.;
informat perfAm1 $8.;
informat perfAm2 $8.;
informat perfAm3 $8.;
informat perfAm4 $8.;
informat perfAm5 $8.;
informat perfAm6 $8.;
informat perfAm7 $8.;
informat perfAm8 $8.;
informat UnitedureeMinReco $7.;
informat perfYTD $5.;
informat volat_12mois $5.;
informat frequence_text $30.;
informat prospectus_url $50.;
informat jour_valorisation $8.;
informat risque $6.;
informat VIDE $4.;
/* 56 Columns */
format CodeIsin $12.;
format Libelle $50.;
format nature $12.;
format categAMF $32.;
format categEPF $50.;
format zoneGeo $40.;
format dureeMinReco $8.;
format affectationResultat $40.;
format fiscaliteText $50.;
format deviseComptable $3.;
format dateCloture $32.;
format dateOuverture $32.;
format promoteur $50.;
format adressePromoteur $80.;
format steGestion $70.;
format adresseSteGestion $80.;
format steAdmDelegue $80.;
format steFinAdmDelegue $80.;
format depositaire $80.;
format adresseDepositaire $80.;
format commissaireCpte $50.;
format souscription $100.;
format rachat $5.;
format fraisGestion $70.;
format datePerformance $12.;
format perf12mois best32.;
format perf36mois $5.;
format perf60mois $5.;
format fraisGestionCommentaire $256.;
format objectifGestion $256.;
format souscripteurs $256.;
format orientationPlacement $256.;
format dateClotureExercice $256.;
format periodeVL $256.;
format sitePublication $256.;
format conditionSouscripRachat $256.;
format dateValidation $12.;
format minimumRisqueAction $5.;
format dateAgrementAMF $12.;
format Tfe $20.;
format perfAm1 $8.;
format perfAm2 $8.;
format perfAm3 $8.;
format perfAm4 $8.;
format perfAm5 $8.;
format perfAm6 $8.;
format perfAm7 $8.;
format perfAm8 $8.;
format UnitedureeMinReco $7.;
format perfYTD $5.;
format volat_12mois $5.;
format frequence_text $30.;
format prospectus_url $50.;
format jour_valorisation $8.;
format risque $6.;
format VIDE $4.;
/* 56 Columns */
input
CodeIsin $
Libelle $
nature $
categAMF $
categEPF $
zoneGeo $
dureeMinReco $
affectationResultat $
fiscaliteText $
deviseComptable $
dateCloture $
dateOuverture $
promoteur $
adressePromoteur $
steGestion $
adresseSteGestion $
steAdmDelegue $
steFinAdmDelegue $
depositaire $
adresseDepositaire $
commissaireCpte $
souscription $
rachat $
fraisGestion $
datePerformance $
perf12mois
perf36mois $
perf60mois $
fraisGestionCommentaire $
objectifGestion $
souscripteurs $
orientationPlacement $
dateClotureExercice $
periodeVL $
sitePublication $
conditionSouscripRachat $
dateValidation $
minimumRisqueAction $
dateAgrementAMF $
Tfe $
perfAm1 $
perfAm2 $
perfAm3 $
perfAm4 $
perfAm5 $
perfAm6 $
perfAm7 $
perfAm8 $
UnitedureeMinReco $
perfYTD $
volat_12mois $
frequence_text $
prospectus_url $
jour_valorisation $
risque $
VIDE $;
run;
Why don't you try this code and let me know if it works?
Jim
This is much more difficult. At the end of each record, there should be a termination string (termstr) of '0D0A'x. The TermStr tells SAS where to stop reading each record and to begin a new record. When we code
TERMSTR=CRLF
we are telling SAS to look for '0D0A'x at the end of each record. CR stands for Carriage Return ('0D'x) and LF stands for Line Feed ('0A'x).
Unfortunately in this latest data, there is not a CRLF at the end of each record. Instead there is only an LF.
Now, here is where the problem is: This is delimited data. Between each data element (variable), there is a separator. It should be a semi-colon, but in this data, frequently it is a LF. Do you see the problem? How does SAS tell which LF's are the end of a record and which LF's are merely separators? There is no easy way to tell the difference.
However, we are a little bit lucky here. In your data, there is the text "VIDE" at the end of each record. We can change each VIDELF
combination to VIDECRLF. I did this by hand using Notepad++, and I got the following results:
These results are mostly good, but if we look more closely at the 9th record, toward the end of the record we see:
What is the problem here? Again, it is the LF. There are many LF's inside the column. In other words, there are "extra" LF's within each column. SAS has no way to decide which LF's are separators and which LF's are merely data errors. I cannot think of a good way to fix the problems in the 9th record.
I might be able to figure out how to programatically correct the data problem (the missing CRLF) at the end of each record, but if the text "VIDE" is missing, then whatever program I write will not work correctly. In addition, I cannot easily address the issues in the 9th record.
I think you really must go back to whomsoever created the data and tell them about these problems. This is bad data. They cannot expect you to constantly have to write SAS code to try to compensate for all this bad data. They should fix the data.
Jim
I have written a new SAS program. This new SAS program must be run before the main import program. This new SAS program requires that each record end with "VIDELF" or the program will not run correctly.
The new SAS program does several things:
Here is the code:
%LET Cmnt = *;
%LET LRECL = 32767;
%LET Limit = 50000;
FILENAME Elig_In 'XXX\Marie_Eligibles_c.csv_20200910 (2).csv';
FILENAME Elig_Out 'XXX\Marie_Eligibles_c.csv_20200910 (2)_SAS_Pre_Edit.csv';
DATA _NULL_;
RETAIN Carriage_Return '0D'x;
RETAIN Line_Feed '0A'x;
RETAIN Semi_Colon ';';
%IF %BQUOTE(&Cmnt) = %BQUOTE() %THEN
%DO;
IF _N_ > &Limit THEN
DO;
PUTLOG "NOTE- ";
PUTLOG "NOTE: Stopping " _N_=;
STOP;
END;
%END;
INFILE Elig_In
RECFM = N
;
FILE Elig_Out
LRECL = &LRECL
RECFM = N
;
ARRAY Chars [5] $1 _TEMPORARY_;
INPUT Char $1.;
LINK Load_Array;
LINK Check_Chars;
******;
RETURN;
******;
**********;
Load_Array:
**********;
Chars[5] = Chars[4];
Chars[4] = Chars[3];
Chars[3] = Chars[2];
Chars[2] = Chars[1];
Chars[1] = UPCASE(Char);
******;
RETURN;
******;
***********;
Check_Chars:
***********;
IF Chars[5] = 'V' AND
Chars[4] = 'I' AND
Chars[3] = 'D' AND
Chars[2] = 'E' AND
Chars[1] = Line_Feed THEN
DO;
&Cmnt PUTLOG "NOTE- *** VIDE and Line feed detected *** " _N_= Chars[5]= Chars[4]= Chars[3]= Chars[2]= Chars[1]= Char=;
PUT Carriage_Return $1.;
PUT Line_Feed $1.;
END;
ELSE
DO;
IF Char = Line_Feed THEN
DO;
&Cmnt PUTLOG "NOTE- *** Stand alone line feed detected *** " _N_= Char=;
PUT Semi_Colon $1.;
END;
ELSE
IF Char = Carriage_Return THEN
DO;
&Cmnt PUTLOG "NOTE- *** Stand alone carriage return detected *** " _N_= Char=;
PUT Semi_Colon $1.;
END;
ELSE
DO;
&Cmnt PUTLOG "NOTE- Other character detected " _N_= Char=;
PUT Char $1.;
END;
END;
******;
RETURN;
******;
RUN;
After you run the above new SAS program, you can run the import program with no other steps in between.
This new SAS program will not take care of record 9 which has LF's that are not separators and are not terminators. It may be possible to remove these, but that will take extra programming, and I need to get back to my regular job. Really, if the data is this bad, the person who creates the data should fix the data. If the data cannot be fixed, then the company in charge or reading the data needs to hire a consultant or contractor who has the specialized programming skill needed to work with this kind of data. I would be happy to do it, but I cannot work for free always. Usually, I try to give advice or tips so that the person posting to the SAS Community can then continue on their own, but here I am 100% writing the entire program which is not exactly the spirit of the SAS Community as I understand it. The SAS Community is here to assist but not to be free labor. I hope you understand.
If you have any problems with the program, please let me know, and I will he happy to assist. If the programs serve you, please mark one of my posts as the solution so that I may at least have a little recognition for my work. I would appreciate that.
Jim
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.