Hello,
I am importing a csv file into a SAS dataset using the script below
%let csvinputpath=/.../data; Filename mrcsvfil pipe "ls -Art &csvinputpath./*.csv | tail -n 1 "; DATA folderinfo; Infile mrcsvfil truncover; Input infile_name $100.; Filename=scan(infile_name,-1,"/","b"); call symput ('csvFile',strip(Filename)); call symput ('cn_csvFile1',strip(infile_name)); RUN; %put &=csvFile &=cn_csvFile1; libname dest1 "/.../temp/"; data dest1.dataretpolicieslist_%sysfunc(today(),date9.) ; attrib POLICY_NUMBER label="Policy Number" length=$25. format=$char25. informat=$char25. POLICY_NUMBER_SRC label="Policy Number Source" length=$25. format=$char25. informat=$char25. LINE_OF_BUSINESS label="Line of Business" length=$3. format=$char3. informat=$char3. PROVINCE label="Province" length=$10. format=$char10. informat=$char10. LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10. format=$char10. informat=$char10. UNDERWRITING_COMPANY label="Underwriting Company" length=$10. format=$char10. informat=$char10. DISTRIBUTOR_NUMBER label="Distributor Number" length=$10. format=$char10. informat=$char10. DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10. format=$char10. informat=$char10. SOURCE_SYSTEM label="Source System" length=$15. format=$char15. informat=$char15. DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10. format=$char10. informat=$char10. LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10. LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10. CONTRACT_UUID label="Contract UUID" length=$36. format=$char36. informat=$char36. ; infile "&csvinputpath/&csvFile." LRECL=32767 TERMSTR=CR DLM='|' MISSOVER FIRSTOBS=2 DSD; INPUT POLICY_NUMBER : $char25. POLICY_NUMBER_SRC : $char25. LINE_OF_BUSINESS : $char3. PROVINCE : $char10. LEGACY_UNDERWRITING_COMPANY : $char10. UNDERWRITING_COMPANY : $char10. DISTRIBUTOR_NUMBER : $char10. DISTRIBUTOR_NUMBER_SRC : $char10. SOURCE_SYSTEM : $char15. DATASOURCE_BACKEND_REFERENCE : $char10. LAST_TERM_EFFECTIVE_DATE : YYMMDD10. LAST_TERM_EXPIRY_DATE : YYMMDD10. CONTRACT_UUID : $char36. ; run;
Then when I read the file using the script below
%let ActualDateTime=%sysfunc(putn(%sysfunc(datetime()),datetime18.));
%let ActualDateTime=%sysfunc(translate(&ActualDateTime.,'_',':'));
%put "lookuptable_&ActualDateTime..log";
/*proc printto log="/.../logs/lookuptable_&ActualDateTime..log";*/
%let source=/.../temp;
libname source "/.../temp/";
libname dest1 "/.../reports/";
libname dest2 base "/.../data";
filename oscmd pipe "ls &source/ *.sas7bdat|tail -1";
data _null_;
infile oscmd truncover;
input filename $100.;
filename=scan(filename,1,'.');
call symput('dsname',filename);
run;
%put &=dsname;
data lookup1;
format agreement_nbr $char25.;
set source.&dsname. end=last;
policy_number=strip(policy_number);
first_character=substr(policy_number,1,1);
if not last then output;
run;
proc freq data=lookup1;
table first_character;
run;
There is a strange character in front of the policy_number like a dot . (ex: .1111-000000)
Here's the proc freq results
The FREQ Procedure
first_character | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
1505280 | 100.00 | 1505280 | 100.00 |
As I am develloping new code for a new project, I am using a substring function to eliminate it. But it does not means that this dot will be always present.
What's the best way to solve that kind of issue.
Regards,
You can use the $HEX format to see what character you have.
Since you suspect it is the first character that makes it easier. You can use the $HEX2. format with the variable and see the hex code for just the first character.
Since you told SAS to read the file looking for only CR as the end of line markers it is most likely that you are seeing LF characters at the beginning of every line (after the first one which you skipped) if the file was actually created using CRLF as the end of line markers. The hex code for a LF is '0A'x.
Try changing you code to use TERMSTR=CRLF instead.
You can use the $HEX format to see what character you have.
Since you suspect it is the first character that makes it easier. You can use the $HEX2. format with the variable and see the hex code for just the first character.
Since you told SAS to read the file looking for only CR as the end of line markers it is most likely that you are seeing LF characters at the beginning of every line (after the first one which you skipped) if the file was actually created using CRLF as the end of line markers. The hex code for a LF is '0A'x.
Try changing you code to use TERMSTR=CRLF instead.
Figure out what the first character actually is, then use the COMPRESS function to delete it, or use SUBSTR to skip it. You (probably) only need to do this on one observation that has this strange character.
Simple example using other data:
data a;
text='ldjf;';
put text hex.;
run;
results in the log showing
6C646A663B
so you can test to see (in this example) if the first character is HEX 6C
if text=:'6C'x then ...
I think your actual problem is simply that you told SAS to treat the LF as part of the data instead part of the end of line markers by using TERMSTR=CR instead of TERMSTR=CRLF on the infile statement.
But your post has such a collection of many of the common things on my pet peeves list of SAS coding issue that I would like to go through it and explain. It has nothing really to do with your question (or really whether the code will work, there are always lots of different ways to do things in SAS). It is just that these particular things annoy me.
Do not use the old CALL SYMPUT() method instead of the newer CALL SYMPUTX() method unless there is really a need to store leading and/or trailing spaces into the macro variable.
call symputx('csvFile',Filename);
call symputx('cn_csvFile1',infile_name);
Do not use the old MISSOVER option on the INFILE statement instead of the TRUNCOVER option unless it is really important that values at the end of lines that are shorter than the informat asks for be discarded instead of being used as is. Since your data step is using only LIST MODE input the informat will never be longer than the value available, but why risk it?
Why bother to attach FORMATs to character variables? SAS can display character values just fine without them and you do not run the risk of accidentally at some later point after changing the length of the variable ending up with a display format that truncates the value.
The DATETIME format has a bug so you should never use the width of 18 with it. That will just generate the same string as DATETIME16. does. If you want four digit years you need to use at least a width of 19 with DATETIME format.
If you have attached an informat to the variable then there is no need to again state the informat to use in the INPUT statement. This means the INPUT statement can be much simpler. And for this type of program if you define the variables in the order they appear in the text file (like your program does) then the you can even use a positional variable list in the INPUT statement.
INPUT POLICY_NUMBER -- CONTRACT_UUID ;
If you want to remove the leading spaces from a variable just use the LEFT() function. Using STRIP() to also remove the trailing spaces and then immediately storing the value back into the same fixed length variable will just cause the trailing spaces to be added back since SAS pads the values with spaces.
policy_number=left(policy_number);
This is actually a place where your code might be wrong. If the issue was that there was some invisible character in the front of POLICY_NUMBER why did you remove the leading spaces? Isn't it possible that the invisible character was a space?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.