BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
alepage
Barite | Level 11

Hello,

 

I am importing a csv file into a SAS dataset using the script below

 



%let csvinputpath=/.../data;
 
Filename mrcsvfil pipe "ls -Art &csvinputpath./*.csv | tail -n 1 ";
 
DATA folderinfo;
Infile mrcsvfil truncover;
Input 	infile_name $100.;
Filename=scan(infile_name,-1,"/","b");
call symput ('csvFile',strip(Filename));
call symput ('cn_csvFile1',strip(infile_name));
RUN;
%put &=csvFile &=cn_csvFile1;

libname dest1 "/.../temp/";

data dest1.dataretpolicieslist_%sysfunc(today(),date9.) ;
   attrib 
          POLICY_NUMBER					label="Policy Number" 					length=$25. format=$char25.		informat=$char25.
		  POLICY_NUMBER_SRC 	 		label="Policy Number Source" 			length=$25. format=$char25. 	informat=$char25.
		  LINE_OF_BUSINESS				label="Line of Business" 				length=$3.  format=$char3.  	informat=$char3.
		  PROVINCE       				label="Province" 						length=$10. format=$char10. 	informat=$char10.
		  LEGACY_UNDERWRITING_COMPANY 	label="Legacy Underwriting Company" 	length=$10. format=$char10. 	informat=$char10.
		  UNDERWRITING_COMPANY       	label="Underwriting Company" 			length=$10. format=$char10. 	informat=$char10.
		  DISTRIBUTOR_NUMBER			label="Distributor Number" 				length=$10. format=$char10. 	informat=$char10.
		  DISTRIBUTOR_NUMBER_SRC	    label="Distributor Number Source" 		length=$10. format=$char10. 	informat=$char10.
		  SOURCE_SYSTEM 	 			label="Source System"					length=$15. format=$char15. 	informat=$char15.
		  DATASOURCE_BACKEND_REFERENCE	label="Data Source Backend Reference"	length=$10. format=$char10. 	informat=$char10.
		  LAST_TERM_EFFECTIVE_DATE    	label="Last Term Effective Date" 		length=8 	format=YYMMDD10. 	informat=YYMMDD10.
		  LAST_TERM_EXPIRY_DATE 		label="Last Term Expiry Date" 			length=8	format=YYMMDD10. 	informat=YYMMDD10.
		  CONTRACT_UUID					label="Contract UUID" 					length=$36. format=$char36. 	informat=$char36.
   ;

infile "&csvinputpath/&csvFile." LRECL=32767 TERMSTR=CR DLM='|' MISSOVER FIRSTOBS=2 DSD;
INPUT
          POLICY_NUMBER					: $char25.
		  POLICY_NUMBER_SRC 			: $char25.
		  LINE_OF_BUSINESS				: $char3.
		  PROVINCE       				: $char10.
		  LEGACY_UNDERWRITING_COMPANY 	: $char10.
		  UNDERWRITING_COMPANY       	: $char10.
		  DISTRIBUTOR_NUMBER			: $char10.
		  DISTRIBUTOR_NUMBER_SRC		: $char10.
		  SOURCE_SYSTEM 	 			: $char15.
		  DATASOURCE_BACKEND_REFERENCE	: $char10.
		  LAST_TERM_EFFECTIVE_DATE    	: YYMMDD10.
		  LAST_TERM_EXPIRY_DATE 		: YYMMDD10.
		  CONTRACT_UUID					: $char36.
;
run;

Then when I read the file using the script below

 

%let ActualDateTime=%sysfunc(putn(%sysfunc(datetime()),datetime18.));
%let ActualDateTime=%sysfunc(translate(&ActualDateTime.,'_',':'));
%put "lookuptable_&ActualDateTime..log";

/*proc printto log="/.../logs/lookuptable_&ActualDateTime..log";*/

%let source=/.../temp;
libname source "/.../temp/";
libname dest1 "/.../reports/";
libname dest2 base "/.../data";

filename oscmd pipe "ls &source/ *.sas7bdat|tail -1";

data _null_;
infile oscmd truncover;
input filename $100.;
filename=scan(filename,1,'.');
call symput('dsname',filename);
run;
%put &=dsname;

data lookup1;
format agreement_nbr $char25.;
set source.&dsname. end=last;
policy_number=strip(policy_number);
first_character=substr(policy_number,1,1);
if not last then output;
run;
proc freq data=lookup1;
table first_character;
run;

There is a strange character in front of the policy_number like a dot . (ex: .1111-000000)

 

Here's the proc freq results

 

The FREQ Procedure

 
first_character Frequency Percent Cumulative
Frequency
Cumulative
Percent
  1505280 100.00 1505280 100.00

 

As I am develloping new code for a new project, I am using a substring function to eliminate it. But it does not means that this dot will be always present.

What's the best way to solve that kind of issue.

 

Regards,

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You can use the $HEX format to see what character you have.

Since you suspect it is the first character that makes it easier. You can use the $HEX2. format with the variable and see the hex code for just the first character.

 

Since you told SAS to read the file looking for only CR as the end of line markers it is most likely that you are seeing LF characters at the beginning of every line (after the first one which you skipped) if the file was actually created using CRLF as the end of line markers.  The hex code for a LF is '0A'x.

 

Try changing you code to use TERMSTR=CRLF instead.

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

You can use the $HEX format to see what character you have.

Since you suspect it is the first character that makes it easier. You can use the $HEX2. format with the variable and see the hex code for just the first character.

 

Since you told SAS to read the file looking for only CR as the end of line markers it is most likely that you are seeing LF characters at the beginning of every line (after the first one which you skipped) if the file was actually created using CRLF as the end of line markers.  The hex code for a LF is '0A'x.

 

Try changing you code to use TERMSTR=CRLF instead.

PaigeMiller
Diamond | Level 26

Figure out what the first character actually is, then use the COMPRESS function to delete it, or use SUBSTR to skip it. You (probably) only need to do this on one observation that has this strange character.

 

Simple example using other data:

 

data a;
    text='ldjf;';
    put text hex.;
run;

 

results in the log showing

 

6C646A663B

 

so you can test to see (in this example) if the first character is HEX 6C

 

if text=:'6C'x then ...

 

 

--
Paige Miller
Tom
Super User Tom
Super User

I think your actual problem is simply that you told SAS to treat the LF as part of the data instead part of the end of line markers by using TERMSTR=CR instead of TERMSTR=CRLF on the infile statement.

 

But your post has such a collection of many of the common things on my pet peeves list of SAS coding issue that I would like to go through it and explain.  It has nothing really to do with your question (or really whether the code will work, there are always lots of different ways to do things in SAS).  It is just that these particular things annoy me.

 

Do not use the old CALL SYMPUT() method instead of the newer CALL SYMPUTX() method unless there is really a need to store leading and/or trailing spaces into the macro variable.

call symputx('csvFile',Filename);
call symputx('cn_csvFile1',infile_name);

 

Do not use the old MISSOVER option on the INFILE statement instead of the TRUNCOVER option unless it is really important that values at the end of lines that are shorter than the informat asks for be discarded instead of being used as is.  Since your data step is using only LIST MODE input the informat will never be longer than the value available, but why risk it?

 

Why bother to attach FORMATs to character variables?  SAS can display character values just fine without them and you do not run the risk of accidentally at some later point after changing the length of the variable ending up with a display format that truncates the value.

 

The DATETIME format has a bug so you should never use the width of 18 with it.  That will just generate the same string as DATETIME16. does.  If you want four digit years you need to use at least a width of 19 with DATETIME format.

 

If you have attached an informat to the variable then there is no need to again state the informat to use in the INPUT statement.  This means the INPUT statement can be much simpler.  And for this type of program if you define the variables in the order they appear in the text file (like your program does) then the you can even use a positional variable list in the INPUT statement.

INPUT POLICY_NUMBER -- CONTRACT_UUID ;

If you want to remove the leading spaces from a variable just use the LEFT() function.  Using STRIP() to also remove the trailing spaces and then immediately storing the value back into the same fixed length variable will just cause the trailing spaces to be added back since SAS pads the values with spaces.

policy_number=left(policy_number);

This is actually a place where your code might be wrong.  If the issue was that there was some invisible character in the front of POLICY_NUMBER why did you remove the leading spaces?  Isn't it possible that the invisible character was a space?

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 509 views
  • 0 likes
  • 3 in conversation