BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jakees
Calcite | Level 5

Hi, I am using a data step to import data. The problem is that I have two lines with similar content in the input file. The only difference is that in the first row the column has no space at the end, in the second row there is a space at the end in the same field. After the import the space is removed from the second row, it is somehow trimmed without me doing this. Anybody knows what to do? Thx in advance!

 

DATA TEST.IMPORTNAME;
	LENGTH
		Fieldname       $ 10
	FORMAT
		Fieldname       $CHAR10.;
	INFORMAT
		Fieldname       $CHAR10.;
	INFILE '/opt/data/Filename.csv'
		LRECL=500
		MISSOVER
		firstobs=2
		DSD;
	INPUT
		Fieldname      : $CHAR10.;
RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

It's not removed at all, there are actually more blanks than the beginner will expect:

DATA WORK.TEST;
	LENGTH
		Fieldname       $ 10
		Fieldname_2     $ 10;
	FORMAT
		Fieldname       $CHAR10.
		Fieldname_2		$CHAR10.;
	INFORMAT
		Fieldname       $CHAR10.
		Fieldname_2		$CHAR10.;
	INFILE '$HOME/sascommunity/testfile.csv'
		LRECL=500
		MISSOVER                DLM=','
		firstobs=2
		DSD;
	INPUT
		Fieldname       : $CHAR10.
		Fieldname_2		: $CHAR10.;
RUN;

proc print data=test noobs;
format fieldname fieldname_2 $hex20.;
run;

I uploaded your csv in binary mode to my home on the server, and ran the above code. This is the result:

Fieldname               Fieldname_2

42343539352020202020    54657374202020202020
42343539352020202020    54657374202020202020

You can see the blanks (hex 20) with which the character variables are padded.

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

A csv file should use commas as delimiter; if that is the case with your file, you need to add dlm=',' in the infile statement.

 

But be aware that character variables in SAS are always padded to their full length with blanks anyway. You could only lose a blank if it was the 11th character.

 


@Jakees wrote:

Hi, I am using a data step to import data. The problem is that I have two lines with similar content in the input file. The only difference is that in the first row the column has no space at the end, in the second row there is a space at the end in the same field. After the import the space is removed from the second row, it is somehow trimmed without me doing this. Anybody knows what to do? Thx in advance!

 

DATA TEST.IMPORTNAME;
	LENGTH
		Fieldname       $ 10
	FORMAT
		Fieldname       $CHAR10.;
	INFORMAT
		Fieldname       $CHAR10.;
	INFILE '/opt/data/Filename.csv'
		LRECL=500
		MISSOVER
		firstobs=2
		DSD;
	INPUT
		Fieldname      : $CHAR10.;
RUN;

 

ChrisNZ
Tourmaline | Level 20

> be aware that character variables in SAS are always padded to their full length with blanks anyway. You could only lose a blank if it was the 11th character.

 

Exactly, so I struggle to imagine what is removed.

Jakees
Calcite | Level 5

Hi thx for your quick response. I added the DLM (without the import was also working). The value B4595 is 6 long (with the space). You can see there is a space in the CSV file. After the import the space is removed in the Work.test table.

 

I attached the file and I use the following code:

 

DATA WORK.TEST;
	LENGTH
		Fieldname       $ 10
		Fieldname_2     $ 10;
	FORMAT
		Fieldname       $CHAR10.
		Fieldname_2		$CHAR10.;
	INFORMAT
		Fieldname       $CHAR10.
		Fieldname_2		$CHAR10.;
	INFILE '/opt/sas/data/testfile.csv'
		LRECL=500
		MISSOVER
DLM=',' firstobs=2 DSD; INPUT Fieldname : $CHAR10. Fieldname_2 : $CHAR10.; RUN;
Kurt_Bremser
Super User

It's not removed at all, there are actually more blanks than the beginner will expect:

DATA WORK.TEST;
	LENGTH
		Fieldname       $ 10
		Fieldname_2     $ 10;
	FORMAT
		Fieldname       $CHAR10.
		Fieldname_2		$CHAR10.;
	INFORMAT
		Fieldname       $CHAR10.
		Fieldname_2		$CHAR10.;
	INFILE '$HOME/sascommunity/testfile.csv'
		LRECL=500
		MISSOVER                DLM=','
		firstobs=2
		DSD;
	INPUT
		Fieldname       : $CHAR10.
		Fieldname_2		: $CHAR10.;
RUN;

proc print data=test noobs;
format fieldname fieldname_2 $hex20.;
run;

I uploaded your csv in binary mode to my home on the server, and ran the above code. This is the result:

Fieldname               Fieldname_2

42343539352020202020    54657374202020202020
42343539352020202020    54657374202020202020

You can see the blanks (hex 20) with which the character variables are padded.

Jakees
Calcite | Level 5

Hi Kurt, the 2 lines have the same hex value which is exactly the problem. Two different values are treated as the same. A value with and without space should have than a different hex value which is not the case. Also when I run a query builder and select distinct rows I get 1 row.

 

The problem is that I receive from a supplier this value as a primary key. I think it cannot be solved by SAS because of the padding. I should ask the supplier not to accept ending spaces in primary keys.

ChrisNZ
Tourmaline | Level 20
As Kurt said strings are padded in SAS.All strings have a fixed length.
Kurt_Bremser
Super User

@Jakees wrote:
I should ask the supplier not to accept ending spaces in primary keys.

Ask them what they are smoking, and where to get it. It seems to be prime stuff, but one should only use it at parties.

 

The reason why I'm being so hard: in all types of environment, blanks are considered a separator between items. In natural written language, they separate words. In programming, they separate code elements. HTML (and all other markup systems, eg the ones used internally in word processors) has the concept of "white space" which can be transparently manipulated as needed by a given layout. Operating systems use blanks as the primary separator, so having them in unexpected places (eg in filenames) forces one to work around that (leading to unnecessary quoting).

Having trailing blanks in a crucial data item only works in environments that store strings in a variable length with a terminating special character (as C does it with hex 00) or a preceding length indicator (see the varchar formats in databases).

 

But I've never seen a variable-length primary key or one with blanks. Primary keys should be fixed length strings without blanks, or UUID's.

ChrisNZ
Tourmaline | Level 20
> I should ask the supplier not to accept ending spaces in primary keys.

Indeed. This is a crazy idea. Kurt's love for spaces in file names is well known. I am sure he is impressed. 😀

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1422 views
  • 2 likes
  • 3 in conversation