DATA Step, Macro, Functions and more

Understand "||" and trim

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

Understand "||" and trim

[ Edited ]

Can someone help me understand why the trim function here adds the "." in the output, whereas without trim the "." is not added? Thanks!

 

data Names;
	infile datalines dlm='@';
	input Name:$32.;
	datalines;
	Harry Micheal Henderson
	Larry Kerry Baldwin
	Jerry m. Stout
	;

data SplitNames;
	set Names;
	length Last $8;
	First = Scan(name,1);
	FirstName = trim(First);
	/* Trim functions doesn't seem to have any effect */
	Middle = Scan(name,2);
	MiddleIntial = upcase(substr(Middle,1,1));
	MiddleIntial = trim(MiddleIntial)||".";
	Last = Scan(name,3);
	label MiddleIntial="Middle Intital";
	drop Middle;
run;

proc print data = SplitNames label noobs;
run;

proc contents data = SplitNames;
run;

Output:

Name	                Last	        First	FirstName	Middle Intital
Harry Micheal Henderson	Henderso	Harry	Harry	        M.
Larry Kerry Baldwin	Baldwin	        Larry	Larry	        K.
Jerry m. Stout	        Stout	        Jerry	Jerry	        M.

But if I use the line:

 

 

MiddleIntial = MiddleIntial||".";

instead of:

MiddleIntial = trim(MiddleIntial)||".";

My output is:

Name	                Last	        First	FirstName	Middle Intital
Harry Micheal Henderson	Henderso	Harry	Harry	        M
Larry Kerry Baldwin	Baldwin	        Larry	Larry	        K
Jerry m. Stout	        Stout	        Jerry	Jerry	        M

Why?

 

 

 

 


Accepted Solutions
Solution
‎12-18-2015 11:06 AM
Super Contributor
Posts: 336

Re: Understand "||" and trim

Because the concatenation operator only puts character variables "next to each other". The resulting variable has the length of all the connected variables. Because the you didn't define the length of your character variables, they all have the same length. Thus, you either have to trim or define a longer variable (which would contain lots of spaces; would be MiddleInitial3 below).

 

data Names;
	infile datalines dlm='@';
	input Name:$32.;
	datalines;
	Harry Micheal Henderson
	Larry Kerry Baldwin
	Jerry m. Stout
	;

data SplitNames;
	set Names;
	length Last $8 MiddleInitial3 $33.;

	First = Scan(name,1);
	FirstName = trim(First);
	/* Trim functions doesn't seem to have any effect */
	Middle = Scan(name,2);
	MiddleIntial = upcase(substr(Middle,1,1));
	MiddleIntial = trim(MiddleIntial)||".";

	MiddleIntial2 = upcase(substr(Middle,1,1));
	MiddleIntial2_Length=Length(MiddleIntial2);
    MiddleIntial2 = MiddleIntial2||".";
	MiddleInitial3=MiddleIntial2||".";

	Last = Scan(name,3);
	label MiddleIntial="Middle Intital";
	drop Middle;
run;

View solution in original post


All Replies
Super User
Posts: 5,256

Re: Understand "||" and trim

Length of your column perhaps?
Data never sleeps
Frequent Contributor
Posts: 108

Re: Understand "||" and trim

I think 'trim' and '||' conflicting 


data SplitNames;
	set Names;
	length Last $8;
	First = Scan(name,1);
	FirstName = trim(First);
	/* Trim functions doesn't seem to have any effect */
	Middle = Scan(name,2);
	MiddleIntial = upcase(substr(Middle,1,1));
	MiddleIntial = trim(MiddleIntial); /* Here it has same effeect  */
	Last = Scan(name,3);
	label MiddleIntial="Middle Intital";
	drop Middle;
run;
Solution
‎12-18-2015 11:06 AM
Super Contributor
Posts: 336

Re: Understand "||" and trim

Because the concatenation operator only puts character variables "next to each other". The resulting variable has the length of all the connected variables. Because the you didn't define the length of your character variables, they all have the same length. Thus, you either have to trim or define a longer variable (which would contain lots of spaces; would be MiddleInitial3 below).

 

data Names;
	infile datalines dlm='@';
	input Name:$32.;
	datalines;
	Harry Micheal Henderson
	Larry Kerry Baldwin
	Jerry m. Stout
	;

data SplitNames;
	set Names;
	length Last $8 MiddleInitial3 $33.;

	First = Scan(name,1);
	FirstName = trim(First);
	/* Trim functions doesn't seem to have any effect */
	Middle = Scan(name,2);
	MiddleIntial = upcase(substr(Middle,1,1));
	MiddleIntial = trim(MiddleIntial)||".";

	MiddleIntial2 = upcase(substr(Middle,1,1));
	MiddleIntial2_Length=Length(MiddleIntial2);
    MiddleIntial2 = MiddleIntial2||".";
	MiddleInitial3=MiddleIntial2||".";

	Last = Scan(name,3);
	label MiddleIntial="Middle Intital";
	drop Middle;
run;
Super User
Super User
Posts: 7,401

Re: Understand "||" and trim

Hi,

 

Just to add, there are functions - cats, catx etc. which can be used to concatenate values without having to trim, or add in extra characters.  Whilst the double pipe is still useful, the functions read easier and take up less space.  Also note, half of that code is redundant and one variable first/firstname is duplicate as will always be the same.  Try:

data Names;
  infile datalines dlm='@';
  input Name:$32.;
datalines;
Harry Micheal Henderson
Larry Kerry Baldwin
Jerry m. Stout
;
run;    /* Note the missing run here, whilst it doesn't matter, it just reads a lot easier */

data splitnames;
  set names;
  length last first middleinitial $50;
  first=scan(name,1);
  middleinitial=cats(upcase(substr(scan(name,2),1,1)),".");
  last=scan(name,3);
run;
Super User
Posts: 5,082

Re: Understand "||" and trim

Here are some of the details about how and why ... perhaps just being more long-winded about what has already been explained by others.

 

Before executing the DATA step, SAS looks through the code and sets up storage space for all the variables that it will be using.  When it sees these statements, it needs to make a decision:

 

First = scan(name, 1);

FirstName = trim(First);

 

It needs to decide how many characters to use to store First and FirstName.  It hasn't seen any of the data values yet ... it has to make the decision before seeing what is in the data.  

 

Since NAME is 32 characters long, it is theoretically possible that FIRST will also be 32 characters long.  To be safe, SAS decides to use 32 characters to store FIRST.

 

Since FIRST is 32 characters long, it is theoretically possible that it contains no blanks.  So it is possible that SAS will need 32 characters to store FIRSTNAME.  Again to be safe, SAS decides to use 32 characters to store FIRSTNAME.

 

Finally, the DATA step executes.  The TRIM function does work.  It removes trailing blanks from FIRST and stores the result in FIRSTNAME.  However, since FIRSTNAME was previously defined as being 32 characters long, SAS adds blanks back on to the end of the value.

Contributor
Posts: 24

Re: Understand "||" and trim

Hello All, Thanks much for your solutions and they all did help!

Super User
Super User
Posts: 6,500

Re: Understand "||" and trim

SAS stores character variables as fixed length.

So if MIDDLEINITIAL is defined as length $32 and it contains the value 'M' what is actually stored is 'M' with 31 trailing blanks.

So when you specify MIDDLEINITIAL||'.' you are generating a string that is 33 characters long. When you store it back into the variable MIDDLEINITIAL there is only room for the first 32 characters so the '.' does not get stored. 

The TRIM() function removes the trailing blanks.  So the result of TRIM(MIDDLEINITIAL)||'.' is a string that is only 2 characters long. There is plenty of room to store that so it gets stored into MIDDLEINITIAL with 30 trailing blanks.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 317 views
  • 1 like
  • 7 in conversation