BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Durlov
Obsidian | Level 7

Can someone help me understand why the trim function here adds the "." in the output, whereas without trim the "." is not added? Thanks!

 

data Names;
	infile datalines dlm='@';
	input Name:$32.;
	datalines;
	Harry Micheal Henderson
	Larry Kerry Baldwin
	Jerry m. Stout
	;

data SplitNames;
	set Names;
	length Last $8;
	First = Scan(name,1);
	FirstName = trim(First);
	/* Trim functions doesn't seem to have any effect */
	Middle = Scan(name,2);
	MiddleIntial = upcase(substr(Middle,1,1));
	MiddleIntial = trim(MiddleIntial)||".";
	Last = Scan(name,3);
	label MiddleIntial="Middle Intital";
	drop Middle;
run;

proc print data = SplitNames label noobs;
run;

proc contents data = SplitNames;
run;

Output:

Name	                Last	        First	FirstName	Middle Intital
Harry Micheal Henderson	Henderso	Harry	Harry	        M.
Larry Kerry Baldwin	Baldwin	        Larry	Larry	        K.
Jerry m. Stout	        Stout	        Jerry	Jerry	        M.

But if I use the line:

 

 

MiddleIntial = MiddleIntial||".";

instead of:

MiddleIntial = trim(MiddleIntial)||".";

My output is:

Name	                Last	        First	FirstName	Middle Intital
Harry Micheal Henderson	Henderso	Harry	Harry	        M
Larry Kerry Baldwin	Baldwin	        Larry	Larry	        K
Jerry m. Stout	        Stout	        Jerry	Jerry	        M

Why?

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
user24feb
Barite | Level 11

Because the concatenation operator only puts character variables "next to each other". The resulting variable has the length of all the connected variables. Because the you didn't define the length of your character variables, they all have the same length. Thus, you either have to trim or define a longer variable (which would contain lots of spaces; would be MiddleInitial3 below).

 

data Names;
	infile datalines dlm='@';
	input Name:$32.;
	datalines;
	Harry Micheal Henderson
	Larry Kerry Baldwin
	Jerry m. Stout
	;

data SplitNames;
	set Names;
	length Last $8 MiddleInitial3 $33.;

	First = Scan(name,1);
	FirstName = trim(First);
	/* Trim functions doesn't seem to have any effect */
	Middle = Scan(name,2);
	MiddleIntial = upcase(substr(Middle,1,1));
	MiddleIntial = trim(MiddleIntial)||".";

	MiddleIntial2 = upcase(substr(Middle,1,1));
	MiddleIntial2_Length=Length(MiddleIntial2);
    MiddleIntial2 = MiddleIntial2||".";
	MiddleInitial3=MiddleIntial2||".";

	Last = Scan(name,3);
	label MiddleIntial="Middle Intital";
	drop Middle;
run;

View solution in original post

7 REPLIES 7
LinusH
Tourmaline | Level 20
Length of your column perhaps?
Data never sleeps
pearsoninst
Pyrite | Level 9
I think 'trim' and '||' conflicting 


data SplitNames;
	set Names;
	length Last $8;
	First = Scan(name,1);
	FirstName = trim(First);
	/* Trim functions doesn't seem to have any effect */
	Middle = Scan(name,2);
	MiddleIntial = upcase(substr(Middle,1,1));
	MiddleIntial = trim(MiddleIntial); /* Here it has same effeect  */
	Last = Scan(name,3);
	label MiddleIntial="Middle Intital";
	drop Middle;
run;
user24feb
Barite | Level 11

Because the concatenation operator only puts character variables "next to each other". The resulting variable has the length of all the connected variables. Because the you didn't define the length of your character variables, they all have the same length. Thus, you either have to trim or define a longer variable (which would contain lots of spaces; would be MiddleInitial3 below).

 

data Names;
	infile datalines dlm='@';
	input Name:$32.;
	datalines;
	Harry Micheal Henderson
	Larry Kerry Baldwin
	Jerry m. Stout
	;

data SplitNames;
	set Names;
	length Last $8 MiddleInitial3 $33.;

	First = Scan(name,1);
	FirstName = trim(First);
	/* Trim functions doesn't seem to have any effect */
	Middle = Scan(name,2);
	MiddleIntial = upcase(substr(Middle,1,1));
	MiddleIntial = trim(MiddleIntial)||".";

	MiddleIntial2 = upcase(substr(Middle,1,1));
	MiddleIntial2_Length=Length(MiddleIntial2);
    MiddleIntial2 = MiddleIntial2||".";
	MiddleInitial3=MiddleIntial2||".";

	Last = Scan(name,3);
	label MiddleIntial="Middle Intital";
	drop Middle;
run;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Just to add, there are functions - cats, catx etc. which can be used to concatenate values without having to trim, or add in extra characters.  Whilst the double pipe is still useful, the functions read easier and take up less space.  Also note, half of that code is redundant and one variable first/firstname is duplicate as will always be the same.  Try:

data Names;
  infile datalines dlm='@';
  input Name:$32.;
datalines;
Harry Micheal Henderson
Larry Kerry Baldwin
Jerry m. Stout
;
run;    /* Note the missing run here, whilst it doesn't matter, it just reads a lot easier */

data splitnames;
  set names;
  length last first middleinitial $50;
  first=scan(name,1);
  middleinitial=cats(upcase(substr(scan(name,2),1,1)),".");
  last=scan(name,3);
run;
Astounding
PROC Star

Here are some of the details about how and why ... perhaps just being more long-winded about what has already been explained by others.

 

Before executing the DATA step, SAS looks through the code and sets up storage space for all the variables that it will be using.  When it sees these statements, it needs to make a decision:

 

First = scan(name, 1);

FirstName = trim(First);

 

It needs to decide how many characters to use to store First and FirstName.  It hasn't seen any of the data values yet ... it has to make the decision before seeing what is in the data.  

 

Since NAME is 32 characters long, it is theoretically possible that FIRST will also be 32 characters long.  To be safe, SAS decides to use 32 characters to store FIRST.

 

Since FIRST is 32 characters long, it is theoretically possible that it contains no blanks.  So it is possible that SAS will need 32 characters to store FIRSTNAME.  Again to be safe, SAS decides to use 32 characters to store FIRSTNAME.

 

Finally, the DATA step executes.  The TRIM function does work.  It removes trailing blanks from FIRST and stores the result in FIRSTNAME.  However, since FIRSTNAME was previously defined as being 32 characters long, SAS adds blanks back on to the end of the value.

Durlov
Obsidian | Level 7

Hello All, Thanks much for your solutions and they all did help!

Tom
Super User Tom
Super User

SAS stores character variables as fixed length.

So if MIDDLEINITIAL is defined as length $32 and it contains the value 'M' what is actually stored is 'M' with 31 trailing blanks.

So when you specify MIDDLEINITIAL||'.' you are generating a string that is 33 characters long. When you store it back into the variable MIDDLEINITIAL there is only room for the first 32 characters so the '.' does not get stored. 

The TRIM() function removes the trailing blanks.  So the result of TRIM(MIDDLEINITIAL)||'.' is a string that is only 2 characters long. There is plenty of room to store that so it gets stored into MIDDLEINITIAL with 30 trailing blanks.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3692 views
  • 1 like
  • 7 in conversation