Can someone help me understand why the trim function here adds the "." in the output, whereas without trim the "." is not added? Thanks!
data Names; infile datalines dlm='@'; input Name:$32.; datalines; Harry Micheal Henderson Larry Kerry Baldwin Jerry m. Stout ; data SplitNames; set Names; length Last $8; First = Scan(name,1); FirstName = trim(First); /* Trim functions doesn't seem to have any effect */ Middle = Scan(name,2); MiddleIntial = upcase(substr(Middle,1,1)); MiddleIntial = trim(MiddleIntial)||"."; Last = Scan(name,3); label MiddleIntial="Middle Intital"; drop Middle; run; proc print data = SplitNames label noobs; run; proc contents data = SplitNames; run;
Output:
Name Last First FirstName Middle Intital Harry Micheal Henderson Henderso Harry Harry M. Larry Kerry Baldwin Baldwin Larry Larry K. Jerry m. Stout Stout Jerry Jerry M.
But if I use the line:
MiddleIntial = MiddleIntial||"."; instead of: MiddleIntial = trim(MiddleIntial)||".";
My output is:
Name Last First FirstName Middle Intital Harry Micheal Henderson Henderso Harry Harry M Larry Kerry Baldwin Baldwin Larry Larry K Jerry m. Stout Stout Jerry Jerry M
Why?
Because the concatenation operator only puts character variables "next to each other". The resulting variable has the length of all the connected variables. Because the you didn't define the length of your character variables, they all have the same length. Thus, you either have to trim or define a longer variable (which would contain lots of spaces; would be MiddleInitial3 below).
data Names;
infile datalines dlm='@';
input Name:$32.;
datalines;
Harry Micheal Henderson
Larry Kerry Baldwin
Jerry m. Stout
;
data SplitNames;
set Names;
length Last $8 MiddleInitial3 $33.;
First = Scan(name,1);
FirstName = trim(First);
/* Trim functions doesn't seem to have any effect */
Middle = Scan(name,2);
MiddleIntial = upcase(substr(Middle,1,1));
MiddleIntial = trim(MiddleIntial)||".";
MiddleIntial2 = upcase(substr(Middle,1,1));
MiddleIntial2_Length=Length(MiddleIntial2);
MiddleIntial2 = MiddleIntial2||".";
MiddleInitial3=MiddleIntial2||".";
Last = Scan(name,3);
label MiddleIntial="Middle Intital";
drop Middle;
run;
I think 'trim' and '||' conflicting data SplitNames; set Names; length Last $8; First = Scan(name,1); FirstName = trim(First); /* Trim functions doesn't seem to have any effect */ Middle = Scan(name,2); MiddleIntial = upcase(substr(Middle,1,1)); MiddleIntial = trim(MiddleIntial); /* Here it has same effeect */ Last = Scan(name,3); label MiddleIntial="Middle Intital"; drop Middle; run;
Because the concatenation operator only puts character variables "next to each other". The resulting variable has the length of all the connected variables. Because the you didn't define the length of your character variables, they all have the same length. Thus, you either have to trim or define a longer variable (which would contain lots of spaces; would be MiddleInitial3 below).
data Names;
infile datalines dlm='@';
input Name:$32.;
datalines;
Harry Micheal Henderson
Larry Kerry Baldwin
Jerry m. Stout
;
data SplitNames;
set Names;
length Last $8 MiddleInitial3 $33.;
First = Scan(name,1);
FirstName = trim(First);
/* Trim functions doesn't seem to have any effect */
Middle = Scan(name,2);
MiddleIntial = upcase(substr(Middle,1,1));
MiddleIntial = trim(MiddleIntial)||".";
MiddleIntial2 = upcase(substr(Middle,1,1));
MiddleIntial2_Length=Length(MiddleIntial2);
MiddleIntial2 = MiddleIntial2||".";
MiddleInitial3=MiddleIntial2||".";
Last = Scan(name,3);
label MiddleIntial="Middle Intital";
drop Middle;
run;
Hi,
Just to add, there are functions - cats, catx etc. which can be used to concatenate values without having to trim, or add in extra characters. Whilst the double pipe is still useful, the functions read easier and take up less space. Also note, half of that code is redundant and one variable first/firstname is duplicate as will always be the same. Try:
data Names; infile datalines dlm='@'; input Name:$32.; datalines; Harry Micheal Henderson Larry Kerry Baldwin Jerry m. Stout ; run; /* Note the missing run here, whilst it doesn't matter, it just reads a lot easier */ data splitnames; set names; length last first middleinitial $50; first=scan(name,1); middleinitial=cats(upcase(substr(scan(name,2),1,1)),"."); last=scan(name,3); run;
Here are some of the details about how and why ... perhaps just being more long-winded about what has already been explained by others.
Before executing the DATA step, SAS looks through the code and sets up storage space for all the variables that it will be using. When it sees these statements, it needs to make a decision:
First = scan(name, 1);
FirstName = trim(First);
It needs to decide how many characters to use to store First and FirstName. It hasn't seen any of the data values yet ... it has to make the decision before seeing what is in the data.
Since NAME is 32 characters long, it is theoretically possible that FIRST will also be 32 characters long. To be safe, SAS decides to use 32 characters to store FIRST.
Since FIRST is 32 characters long, it is theoretically possible that it contains no blanks. So it is possible that SAS will need 32 characters to store FIRSTNAME. Again to be safe, SAS decides to use 32 characters to store FIRSTNAME.
Finally, the DATA step executes. The TRIM function does work. It removes trailing blanks from FIRST and stores the result in FIRSTNAME. However, since FIRSTNAME was previously defined as being 32 characters long, SAS adds blanks back on to the end of the value.
Hello All, Thanks much for your solutions and they all did help!
SAS stores character variables as fixed length.
So if MIDDLEINITIAL is defined as length $32 and it contains the value 'M' what is actually stored is 'M' with 31 trailing blanks.
So when you specify MIDDLEINITIAL||'.' you are generating a string that is 33 characters long. When you store it back into the variable MIDDLEINITIAL there is only room for the first 32 characters so the '.' does not get stored.
The TRIM() function removes the trailing blanks. So the result of TRIM(MIDDLEINITIAL)||'.' is a string that is only 2 characters long. There is plenty of room to store that so it gets stored into MIDDLEINITIAL with 30 trailing blanks.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.