I have a data set with initials in the following format: last,firstmiddle (also pictured below)
I am trying to format the "Initials" variable so that the initials are displayed first middle last (FML). For example, the second observation should show DNJ instead of J,DN. I also want to leave the "-" symbol for any missing middle initials ( as seen in observation 1).
Any feedback would be appreciated!
Hi Rohana-
Based on what we have actually covered in class, I adapted this from the Cancer RCT code example. It appears to get what we want for the initials (FML). I can't find an actual FML function:
IF MISSING(Initials) = 0 THEN
Inits = CATS( SUBSTR(UPCASE(Initials), 3, 1), /* 3.3 */
SUBSTR(UPCASE(Initials), 4, 1),
SUBSTR(UPCASE(Initials), 1, 1)
);
ELSE Inits = CATS( SUBSTR(UPCASE(Initials), 3, 1), /* 3.3 */
'-',
SUBSTR(UPCASE(Initials), 1, 1)
);
RUN;
Obs SSN City ZipCd Inits StateCd12345
SSN | City | Zip | Inits | |
Note: I don't have all of the code for this dataset done, so it is not sorted by SSN yet. Which means that JGS is not first.
I hope that helps!
If all of the values with a comma are in a L,FM layout then this would work. If they are actually in a different order you will have to provide something that lets us know what the different order would be and how to know when.
Perhaps this can get you started.
data want; set have; if index(initials,',')>0 then initials =cats(scan(initials,2),scan(initials,1)); run;
I'm sure someone will show a regular expression approach as well.
I love the SCAN solution for its simplicity and speed. But to ensure you preserve the '-' int the original values (indicating that there is no initial in that position) you need to specify comma as the only delimiter for SCAN. This should do the trick:
data want;
set have;
/* Split Initials into 2 "words" using SCAN with only comma as delimiter */
/* Then concatenate the "words" in the desired order */
initials=cats(scan(initials,-1,','),scan(initials,1,','));
run;
It can be done with FCMP.
data have;
infile datalines;
length Initials $6;
input Initials $;
datalines;
M,T-
J,DN
R,JJ
G,MG
F,BC
;
run;
proc fcmp outlib=work.functions.fun ;
function InitFLip(string $) $;
length r $4;
r = catt(scan(string,2,','),scan(string,1,','));
return (r);
endsub;
run;
options cmplib=work.functions;
proc format;
value $ flip (default=4) other=[InitFLip()];
run;
data want;
set have;
format Initials $flip.;
run;
Hi Rohana-
Based on what we have actually covered in class, I adapted this from the Cancer RCT code example. It appears to get what we want for the initials (FML). I can't find an actual FML function:
IF MISSING(Initials) = 0 THEN
Inits = CATS( SUBSTR(UPCASE(Initials), 3, 1), /* 3.3 */
SUBSTR(UPCASE(Initials), 4, 1),
SUBSTR(UPCASE(Initials), 1, 1)
);
ELSE Inits = CATS( SUBSTR(UPCASE(Initials), 3, 1), /* 3.3 */
'-',
SUBSTR(UPCASE(Initials), 1, 1)
);
RUN;
Obs SSN City ZipCd Inits StateCd12345
SSN | City | Zip | Inits | |
Note: I don't have all of the code for this dataset done, so it is not sorted by SSN yet. Which means that JGS is not first.
I hope that helps!
Thank you!!
Thanks @CurtisMackWSIPP for providing data in usable form. @RohanaBruker please post data as data step using datalines, not as screenshot, not only to provide something we can actually work with, but to make it easier to read your post, too.
And here is a solution using a regular expression:
data want;
set have;
length fml $ 3;
fml = prxchange('s/([A-Z]),(.+)/$2$1/', 1, trim(Initials));
run;
Hello Rohana!
You can also use the following to code:
Inits = CATS(SUBSTR(Initials,3,2),SUBSTR(Initials,1,1));
The dashes for missing will remain in place for this dataset.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.