- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a data set with initials in the following format: last,firstmiddle (also pictured below)
I am trying to format the "Initials" variable so that the initials are displayed first middle last (FML). For example, the second observation should show DNJ instead of J,DN. I also want to leave the "-" symbol for any missing middle initials ( as seen in observation 1).
Any feedback would be appreciated!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Rohana-
Based on what we have actually covered in class, I adapted this from the Cancer RCT code example. It appears to get what we want for the initials (FML). I can't find an actual FML function:
IF MISSING(Initials) = 0 THEN
Inits = CATS( SUBSTR(UPCASE(Initials), 3, 1), /* 3.3 */
SUBSTR(UPCASE(Initials), 4, 1),
SUBSTR(UPCASE(Initials), 1, 1)
);
ELSE Inits = CATS( SUBSTR(UPCASE(Initials), 3, 1), /* 3.3 */
'-',
SUBSTR(UPCASE(Initials), 1, 1)
);
RUN;
Obs SSN City ZipCd Inits StateCd12345
SSN | City | Zip | Inits | |
Note: I don't have all of the code for this dataset done, so it is not sorted by SSN yet. Which means that JGS is not first.
I hope that helps!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If all of the values with a comma are in a L,FM layout then this would work. If they are actually in a different order you will have to provide something that lets us know what the different order would be and how to know when.
Perhaps this can get you started.
data want; set have; if index(initials,',')>0 then initials =cats(scan(initials,2),scan(initials,1)); run;
I'm sure someone will show a regular expression approach as well.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I love the SCAN solution for its simplicity and speed. But to ensure you preserve the '-' int the original values (indicating that there is no initial in that position) you need to specify comma as the only delimiter for SCAN. This should do the trick:
data want;
set have;
/* Split Initials into 2 "words" using SCAN with only comma as delimiter */
/* Then concatenate the "words" in the desired order */
initials=cats(scan(initials,-1,','),scan(initials,1,','));
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It can be done with FCMP.
data have;
infile datalines;
length Initials $6;
input Initials $;
datalines;
M,T-
J,DN
R,JJ
G,MG
F,BC
;
run;
proc fcmp outlib=work.functions.fun ;
function InitFLip(string $) $;
length r $4;
r = catt(scan(string,2,','),scan(string,1,','));
return (r);
endsub;
run;
options cmplib=work.functions;
proc format;
value $ flip (default=4) other=[InitFLip()];
run;
data want;
set have;
format Initials $flip.;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Rohana-
Based on what we have actually covered in class, I adapted this from the Cancer RCT code example. It appears to get what we want for the initials (FML). I can't find an actual FML function:
IF MISSING(Initials) = 0 THEN
Inits = CATS( SUBSTR(UPCASE(Initials), 3, 1), /* 3.3 */
SUBSTR(UPCASE(Initials), 4, 1),
SUBSTR(UPCASE(Initials), 1, 1)
);
ELSE Inits = CATS( SUBSTR(UPCASE(Initials), 3, 1), /* 3.3 */
'-',
SUBSTR(UPCASE(Initials), 1, 1)
);
RUN;
Obs SSN City ZipCd Inits StateCd12345
SSN | City | Zip | Inits | |
Note: I don't have all of the code for this dataset done, so it is not sorted by SSN yet. Which means that JGS is not first.
I hope that helps!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @CurtisMackWSIPP for providing data in usable form. @RohanaBruker please post data as data step using datalines, not as screenshot, not only to provide something we can actually work with, but to make it easier to read your post, too.
And here is a solution using a regular expression:
data want;
set have;
length fml $ 3;
fml = prxchange('s/([A-Z]),(.+)/$2$1/', 1, trim(Initials));
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Rohana!
You can also use the following to code:
Inits = CATS(SUBSTR(Initials,3,2),SUBSTR(Initials,1,1));
The dashes for missing will remain in place for this dataset.