Hi All,
I am currently learning SAS and would like to know if anyone can help with code that produces the expected output below.
The sample strings each contain a duplicated name, and in some of them the second copy is truncated. I am not sure how to remove the duplicate so that I get the expected output.
Many thanks.
Name:
John Smith John Smith
Jane Foster Jane Foste
Happy Garden Management Corporation Happy Garden Management C
ABC Car Workshop ABC Car Worksho
Expected Output:
John Smith
Jane Foster
Happy Garden Management Corporation
ABC Car Workshop
Accepted Solutions
Hi @abx and welcome to the SAS Support Communities!
@abx wrote:
I am currently learning SAS and would like to know if anyone is able to help out with the code ...
Actually, before we get to the code, we would need a hard and fast rule to be applied to the strings. Otherwise, we might suggest something like this ...
data have;
input name $80.;
cards;
John Smith John Smith
Jane Foster Jane Foste
Happy Garden Management Corporation Happy Garden Management C
ABC Car Workshop ABC Car Worksho
;
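data want(keep=name);
  set have;
  /* Try each word after the first as the start of a repeated copy */
  do i=2 to countw(name, ' ');
    /* POS = start position of word I, LEN = its length */
    call scan(name, i, pos, len, ' ');
    /* =: compares only as many characters as the shorter operand, so this
       is true when the rest of the string, from word I on, matches the
       beginning of NAME */
    if trim(substr(name, pos)) =: name then do;
      /* Keep only the text before the repeated (possibly truncated) copy */
      name=substr(name, 1, pos-1);
      leave;
    end;
  end;
run;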
... and then you come up with example strings like "KSU, Manhattan, KS", where you don't want to cut off the abbreviation at the end (the code above would, because "KS" matches the beginning of the string).
But maybe there is no such problematic case in your data and the code above works for you.
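If a hard and fast rule can be agreed on, the check becomes safer. The sketch below assumes one such rule, chosen here only for illustration and not stated anywhere in the question: a duplicated name has an even number of words, and the second half is a copy of the first half whose last word may be cut short. Only the midpoint is tested, so "KSU, Manhattan, KS" and "Johnson & Johnson" (three words each) are left alone. The trade-off is that a second copy that loses whole words (for example one ending at "Happy Garden Management") is no longer detected, and a genuine name made of two identical halves would still be shortened.

```sas
/* Sample data: the four strings from the question plus two
   strings that should NOT be shortened */
data have;
  infile datalines truncover;
  input name $80.;
  datalines;
John Smith John Smith
Jane Foster Jane Foste
Happy Garden Management Corporation Happy Garden Management C
ABC Car Workshop ABC Car Worksho
KSU, Manhattan, KS
Johnson & Johnson
;

/* Assumed rule (hypothetical): a duplicated name has an even number of
   words, and the second half is a copy of the first half that may be
   cut short in its last word. */
data want(keep=name);
  set have;
  n = countw(name, ' ');
  if n >= 2 and mod(n, 2) = 0 then do;
    /* POS = start position of the first word of the second half */
    call scan(name, n/2 + 1, pos, len, ' ');
    /* Shorten only if the second half matches the beginning of NAME */
    if trim(substr(name, pos)) =: name then name = substr(name, 1, pos - 1);
  end;
run;
```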
Hello @abx ,
Leonid Batkhan has written several interesting blog posts about string manipulation.
Go to https://blogs.sas.com/content/?s=string+leonid
That is: go to https://blogs.sas.com/
and enter "Leonid" and "string" as search terms, then press ENTER.
I haven't opened it, but this one might be applicable:
Removing repeated characters in SAS strings
By Leonid Batkhan on SAS Users November 4, 2020
https://blogs.sas.com/content/sgf/2020/11/04/removing-repeated-characters-in-sas-strings/
Koen
@abx Data cleansing tasks of this sort often become involved quickly, because you also have to deal with valid names that contain a repeated word, such as Johnson & Johnson.
I'd hold off on this as an exercise until you're solid with the basics.
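While working through such cases, it can help not to overwrite the original value at all. The sketch below is one hypothetical approach; the data set name REVIEW and the variables NAME_CLEAN and CHANGED are made up for illustration. It applies the same prefix check as the accepted solution, stores the result in a new variable and flags every row it changed, so that a case like Johnson & Johnson, which the check shortens to "Johnson &", shows up in a review listing instead of silently altering the data.

```sas
/* Small sample: one real duplicate and one valid name */
data have;
  infile datalines truncover;
  input name $80.;
  datalines;
John Smith John Smith
Johnson & Johnson
;

/* Same prefix check as the accepted solution, but the result goes to a
   new (hypothetical) variable NAME_CLEAN and CHANGED flags the rows that
   were shortened, so NAME itself is never overwritten. */
data review;
  set have;
  length name_clean $80;
  name_clean = name;
  changed = 0;
  do i = 2 to countw(name, ' ');
    call scan(name, i, pos, len, ' ');
    if trim(substr(name, pos)) =: name then do;
      name_clean = substr(name, 1, pos - 1);
      changed = 1;
      leave;
    end;
  end;
  drop i pos len;
run;

/* List only the shortened rows for manual review */
proc print data=review;
  where changed = 1;
  var name name_clean;
run;
```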