Hi All,
I am currently learning SAS and would like to know if anyone can help me with code that produces the expected output below.
In the sample data, each string contains a name followed by a duplicate of it, and in some rows the second copy is truncated. I am not sure how to remove the duplicate to get the expected output.
Many thanks.
Name:
John Smith John Smith
Jane Foster Jane Foste
Happy Garden Management Corporation Happy Garden Management C
ABC Car Workshop ABC Car Worksho
Expected Output:
John Smith
Jane Foster
Happy Garden Management Corporation
ABC Car Workshop
Hi @abx and welcome to the SAS Support Communities!
@abx wrote:
I currently learning SAS and would like to know if anyone is able to help out on the codes ...
Actually, before we get to the code, we would need a hard-and-fast rule for deciding where the duplicate starts in each string. Otherwise, we might suggest something like this ...
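data have;
input name $80.;
cards;
John Smith John Smith
Jane Foster Jane Foste
Happy Garden Management Corporation Happy Garden Management C
ABC Car Workshop ABC Car Worksho
;

data want(keep=name);
set have;
/* Try each word from the 2nd one onward as the possible start of the duplicate */
do i=2 to countw(name, ' ');
  /* pos = column where word i starts (len is not used here) */
  call scan(name, i, pos, len, ' ');
  /* =: compares only as many characters as the shorter operand has, */
  /* so this tests whether the rest of the string matches its start  */
  if trim(substr(name, pos)) =: name then do;
    name=substr(name, 1, pos-1);  /* keep only the part before the duplicate */
    leave;
  end;
end;
run;

... and then you come up with example strings like "KSU, Manhattan, KS", where you don't want to cut off the abbreviation at the end (the code above would shorten it to "KSU, Manhattan," because "KS" matches the first two characters of "KSU").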
But maybe there is no such problematic case in your data and the code above works for you.
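If your data does have such cases, here is a minimal sketch of one possible tighter rule. It rests on an assumption that is not from this thread: a duplicated tail always contains at least two words. Under that assumption, the last word on its own is never treated as a duplicate, so "KSU, Manhattan, KS" and "Johnson & Johnson" stay intact. The cost is that a genuine one-word repeat (say, "Acme Acme") is no longer cleaned. The only change from the code above is the loop's upper bound.

```sas
data have;
input name $80.;
cards;
John Smith John Smith
Jane Foster Jane Foste
Happy Garden Management Corporation Happy Garden Management C
ABC Car Workshop ABC Car Worksho
KSU, Manhattan, KS
Johnson & Johnson
;

data want(keep=name);
set have;
/* ASSUMED rule: the repeated tail must be at least two words long, */
/* so the loop stops one word short of the last word.               */
do i=2 to countw(name, ' ') - 1;
  /* pos = column where word i starts */
  call scan(name, i, pos, len, ' ');
  /* does the rest of the string match the start of the string? */
  if trim(substr(name, pos)) =: name then do;
    name=substr(name, 1, pos-1);
    leave;
  end;
end;
run;
```

With this rule, the four original samples still give the expected output, and the last two rows are left unchanged. Whether a two-word minimum fits your data is something only you can judge.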
Hello @abx ,
Leonid Batkhan has several interesting blog posts about string handling.
Go to https://blogs.sas.com/content/?s=string+leonid
That is: go to https://blogs.sas.com/ and enter "Leonid" and "string" as search terms, then press ENTER.
I haven't opened it, but this one might be applicable:
Removing repeated characters in SAS strings 
By Leonid Batkhan on SAS Users November 4, 2020
https://blogs.sas.com/content/sgf/2020/11/04/removing-repeated-characters-in-sas-strings/
Koen
@abx This sort of data cleansing task often becomes rather involved quickly, because you also have to deal with valid cases like Johnson & Johnson.
I'd hold off on it as an exercise until you're solid with the basics.
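If you do experiment with it, one cautious pattern is to leave the original value untouched, write the candidate result to a new variable, and flag every row that was shortened so you can review those rows by hand. Here is a minimal sketch using the rule from the answer above. The names `name_clean`, `changed`, and `review` are illustrative only. It also shows the Johnson & Johnson problem: that rule cuts the name to "Johnson &", and the flag makes this easy to spot.

```sas
data have;
input name $80.;
cards;
John Smith John Smith
Johnson & Johnson
;

data review;
set have;
length name_clean $80;
name_clean=name;                 /* start from the original; name itself is never modified */
do i=2 to countw(name, ' ');
  call scan(name, i, pos, len, ' ');
  if trim(substr(name, pos)) =: name then do;
    name_clean=substr(name, 1, pos-1);
    leave;
  end;
end;
changed=(name_clean ne name);    /* 1 = the string was shortened, 0 = unchanged */
drop i pos len;
run;

/* list only the rows that were shortened, for manual review */
proc print data=review;
where changed;
run;
```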
