Good day Experts,
A
I created a sample dataset "Have." All the name are seperated with comma in the 'Name' column, however the names in the rows are varous. I would like to list all the nonDuplicated Name shown the 'Want' result. Please let me know how to procede. Thank you.
data Have; length Name $100 ; infile datalines delimiter='/'; input Name; datalines; John,Marry,Jacob, ,/ ,Ana,Marry,/ Jacob,John,/ Will,Tom,Micheal/ ; data Want; length NameNoDup $100 ; infile datalines delimiter='/'; input NameNoDup; datalines; John/ Marry/ Jacob/ Ana/ Will/ Tom/ Micheal ;
data Have;
length Name $100 ;
infile datalines delimiter='/';
input Name;
datalines;
John,Marry,Jacob, ,/
,Ana,Marry,/
Jacob,John,/
Will,Tom,Micheal/
;
data want;
set have;
do i=1 to countw(name,',');
newname=scan(name,i,',');
output;
end;
run;
proc sql;
create table want2 as select distinct newname
from want where not missing(newname);
run;
This raises the question, why are you creating data sets in such an unusual format? You would make your life much easier by creating a data set with one name per line, rather than many names per line.
data Have;
length Name $100 ;
infile datalines delimiter='/';
input Name;
datalines;
John,Marry,Jacob, ,/
,Ana,Marry,/
Jacob,John,/
Will,Tom,Micheal/
;
data want;
set have;
do i=1 to countw(name,',');
newname=scan(name,i,',');
output;
end;
run;
proc sql;
create table want2 as select distinct newname
from want where not missing(newname);
run;
This raises the question, why are you creating data sets in such an unusual format? You would make your life much easier by creating a data set with one name per line, rather than many names per line.
@ybz12003 wrote:
Good day Experts,
A
I created a sample dataset "Have." All the name are seperated with comma in the 'Name' column, however the names in the rows are varous. I would like to list all the nonDuplicated Name shown the 'Want' result. Please let me know how to procede. Thank you.
data Have; length Name $100 ; infile datalines delimiter='/'; input Name; datalines; John,Marry,Jacob, ,/ ,Ana,Marry,/ Jacob,John,/ Will,Tom,Micheal/ ; data Want; length NameNoDup $100 ; infile datalines delimiter='/'; input NameNoDup; datalines; John/ Marry/ Jacob/ Ana/ Will/ Tom/ Micheal ;
If the order isn't particularly critical
data need;
set have;
do i=1 to countw(name);
onename = scan(name,i);
if not missing(name) then output;
end;
drop i name;
run;
proc sql;
create table want as
select distinct onename as name
from need
;
quit;
Default output is likely to be alphabetical by name.
data Have;
length Name $100 ;
infile datalines delimiter=',/' ;
input Name @@;
datalines;
John,Marry,Jacob, ,/
,Ana,Marry,/
Jacob,John,/
Will,Tom,Micheal/
;
proc sort data=have nodupkey;
where not missing(compress(trim(name), , 's'));
by name;
run;
proc print data=have;
run;
Read them directly into individual values?
data Have;
length Name $100 ;
infile datalines delimiter='/';
input Name;
datalines;
John,Marry,Jacob, ,/
,Ana,Marry,/
Jacob,John,/
Will,Tom,Micheal/
;
data _null_;
if _n_=1 then do;
dcl hash h();
h.definekey('newname');
h.definedone();
end;
set have end=z;
do _n_=1 to countw(name,',');
newname=scan(name,_n_,',');
if newname>' ' then rc=h.add();
end;
if z;
h.output(dataset:'want');
run;
proc print noobs;run;
Hi all, I tried all your coding with my actual dataset. For some reason, PaigeMiller's code works better. Thank you so much for all of your time and effort.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.