Solved: List NonDuplicated Name in the dataset within Comma seperation

ybz12003 · Posted 05-27-2021 11:42 AM

Good day Experts,

A

I created a sample dataset "Have." All the name are seperated with comma in the 'Name' column, however the names in the rows are varous. I would like to list all the nonDuplicated Name shown the 'Want' result. Please let me know how to procede. Thank you.

data Have;  
	length Name $100 ; 
	infile datalines delimiter='/'; 
	input Name;  
	datalines;                     
	John,Marry,Jacob, ,/
	,Ana,Marry,/
	Jacob,John,/
	Will,Tom,Micheal/
;   

data Want;  
	length NameNoDup $100 ; 
	infile datalines delimiter='/'; 
	input NameNoDup;  
	datalines;                     
	John/
	Marry/
	Jacob/
	Ana/
	Will/
	Tom/
	Micheal
;

PaigeMiller · Posted 05-27-2021 11:52 AM

data Have;  
	length Name $100 ; 
	infile datalines delimiter='/'; 
	input Name;  
	datalines;                     
	John,Marry,Jacob, ,/
	,Ana,Marry,/
	Jacob,John,/
	Will,Tom,Micheal/
;   

data want;
    set have;
    do i=1 to countw(name,',');
        newname=scan(name,i,',');
        output;
    end;
run;
proc sql;
    create table want2 as select distinct newname
    from want where not missing(newname);
run;

This raises the question, why are you creating data sets in such an unusual format? You would make your life much easier by creating a data set with one name per line, rather than many names per line.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 05-27-2021 11:52 AM

data Have;  
	length Name $100 ; 
	infile datalines delimiter='/'; 
	input Name;  
	datalines;                     
	John,Marry,Jacob, ,/
	,Ana,Marry,/
	Jacob,John,/
	Will,Tom,Micheal/
;   

data want;
    set have;
    do i=1 to countw(name,',');
        newname=scan(name,i,',');
        output;
    end;
run;
proc sql;
    create table want2 as select distinct newname
    from want where not missing(newname);
run;

This raises the question, why are you creating data sets in such an unusual format? You would make your life much easier by creating a data set with one name per line, rather than many names per line.

--
Paige Miller

ballardw · Posted 05-27-2021 11:52 AM

@ybz12003 wrote:

Good day Experts,

A

I created a sample dataset "Have." All the name are seperated with comma in the 'Name' column, however the names in the rows are varous. I would like to list all the nonDuplicated Name shown the 'Want' result. Please let me know how to procede. Thank you.
data Have;  
	length Name $100 ; 
	infile datalines delimiter='/'; 
	input Name;  
	datalines;                     
	John,Marry,Jacob, ,/
	,Ana,Marry,/
	Jacob,John,/
	Will,Tom,Micheal/
;   

data Want;  
	length NameNoDup $100 ; 
	infile datalines delimiter='/'; 
	input NameNoDup;  
	datalines;                     
	John/
	Marry/
	Jacob/
	Ana/
	Will/
	Tom/
	Micheal
;   

If the order isn't particularly critical

data need;
   set have;
   do i=1 to countw(name);
      onename = scan(name,i);
      if not missing(name) then output;
   end;
   drop i name;
run;

proc sql;
   create table want as
   select distinct onename as name
   from need
   ;
quit;

Default output is likely to be alphabetical by name.

Reeza · Posted 05-27-2021 11:57 AM

data Have;  
	length Name $100 ; 
	infile datalines delimiter=',/' ; 
	input Name @@;  
	datalines;                     
	John,Marry,Jacob, ,/
	,Ana,Marry,/
	Jacob,John,/
	Will,Tom,Micheal/
;  

proc sort data=have nodupkey;
where not missing(compress(trim(name), , 's'));
by name;
run;

proc print data=have;
run;

Read them directly into individual values?

novinosrin · Posted 05-27-2021 12:02 PM

data Have;  
	length Name $100 ; 
	infile datalines delimiter='/'; 
	input Name;  
	datalines;                     
	John,Marry,Jacob, ,/
	,Ana,Marry,/
	Jacob,John,/
	Will,Tom,Micheal/
;   

data _null_;
 if _n_=1 then do;
  dcl hash h();
  h.definekey('newname');
  h.definedone();
 end;
 set have end=z;
 do _n_=1 to countw(name,',');
   newname=scan(name,_n_,',');
   if newname>' ' then rc=h.add();
 end;
 if z;
 h.output(dataset:'want');
run;

proc print noobs;run;

ybz12003 · Posted 05-27-2021 02:37 PM

Hi all, I tried all your coding with my actual dataset. For some reason, PaigeMiller's code works better. Thank you so much for all of your time and effort.

List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Registration is open

List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Re: List NonDuplicated Name in the dataset within Comma seperation

Registration is open

SAS Training: Just a Click Away