DATA Step, Macro, Functions and more

Removing duplicate values

Accepted Solution Solved
Reply
Contributor
Posts: 71
Accepted Solution

Removing duplicate values

Dear Members,

Once again, I need your expert advice.

I have the following SAS Dataset (posting the dummy data reproduced in excel).

IDA1A2A3A4A5
1234abcdefabcdef 
1235abcabcabcabc 
1236abc    
1240asfabcasfdefasf
1245abcdefasfghijkl
1251asfabc   
1252abcabc   

 

I am looking for the following output where all the duplicate values across rows will be deleted.

 

IDA1A2A3A4A5
1234abcdef   
1235abc    
1236abc    
1240asfabcdef  
1245abcdefasfghijkl
1251asfabc   
1252abc    

 

Thanks in advance !


Accepted Solutions
Solution
‎02-12-2017 08:50 PM
Super User
Super User
Posts: 7,408

Re: Removing duplicate values

Please post test data in the form of a datastep.  See how I have put the test data in the program below.

As an alternative to the datastep you could normalise, sort and transpose:

data have;
  infile datalines missover;
  input ID	A1 $	A2 $	A3 $	A4 $	A5 $;
datalines;
1234	abc	def	abc	def	 
1235	abc	abc	abc	abc	 
1236	abc	 	 	 	 
1240	asf	abc	asf	def	asf
1245	abc	def	asf	ghi	jkl
1251	asf	abc	 	 	 
1252	abc	abc	 
;
run;

proc transpose data=have out=inter;
  by id;
  var a:;
run;

proc sort data=inter nodupkey;
  by id col1;
  where col1 ne "";
run;
 
proc transpose data=inter out=want prefix=a;
  by id;
  var col1;
run;

 

View solution in original post


All Replies
Super Contributor
Posts: 336

Re: Removing duplicate values

For your example:

 

Data A;
  Input ID A1 $ A2 $ A3 $ A4 $ A5 $;
  Datalines; 
1234 abc def abc def .
1235 abc abc abc abc .
1236 abc . . . .
1240 asf abc asf def asf
1245 abc def asf ghi jkl
1251 asf abc . . .
1252 abc abc . . .
; 

Data B1 (Drop=i j);
  Set A;
  Array A{*} A:;
  Do i=1 To Dim(A);
	Do j=i To Dim(A);
	  If (A{j}=A[i}) & (i^=j) Then Call Missing (A{j});
	End;
  End;
Run;
Super Contributor
Posts: 336

Re: Removing duplicate values

.. this one also shifts ..

 

Data B2 (Drop=i j);
  Set A;
  Array A{*} A:;
  Do i=1 To Dim(A);
	Do j=i To Dim(A);
	  If (A{j}=A[i}) & (i^=j) Then Call Missing (A{j});
	End;
  End;
  Do i=1 To Dim(A);
    If Missing (A{i}) Then Do j=i+1 To Dim(A);
	  If not Missing (A{j}) Then Do;
		A{i}=A{j};
		Call Missing (A{j});
	  End;
	End;
  End;
Run;
Solution
‎02-12-2017 08:50 PM
Super User
Super User
Posts: 7,408

Re: Removing duplicate values

Please post test data in the form of a datastep.  See how I have put the test data in the program below.

As an alternative to the datastep you could normalise, sort and transpose:

data have;
  infile datalines missover;
  input ID	A1 $	A2 $	A3 $	A4 $	A5 $;
datalines;
1234	abc	def	abc	def	 
1235	abc	abc	abc	abc	 
1236	abc	 	 	 	 
1240	asf	abc	asf	def	asf
1245	abc	def	asf	ghi	jkl
1251	asf	abc	 	 	 
1252	abc	abc	 
;
run;

proc transpose data=have out=inter;
  by id;
  var a:;
run;

proc sort data=inter nodupkey;
  by id col1;
  where col1 ne "";
run;
 
proc transpose data=inter out=want prefix=a;
  by id;
  var col1;
run;

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 149 views
  • 2 likes
  • 3 in conversation