Desktop productivity for business analysts and programmers

Finding Duplicate Per Month

Reply
Contributor
Posts: 28

Finding Duplicate Per Month

I want to identify the duplicate %age as per month Starting from Jan until Dec.

 

I could find the duplicates and put in separate data set by the following code:

proc sort data = pankaj.UM_yield_detail_mm 
 dupout= Duplicate_NODUPS
 nodupkey ;
 by batch_id ;
run ;

Basically, I want to have duplicate count and %age of duplicates based on the original data set which contains Variable "Batch_ID". if in case I don't want to create another other data set. Is it possible or I have to create another data set.

batch_ID.PNG

Batch_ID
8809510976
8807226488
8809932795
8810161136
8810161136
8810172306
8809510752
8809436613
8809510734
8809510734
8810207111
8809510734
8810160901
8810207111
8809724861
8807221194
8807221200
8807221205
8807221217
8807221217
8807221217
8807221217
8807221200
8807221217
8807221200
8810160901
8810160901
Moderator
Posts: 310

Re: Finding Duplicate Per Month

Hi - here's a simple head-start to get a count of each batch_id.

 

Question - why the aversion to creating an additional data set?

 

data dsn;
	infile cards;
	input Batch_ID;
cards;
8809510976
8807226488
8809932795
8810161136
8810161136
8810172306
8809510752
8809436613
8809510734
8809510734
8810207111
8809510734
8810160901
8810207111
8809724861
8807221194
8807221200
8807221205
8807221217
8807221217
8807221217
8807221217
8807221200
8807221217
8807221200
8810160901
8810160901
run;
proc means nway;
	class batch_id;
	output out=work.stats;
run;
Contributor
Posts: 28

Re: Finding Duplicate Per Month

Posted in reply to AndrewHowell

Yes, I didn't create a another data set now. I was trying to not to overwrite the results.

But now, its working and I have the %age of duplicates as well.

 

	proc sql;
				create table duplicates_&p2 as
					select (count(batch_id)- count(distinct(batch_id)))/count(batch_id) 
					as 	Duplicate format percent8.2, 	
					month(datepart(Ptime_build)) as Month,
					year(datepart(Ptime_Build)) as Year
					
				
					from TABLE NAME
					where "&Date_Range_Min."dt <= PTIME_BUILD <="&Date_Range_Max"dt and PTIME_BUILD is not missing
					group by Month, Year;
			quit;
Ask a Question
Discussion stats
  • 2 replies
  • 89 views
  • 1 like
  • 2 in conversation