I want to identify the duplicate percentage per month, from January through December.
I was able to find the duplicates and put them in a separate data set with the following code:
proc sort data=pankaj.UM_yield_detail_mm
          dupout=Duplicate_NODUPS   /* duplicate rows land here */
          nodupkey;                 /* keep the first row per BATCH_ID */
  by batch_id;
run;
Basically, I want the duplicate count and the percentage of duplicates based on the original data set, which contains the variable "Batch_ID". Is it possible to do this without creating another data set, or do I have to create one? Sample data are below (a sketch of one approach follows the listing).
Batch_ID
8809510976
8807226488
8809932795
8810161136
8810161136
8810172306
8809510752
8809436613
8809510734
8809510734
8810207111
8809510734
8810160901
8810207111
8809724861
8807221194
8807221200
8807221205
8807221217
8807221217
8807221217
8807221217
8807221200
8807221217
8807221200
8810160901
8810160901
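For reference, here is a minimal sketch of one way to get the duplicate count and percentage directly, without keeping a second sorted data set. It assumes the data set name pankaj.UM_yield_detail_mm from the post; the output name dup_summary is just an illustration:

proc sql;
  create table dup_summary as
  select count(batch_id)                              as total_rows,
         count(distinct batch_id)                     as distinct_ids,
         count(batch_id) - count(distinct batch_id)   as dup_count,
         calculated dup_count / calculated total_rows as dup_pct format=percent8.2
  from pankaj.UM_yield_detail_mm;
quit;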
Can you post an example of what you're expecting, please?
I suspect it's a double PROC FREQ, but I'm not sure:
proc freq data=have noprint;
  tables batch_id / out=count_id;   /* one row per BATCH_ID with its frequency in COUNT */
run;
proc freq data=count_id;
  tables count;                     /* distribution of repeat counts, with percents */
run;
I have a large data set containing, for instance, 200,000 observations in the format below. I need to identify the duplicate BATCH_IDs and the percentage of duplicates.
Suppose I have 10,000 duplicate Batch_IDs out of 200,000; that is 5% of the total Batch_IDs. From that I can calculate the count of duplicates and the percentage.
BATCH_ID | TIME | SIZE | TYPE | CUSTOMER | ARTICLE
For better understanding, I have attached a screenshot.
I have already implemented the code you provided, but it doesn't give me the percentage out of the 200,000 observations.
Add the percent output in the second PROC FREQ; the duplicate percentage is 1 minus the percent of COUNT=1 rows in that second table.
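A minimal sketch of that idea, assuming the have and count_id names from the earlier step (dup_pct is just an illustrative alias); it computes the share of distinct BATCH_IDs that occur more than once, i.e. 1 minus the share of COUNT=1:

proc sql;
  select sum(count > 1) / count(*) as dup_pct format=percent8.2
  from count_id;   /* count_id has one row per BATCH_ID with its frequency in COUNT */
quit;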
I wrote the code for the duplicate percentage and I am getting the output by MONTH and YEAR, which is what I wanted:
proc sql;
  create table duplicates_&p2 as
  select (count(batch_id) - count(distinct batch_id)) / count(batch_id)
           as Duplicate format=percent8.2,
         month(datepart(Ptime_build)) as Month,
         year(datepart(Ptime_build)) as Year
  from TABLENAME
  where "&Date_Range_Min."dt <= PTIME_BUILD <= "&Date_Range_Max."dt
    and PTIME_BUILD is not missing
  group by Month, Year;
quit;
Thanks for your suggestion.
As posted earlier, I am able to find the percentage of duplicates. Now I use SGPLOT to plot a line graph for the analysis:
Duplicate % on the Y axis and Month on the X axis.
I am able to do it for one variable, i.e. Month, and I want to combine Month and Year on the X axis. Here is my attempt:
proc sql;
  create table duplicates_11 as
  select (count(batch_id) - count(distinct batch_id)) / count(batch_id)
           as Duplicate format=percent8.2,
         month(datepart(Ptime_build)) as Month,
         year(datepart(Ptime_build)) as Year,
         MDY(Month, 1, Year) as Final_Date format=monyy5.
  from pankaj.mv_yield_detail_mm
  where PTIME_BUILD is not missing
  group by Month, Year;
quit;
Am I using the wrong format, or is the syntax incorrect?
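For reference, a minimal sketch of one likely fix, on the assumption that the error comes from Month and Year being computed in the same SELECT: PROC SQL needs the CALCULATED keyword to reuse them inside MDY. The SGPLOT step is only an illustration of putting the combined month-year value on the X axis:

proc sql;
  create table duplicates_11 as
  select (count(batch_id) - count(distinct batch_id)) / count(batch_id)
           as Duplicate format=percent8.2,
         month(datepart(Ptime_build)) as Month,
         year(datepart(Ptime_build)) as Year,
         mdy(calculated Month, 1, calculated Year)
           as Final_Date format=monyy5.   /* prints as e.g. JAN23 */
  from pankaj.mv_yield_detail_mm
  where PTIME_BUILD is not missing
  group by Month, Year;
quit;

proc sgplot data=duplicates_11;
  series x=Final_Date y=Duplicate;   /* duplicate % by month and year */
run;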