BookmarkSubscribeRSS Feed
deleted_user
Not applicable
proc sort data=mydataset nodupkey;
by _all_;
run;

Q1: Will there be no identical records in mydataset after execution of the above code; without any exception?

Q2: Is there any procedure to get a report on how many identical records there is for each ID, where ID is a primary key in mydataset? (Yes, of course run before the above code.)
5 REPLIES 5
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
My observation is to say "yes" to Q1. As for Q2, I suggest you review the use of DUPOUT= (with PROC SORT), if you need to do some analysis, whether you need to generate a report - there is no built-in SAS "report" for analyzing with a duplicate-data condition.

Scott Barry
SBBWorks, Inc.
Patrick
Opal | Level 21
For Q2:

proc sql;
select id,count(*) as Duplicates
from mydataset
group by id
having count(*)>1
;
quit;
Doc_Duke
Rhodochrosite | Level 12
Patrick,

Your code will display the cases with duplicate IDs, but does not address duplicate content, which is what I read Q2 to ask.

Doc Muhlbaier
Duke
Patrick
Opal | Level 21
Ernesto

Duke is of course right. The code I provided counts only duplicate ID's.

The code below shows/counts now duplicate records.

What confuses me:
ID is defined as primary key and should therefore be unique.
Just removing all duplicate records won't solve all possible issues with duplicate ID's. There could even be different sets of duplicate records per id as
the sample data illustrates.


data mydataset;
keep id var1-var5;
length id 8;
array var {5} 8;
do id=1 to 2;
do i=1 to 20;
do j=1 to dim(var);
var{j}=round(ranuni(1));
end;
output;
end;
end;
run;

/* variant1: count before duplicates got removed */
proc sql;
select NAME INTO :col_list separated by ','
from dictionary.columns
where libname='WORK' and memname='MYDATASET';
;
select count(*) as Duplicates,*
from mydataset
group by &col_list
having count(*)>1
;
quit;


/* variant2: count after duplicates got removed */
proc sort data=mydataset out=Unique_mydataset equals nodupkey dupout=Dup_mydataset;
by _all_;
run;

proc sql;
select name into :col_list separated by ','
from dictionary.columns
where libname='WORK' and memname='Dup_mydataset';
;
select count(*)+1 as Duplicates,*
from Dup_mydataset
group by &col_list
;
quit;


HTH
Patrick
deleted_user
Not applicable
Correction: ID is not a primary key.

Thank's for all answers.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 897 views
  • 0 likes
  • 4 in conversation