BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
novinosrin
Tourmaline | Level 20

Hi @Primavera   You are not alone. We all have had those moments too. It's fun to make mistakes.  Do not hesitate to throw open a thread if you need help. That's what keeps the community alive. I ask questions all the time:)

Reeza
Super User
You don't to create the HAVE table, that's your input data set of 300,000 records. And 300K records is trivial for SAS....very, very trivial 🙂
Patrick
Opal | Level 21

Below an approach creating 3 tables.

- One table with duplicate records

- One table will all IDs with duplicate records

- One table with all IDs with no duplicate records

 

data have;
  input ID  (Epi_start  Epi_end) (:date9.)  Epi_Number;
  format Epi_start  Epi_end date9.;
  cards;
1 21-Aug-10 25-Aug-10 1
2 27-Aug-10 10-Sep-10 1
3 13-Sep-10 18-Sep-10 1
3 01-Oct-10 11-Oct-10 2
3 14-Oct-10 19-Oct-10 3
3 15-Dec-12 18-Dec-10 1
4 18-Oct-10 22-Oct-10 1
4 18-Oct-10 22-Oct-10 2
5 19-Oct-10 29-Oct-10 1
5 13-Dec-10 01-Jan-11 2
5 13-Dec-10 01-Jan-11 3
5 16-Feb-11 28-Feb-11 4
6 01-Dec-10 09-Dec-10 1
7 11-Dec-10 17-Dec-10 1
7 15-Oct-11 22-Oct-11 1
7 01-Dec-11 10-Dec-11 2
7 20-Jun-12 29-Jun-12 1
7 29-Jul-12 08-Aug-12 2
7 07-Sep-12 16-Sep-12 3
8 19-Dec-10 25-Dec-10 1
8 19-Dec-10 25-Dec-10 2
8 19-Dec-10 25-Dec-10 3
8 17-Jan-13 25-Jan-13 1
9 01-Mar-11 10-Mar-11 1
9 25-Mar-11 30-Mar-11 2
;

proc sort data=have out=dups NOUNIQUEKEY;
  by id Epi_start  Epi_end;
run;

data noDupIDs DupIDs;
  merge have dups(in=ind keep=id);
  by id;
  if ind then output DupIDs;
  else output noDupIDs;
run;

proc print data=DupIDs;
run;
Primavera
Quartz | Level 8

Very nice. Thanks. I did not know about nouniquekey! Can you please kindly explain the line:

merge have dups(in=ind keep=id);

 Thank you.

Patrick
Opal | Level 21

@Primavera wrote:

Very nice. Thanks. I did not know about nouniquekey! Can you please kindly explain the line:

merge have dups(in=ind keep=id);

 Thank you.


The dups table contains records with duplicate keys {id, Epi_start, Epi_end}. By merging the dups table back with the source table over ID only (by id;) we can determine which ID's have duplicates and which one not (so also the rows which are not duplicates themselves).

We only need the ID from table dups for this merge, we need some indicator per row if a specific row matches over id or not - that's what the IN data set option gives us. It allows us to define a variable (ind) which becomes 1 if a row matches and 0 if it doesn't. 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 34 replies
  • 1869 views
  • 19 likes
  • 4 in conversation