BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
danhopkinslewis
Fluorite | Level 6
Hey there,

Novice to SAS so bear with me.

I have a large data set where I am trying to create a new data set of duplicates. I am using postcode as the duplicate but then I want to bring in if the date of birth field also matches on the row.

I tried a proc sort nodupkey but I think it’s just give me a clean list and then a duplicates list. I just want to see the ones which have duplicated on both criteria in a data set.

Any help would be greatly appreciated.

Thanks
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

One way that might get you started:

Proc freq data=have noprint;
  tables var1*var2 / out=want(where=(count>1) drop=percent) ;
run;

The noprint option suppresses the normal proc freq output. The out= creates a data set named Want with the levels of the variable combinations and the count, and only when the count is more than 1, i.e. duplicates.

If you want to treat a missing value as a valid combination add the keyword MISSING after the / .

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

A small (but realistic) example would be helpful. Please follow these instructions when providing data. Do not provide data as screen captures or as attachments. More context about the problem would also be helpful.

--
Paige Miller
danhopkinslewis
Fluorite | Level 6

Sorry, unable to submit data as its sensitive. 

PaigeMiller
Diamond | Level 26

@danhopkinslewis wrote:

Sorry, unable to submit data as its sensitive. 


Make up some data, as long as it represents the problem.

--
Paige Miller
Kurt_Bremser
Super User

Then make up some fake data that illustrates your issue. Just make sure that variable types and other attributes (length, format) are the same.

ballardw
Super User

One way that might get you started:

Proc freq data=have noprint;
  tables var1*var2 / out=want(where=(count>1) drop=percent) ;
run;

The noprint option suppresses the normal proc freq output. The out= creates a data set named Want with the levels of the variable combinations and the count, and only when the count is more than 1, i.e. duplicates.

If you want to treat a missing value as a valid combination add the keyword MISSING after the / .

danhopkinslewis
Fluorite | Level 6

Thanks. So if I wanted to bring in all the other variable columns but not include them in the count, how would I do that? 

ballardw
Super User

@danhopkinslewis wrote:

Thanks. So if I wanted to bring in all the other variable columns but not include them in the count, how would I do that? 


Do ALL of the other variables where you have duplicates have the same values?

If not you will need to decide which ones you want.

 

If the idea is to add a code that the variable combination is a duplicate then you could merge this back on the original data.

Again pseudo code because you haven't mentioned names of data sets or variables: The below renames the count to indicate that it is indeed a duplicate count.

Proc sort data=have;
   by var1 var2;
run;

data final;
   merge have want(  rename=(count=dupecount) );
by var1 var2; run;

If you want a simple flag for "this record is part of a duplicate set" you could add something like:

Flag = (dupecount>1);

 

SAS will treat the result of logical comparisons as a numeric one when true and zero when false.

 

Other logic could be used depending on what you want.

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1253 views
  • 1 like
  • 4 in conversation