proc sort data - dupout nodupkey issue

Sean_OConnor · Posted 04-26-2017 11:53 AM

Folks,

I've merged two datasets and for some reason duplicates have been created within the result. I would like to know where these duplicates exist, so I've used the following piece of code:

proc sort data=accs dupout=dups nodupkey out=correct;
  by idinternal customernumber idoccurrence;
run;

In the accs dataset I've 315,050 observations and in the correct dataset I've 312,326. However, the dups dataset is empty.

So it would appear that I've circa 2,700 duplicate observations, but they are nowhere to be seen?

Could someone shed some light on this issue, please?

thomp7050 · Posted 04-26-2017 12:16 PM

Try this to view the total occurrences for each combination of your variables:

PROC SQL; 
CREATE TABLE TOTALOCCURRENCE AS
SELECT IDINTERNAL, CUSTOMERNUMBER, IDOCCURRENCE, COUNT(*) AS TOTAL FROM ACCS GROUP BY IDINTERNAL, CUSTOMERNUMBER, IDOCCURRENCE;
QUIT;

Then, after viewing, if you would like a dataset of distinct occurrences, you could write:

PROC SQL;
CREATE TABLE ALLDISTINCT AS
SELECT DISTINCT IDINTERNAL, CUSTOMERNUMBER, IDOCCURRENCE FROM ACCS;
QUIT;
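
A possible variation on the first query above, offered as a sketch rather than a tested solution: adding a HAVING clause keeps only the key combinations that occur more than once, so the duplicated keys are listed directly. The table names DUPKEYS and EXTRA_ROWS are made up for illustration. If the duplicates really are on these three variables, the EXTRA_ROWS figure should match the gap between the two datasets (315,050 - 312,326 = 2,724).

```sas
/* Sketch only: DUPKEYS and EXTRA_ROWS are illustrative names. */
PROC SQL;
  /* Keep only key combinations that appear more than once */
  CREATE TABLE DUPKEYS AS
  SELECT IDINTERNAL, CUSTOMERNUMBER, IDOCCURRENCE, COUNT(*) AS TOTAL
  FROM ACCS
  GROUP BY IDINTERNAL, CUSTOMERNUMBER, IDOCCURRENCE
  HAVING COUNT(*) > 1;

  /* Rows beyond the first in each duplicated group */
  SELECT SUM(TOTAL - 1) AS EXTRA_ROWS
  FROM DUPKEYS;
QUIT;
```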

Astounding · Posted 04-26-2017 12:19 PM

It looks like NODUPKEY is kicking in and removing the duplicates before the DUPOUT= option can examine them. Try removing NODUPKEY and see if that resolves the problem.
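
An alternative that does not rely on DUPOUT= at all, sketched under the assumption that the three BY variables from the original post define a duplicate: sort without removing anything, then use FIRST. and LAST. processing in a DATA step to keep every row whose key combination occurs more than once. The dataset names accs_sorted and dup_rows are hypothetical.

```sas
/* Sketch only: accs_sorted and dup_rows are illustrative names. */
proc sort data=accs out=accs_sorted;
  by idinternal customernumber idoccurrence;
run;

data dup_rows;
  set accs_sorted;
  by idinternal customernumber idoccurrence;
  /* A row that is both first and last in its BY group is unique; */
  /* keep everything else, i.e. all members of duplicated groups. */
  if not (first.idoccurrence and last.idoccurrence);
run;
```

Unlike a duplicates-only output, dup_rows holds every copy of each duplicated key, including the one a NODUPKEY sort would have kept, which makes it easier to compare the copies side by side.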
