Solved: Re: Replace ALL duplicate observations with blanks

rj438 · Posted 09-07-2018 09:25 AM

Hi all,

I have a dataset that looks like this

ID V1 V2 V3

A 12 13 14

A 12 14 14

A 12 15 14

I want it to look like:

ID V1 V2 V3

A . 13 .

A . 14 .

A . 15 .

Basically, replace those with ID = A, and variables with duplicate values as missing.

if lag(ID)=ID and lag(V1)=V1 then call missing(V1);
if lag(ID)=ID and lag(V2)=V2 then call missing(V2);
if lag(ID)=ID and lag(V3)=V3 then call missing(V3);

I used this code but it only replaces subsequent duplicates, not all!

Thanks a lot for your help!

novinosrin · Posted 09-07-2018 10:22 AM

data have;
input ID $ V1 V2 V3;
cards;
A  12 13 14
A 12 14 14
A 12 15 14
;


proc means data=have noprint;
by id;
var v:;
output out=w(drop=_:) std=/autoname;
run;
data want;
merge have w;
by id;
array s1 v1-v3;
array s2 v1_stddev--v3_stddev;
do _n_=1 to dim(s1);
if s2(_n_)=0 then call missing(s1(_n_));
end;
keep id v1-v3;
run;

View solution in original post

RW9 · Posted 09-07-2018 09:37 AM

Well, it doesn't make much sense, but I would suggest flipping the data over:

data have;
  input id $ v1 v2 v3;
datalines;
A 12 13 14
A 12 14 14
A 12 15 14
;
run;
proc transpose data=have out=inter;
  by id;
  var v:;
run;
data inter;
  set inter;
  array i col:;
  if sum(of i{*})/dim(i)=i{1} then call missing(of col:);
run;
proc transpose data=inter out=want;
  by id;
  var col:;
run;

rj438 · Posted 09-07-2018 09:54 AM

@ChrisNZ , i took the solution you posted from a different question, but wondering if there is any way to delete the original of the duplicates, and not just the subsequent ones!?

RW9 · Posted 09-07-2018 09:55 AM

Nothing about this post makes sense?

Ksharp · Posted 09-07-2018 11:00 AM

Sorry. It is to OP. EDITED.

data have;
  input id $ v1 v2 v3;
datalines;
A  12 13 14
A 15 14 14
A 12 15 14
;
run;
proc sort data=have;
by id v1;
run;
data have;
 set have;
 by id v1;
 if not (first.v1 and last.v1) then call missing(v1);
run;

proc sort data=have;
by id v2;
run;
data have;
 set have;
 by id v2;
 if not (first.v2 and last.v2) then call missing(v2);
run;

proc sort data=have;
by id v3;
run;
data have;
 set have;
 by id v3;
 if not (first.v3 and last.v3) then call missing(v3);
run;

proc print;run;

rj438 · Posted 09-07-2018 11:09 AM

This only returns one observation per ID when I print the temp file. Am I missing something?

Ksharp · Posted 09-07-2018 11:11 AM

Code is updated .

novinosrin · Posted 09-07-2018 10:22 AM

data have;
input ID $ V1 V2 V3;
cards;
A  12 13 14
A 12 14 14
A 12 15 14
;


proc means data=have noprint;
by id;
var v:;
output out=w(drop=_:) std=/autoname;
run;
data want;
merge have w;
by id;
array s1 v1-v3;
array s2 v1_stddev--v3_stddev;
do _n_=1 to dim(s1);
if s2(_n_)=0 then call missing(s1(_n_));
end;
keep id v1-v3;
run;

rj438 · Posted 09-07-2018 11:50 AM

I think this was the quickest (albeit twisted) solution to my problem, thanks @novinosrin!

Astounding · Posted 09-07-2018 10:34 AM

What about this data? What would you like the result to e for V1?

ID V1 V2 V3

A 12 13 14

A 15 14 14

A 12 15 14

rj438 · Posted 09-07-2018 10:54 AM

Anywhere that it's a duplicate should be deemed missing.

ID V1 V2 V3
A . 13 .
A 15 14 .
A . 15 .
Thanks!

Astounding · Posted 09-07-2018 11:06 AM

Let me give you a solution that is straightforward and reliable, but tedious and time-consuming. We will see if anyone comes up with a better solution. (Hope somebody does!)

Process each variable separately:

proc sort data=have;

by id v1;

run;

data temp;

set have;

by id v1;

if first.v1=0 or last.v1=0 then v1=.;

run;

proc sort data=temp;

by id v2;

run;

data temp;

set temp;

by id v2;

if first.v2=0 or last.v2=0 then v2=.;

run;

proc sort data=temp;

by id v3;

run

data want;

set temp;

by id v3;

if first.v3=0 or last.v3=0 then v3=.;

run;

Reeza · Posted 09-07-2018 10:56 AM

Why not just remove them from the data set?

rj438 · Posted 09-07-2018 11:03 AM

Because I want to keep the datapoints (either by obs or var) that are not duplicates. For instance, all the V2 datapoints for ID=A in this example.

Andygray · Posted 09-07-2018 11:20 AM

Do you just have 3 vars v1-v3 or that could vary to many?

Registration is open