BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rj438
Calcite | Level 5
 
 

 

I have a dataset that looks like this 

 

ID V1 V2 V3

A  12 13 14

A 12 14 14

A 12 15 14

 

I want it to look like:

ID V1 V2 V3

A    .   13   . 

A    .   14   .

A    .   15   .

Basically, replace those with ID = A, and variables with duplicate values as missing. 

 

if lag(ID)=ID and lag(V1)=V1 then call missing(V1);
if lag(ID)=ID and lag(V2)=V2 then call missing(V2);
if lag(ID)=ID and lag(V3)=V3 then call missing(V3);

I used this code but it only replaces subsequent duplicates, not all! 

Thanks a lot for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
input ID $ V1 V2 V3;
cards;
A  12 13 14
A 12 14 14
A 12 15 14
;


proc means data=have noprint;
by id;
var v:;
output out=w(drop=_:) std=/autoname;
run;
data want;
merge have w;
by id;
array s1 v1-v3;
array s2 v1_stddev--v3_stddev;
do _n_=1 to dim(s1);
if s2(_n_)=0 then call missing(s1(_n_));
end;
keep id v1-v3;
run;

View solution in original post

17 REPLIES 17
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, it doesn't make much sense, but I would suggest flipping the data over:

data have;
  input id $ v1 v2 v3;
datalines;
A 12 13 14
A 12 14 14
A 12 15 14
;
run;
proc transpose data=have out=inter;
  by id;
  var v:;
run;
data inter;
  set inter;
  array i col:;
  if sum(of i{*})/dim(i)=i{1} then call missing(of col:);
run;
proc transpose data=inter out=want;
  by id;
  var col:;
run;

 

rj438
Calcite | Level 5

@ChrisNZ , i took the solution you posted from a different question, but wondering if there is any way to delete the original of the duplicates, and not just the subsequent ones!? 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Nothing about this post makes sense?

Ksharp
Super User

Sorry. It is to OP.  EDITED.

 

data have;
  input id $ v1 v2 v3;
datalines;
A  12 13 14
A 15 14 14
A 12 15 14
;
run;
proc sort data=have;
by id v1;
run;
data have;
 set have;
 by id v1;
 if not (first.v1 and last.v1) then call missing(v1);
run;

proc sort data=have;
by id v2;
run;
data have;
 set have;
 by id v2;
 if not (first.v2 and last.v2) then call missing(v2);
run;

proc sort data=have;
by id v3;
run;
data have;
 set have;
 by id v3;
 if not (first.v3 and last.v3) then call missing(v3);
run;

proc print;run;
rj438
Calcite | Level 5

This only returns one observation per ID when I print the temp file. Am I missing something?

Ksharp
Super User

Code is updated .

novinosrin
Tourmaline | Level 20
data have;
input ID $ V1 V2 V3;
cards;
A  12 13 14
A 12 14 14
A 12 15 14
;


proc means data=have noprint;
by id;
var v:;
output out=w(drop=_:) std=/autoname;
run;
data want;
merge have w;
by id;
array s1 v1-v3;
array s2 v1_stddev--v3_stddev;
do _n_=1 to dim(s1);
if s2(_n_)=0 then call missing(s1(_n_));
end;
keep id v1-v3;
run;
rj438
Calcite | Level 5
I think this was the quickest (albeit twisted) solution to my problem, thanks @novinosrin!
Astounding
PROC Star

What about this data?  What would you like the result to e for V1?

 

ID V1 V2 V3

12 13 14

A 15 14 14

A 12 15 14

rj438
Calcite | Level 5
Anywhere that it's a duplicate should be deemed missing.

ID V1 V2 V3
A . 13 .
A 15 14 .
A . 15 .
Thanks!
Astounding
PROC Star

Let me give you a solution that is straightforward and reliable, but tedious and time-consuming.  We will see if anyone comes up with a better solution.  (Hope somebody does!)

 

Process each variable separately:

 

proc sort data=have;

by id v1;

run;

 

data temp;

set have;

by id v1;

if first.v1=0 or last.v1=0 then v1=.;

run;

 

proc sort data=temp;

by id v2;

run;

 

data temp;

set temp;

by id v2;

if first.v2=0 or last.v2=0 then v2=.;

run;

 

proc sort data=temp;

by id v3;

run

 

data want;

set temp;

by id v3;

if first.v3=0 or last.v3=0 then v3=.;

run;

Reeza
Super User

Why not just remove them from the data set?

rj438
Calcite | Level 5
Because I want to keep the datapoints (either by obs or var) that are not duplicates. For instance, all the V2 datapoints for ID=A in this example.
Andygray
Quartz | Level 8

Do you just have 3 vars v1-v3 or that could vary to many?

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 3347 views
  • 0 likes
  • 7 in conversation