BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rj438
Calcite | Level 5
 
 

 

I have a dataset that looks like this 

 

ID V1 V2 V3

A  12 13 14

A 12 14 14

A 12 15 14

 

I want it to look like:

ID V1 V2 V3

A    .   13   . 

A    .   14   .

A    .   15   .

Basically, replace those with ID = A, and variables with duplicate values as missing. 

 

if lag(ID)=ID and lag(V1)=V1 then call missing(V1);
if lag(ID)=ID and lag(V2)=V2 then call missing(V2);
if lag(ID)=ID and lag(V3)=V3 then call missing(V3);

I used this code but it only replaces subsequent duplicates, not all! 

Thanks a lot for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
input ID $ V1 V2 V3;
cards;
A  12 13 14
A 12 14 14
A 12 15 14
;


proc means data=have noprint;
by id;
var v:;
output out=w(drop=_:) std=/autoname;
run;
data want;
merge have w;
by id;
array s1 v1-v3;
array s2 v1_stddev--v3_stddev;
do _n_=1 to dim(s1);
if s2(_n_)=0 then call missing(s1(_n_));
end;
keep id v1-v3;
run;

View solution in original post

17 REPLIES 17
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, it doesn't make much sense, but I would suggest flipping the data over:

data have;
  input id $ v1 v2 v3;
datalines;
A 12 13 14
A 12 14 14
A 12 15 14
;
run;
proc transpose data=have out=inter;
  by id;
  var v:;
run;
data inter;
  set inter;
  array i col:;
  if sum(of i{*})/dim(i)=i{1} then call missing(of col:);
run;
proc transpose data=inter out=want;
  by id;
  var col:;
run;

 

rj438
Calcite | Level 5

@ChrisNZ , i took the solution you posted from a different question, but wondering if there is any way to delete the original of the duplicates, and not just the subsequent ones!? 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Nothing about this post makes sense?

Ksharp
Super User

Sorry. It is to OP.  EDITED.

 

data have;
  input id $ v1 v2 v3;
datalines;
A  12 13 14
A 15 14 14
A 12 15 14
;
run;
proc sort data=have;
by id v1;
run;
data have;
 set have;
 by id v1;
 if not (first.v1 and last.v1) then call missing(v1);
run;

proc sort data=have;
by id v2;
run;
data have;
 set have;
 by id v2;
 if not (first.v2 and last.v2) then call missing(v2);
run;

proc sort data=have;
by id v3;
run;
data have;
 set have;
 by id v3;
 if not (first.v3 and last.v3) then call missing(v3);
run;

proc print;run;
rj438
Calcite | Level 5

This only returns one observation per ID when I print the temp file. Am I missing something?

Ksharp
Super User

Code is updated .

novinosrin
Tourmaline | Level 20
data have;
input ID $ V1 V2 V3;
cards;
A  12 13 14
A 12 14 14
A 12 15 14
;


proc means data=have noprint;
by id;
var v:;
output out=w(drop=_:) std=/autoname;
run;
data want;
merge have w;
by id;
array s1 v1-v3;
array s2 v1_stddev--v3_stddev;
do _n_=1 to dim(s1);
if s2(_n_)=0 then call missing(s1(_n_));
end;
keep id v1-v3;
run;
rj438
Calcite | Level 5
I think this was the quickest (albeit twisted) solution to my problem, thanks @novinosrin!
Astounding
PROC Star

What about this data?  What would you like the result to e for V1?

 

ID V1 V2 V3

12 13 14

A 15 14 14

A 12 15 14

rj438
Calcite | Level 5
Anywhere that it's a duplicate should be deemed missing.

ID V1 V2 V3
A . 13 .
A 15 14 .
A . 15 .
Thanks!
Astounding
PROC Star

Let me give you a solution that is straightforward and reliable, but tedious and time-consuming.  We will see if anyone comes up with a better solution.  (Hope somebody does!)

 

Process each variable separately:

 

proc sort data=have;

by id v1;

run;

 

data temp;

set have;

by id v1;

if first.v1=0 or last.v1=0 then v1=.;

run;

 

proc sort data=temp;

by id v2;

run;

 

data temp;

set temp;

by id v2;

if first.v2=0 or last.v2=0 then v2=.;

run;

 

proc sort data=temp;

by id v3;

run

 

data want;

set temp;

by id v3;

if first.v3=0 or last.v3=0 then v3=.;

run;

Reeza
Super User

Why not just remove them from the data set?

rj438
Calcite | Level 5
Because I want to keep the datapoints (either by obs or var) that are not duplicates. For instance, all the V2 datapoints for ID=A in this example.
Andygray
Quartz | Level 8

Do you just have 3 vars v1-v3 or that could vary to many?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 2952 views
  • 0 likes
  • 7 in conversation