Hi all,
I have a dataset that looks like this
ID V1 V2 V3
A 12 13 14
A 12 14 14
A 12 15 14
I want it to look like:
ID V1 V2 V3
A . 13 .
A . 14 .
A . 15 .
Basically, replace those with ID = A, and variables with duplicate values as missing.
if lag(ID)=ID and lag(V1)=V1 then call missing(V1); if lag(ID)=ID and lag(V2)=V2 then call missing(V2); if lag(ID)=ID and lag(V3)=V3 then call missing(V3);
I used this code but it only replaces subsequent duplicates, not all!
Thanks a lot for your help!
data have;
input ID $ V1 V2 V3;
cards;
A 12 13 14
A 12 14 14
A 12 15 14
;
proc means data=have noprint;
by id;
var v:;
output out=w(drop=_:) std=/autoname;
run;
data want;
merge have w;
by id;
array s1 v1-v3;
array s2 v1_stddev--v3_stddev;
do _n_=1 to dim(s1);
if s2(_n_)=0 then call missing(s1(_n_));
end;
keep id v1-v3;
run;
Well, it doesn't make much sense, but I would suggest flipping the data over:
data have; input id $ v1 v2 v3; datalines; A 12 13 14 A 12 14 14 A 12 15 14 ; run; proc transpose data=have out=inter; by id; var v:; run; data inter; set inter; array i col:; if sum(of i{*})/dim(i)=i{1} then call missing(of col:); run; proc transpose data=inter out=want; by id; var col:; run;
@ChrisNZ , i took the solution you posted from a different question, but wondering if there is any way to delete the original of the duplicates, and not just the subsequent ones!?
Nothing about this post makes sense?
Sorry. It is to OP. EDITED.
data have;
input id $ v1 v2 v3;
datalines;
A 12 13 14
A 15 14 14
A 12 15 14
;
run;
proc sort data=have;
by id v1;
run;
data have;
set have;
by id v1;
if not (first.v1 and last.v1) then call missing(v1);
run;
proc sort data=have;
by id v2;
run;
data have;
set have;
by id v2;
if not (first.v2 and last.v2) then call missing(v2);
run;
proc sort data=have;
by id v3;
run;
data have;
set have;
by id v3;
if not (first.v3 and last.v3) then call missing(v3);
run;
proc print;run;
This only returns one observation per ID when I print the temp file. Am I missing something?
Code is updated .
data have;
input ID $ V1 V2 V3;
cards;
A 12 13 14
A 12 14 14
A 12 15 14
;
proc means data=have noprint;
by id;
var v:;
output out=w(drop=_:) std=/autoname;
run;
data want;
merge have w;
by id;
array s1 v1-v3;
array s2 v1_stddev--v3_stddev;
do _n_=1 to dim(s1);
if s2(_n_)=0 then call missing(s1(_n_));
end;
keep id v1-v3;
run;
What about this data? What would you like the result to e for V1?
ID V1 V2 V3
A 12 13 14
A 15 14 14
A 12 15 14
Let me give you a solution that is straightforward and reliable, but tedious and time-consuming. We will see if anyone comes up with a better solution. (Hope somebody does!)
Process each variable separately:
proc sort data=have;
by id v1;
run;
data temp;
set have;
by id v1;
if first.v1=0 or last.v1=0 then v1=.;
run;
proc sort data=temp;
by id v2;
run;
data temp;
set temp;
by id v2;
if first.v2=0 or last.v2=0 then v2=.;
run;
proc sort data=temp;
by id v3;
run
data want;
set temp;
by id v3;
if first.v3=0 or last.v3=0 then v3=.;
run;
Why not just remove them from the data set?
Do you just have 3 vars v1-v3 or that could vary to many?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.