Dear community members;
I have a dataset in which i have a some duplicate obs on ID variable
I want to find the obs and var combination for which there is a unque value
like A3XXX (ie OBS2 and VAR2) and B6XXXX
ID | VAR2 | VAR3 | VAR4 | VAR5 | |
OBS1 | A1 | A2 | A3 | A4 | A5 |
OBS2 | A1 | A3XXXX | A3 | A4 | A5 |
OBS3 | B1 | B2 | B3 | B4 | B5 |
OBS4 | B1 | B2 | B3 | B4 | B6XXX |
OBS5 | C1 | C2 | C3 | C4 | C5 |
OBS6 | C1 | C2 | C3 | C4 | C5 |
OBS7 | D1 | D2 | D3 | D4 | D5 |
OBS8 | D1 | D2 | D3 | D4 | D5 |
OBS9 | E1 | E2 | E3 | E4 | E5 |
OBS10 | E1 | E2 | E3 | E4 | E5 |
how should i approach this problem any suggestion?
thanks a ton in advance 🙂
Here is a way to spot non uniform replicates:
data have;
length i ID VAR2 VAR3 VAR4 VAR5 $8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1 A1 A2 A3 A4 A5
OBS2 A1 A3XXXX A3 A4 A5
OBS3 B1 B2 B3 B4 B5
OBS4 B1 B2 B3 B4 B6XXX
OBS5 C1 C2 C3 C4 C5
OBS6 C1 C2 C3 C4 C5
OBS7 D1 D2 D3 D4 D5
OBS8 D1 D2 D3 D4 D5
OBS9 E1 E2 E3 E4 E5
OBS10 E1 E2 E3 E4 E5
;
proc sql;
create table dups as
select *
from have
group by ID
having count(distinct catx("@", VAR2, VAR3, VAR4, VAR5)) > 1
order by i;
quit;
proc print data=dups; by id; id id; run;
Please elaborate and share what do you want in output dataset.
Hi Rahul the output I want is like mentioned here
VAR1 | VAR2 | VAR3 | VAR4 | VAR5 | |
OBS1 | A2 | ||||
OBS2 | A3XXXX | ||||
OBS3 | B5 | ||||
OBS4 | B6XXX |
i
The same idea as before, but nulling out identical entries:
data have;
length i ID VAR2 VAR3 VAR4 VAR5 $8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1 A1 A2 A3 A4 A5
OBS2 A1 A3XXXX A3 A4 A5
OBS3 B1 B2 B3 B4 B5
OBS4 B1 B2 B3 B4 B6XXX
OBS5 C1 C2 C3 C4 C5
OBS6 C1 C2 C3 C4 C5
OBS7 D1 D2 D3 D4 D5
OBS8 D1 D2 D3 D4 D5
OBS9 E1 E2 E3 E4 E5
OBS10 E1 E2 E3 E4 E5
;
proc sql;
create table dups as
select *,
count(distinct VAR2) as nVar2,
count(distinct VAR3) as nVar3,
count(distinct VAR4) as nVar4,
count(distinct VAR5) as nVar5
from have
group by ID
having max(nVar2, nVar3, nVar4, nVar5) > 1
order by i;
quit;
data nulDups;
set dups;
array _v var2-var5;
array _n nVar2-nVar5;
do j = 1 to dim(_v);
if _n{j} = 1 then call missing(_v{j});
end;
drop j nVar:;
run;
proc print data=nulDups; by id; id id; run;
Like this?
proc sql;
select ID
,ifc(count(unique VAR2) =1,' ', VAR2) as VAR2
,ifc(count(unique VAR3) =1,' ', VAR3) as VAR3
,ifc(count(unique VAR4) =1,' ', VAR4) as VAR4
,ifc(count(unique VAR5) =1,' ', VAR5) as VAR5
from HAVE
group by ID
having lengthn(cats(VAR2,VAR3,VAR4,VAR5));
quit;
ID | VAR2 | VAR3 | VAR4 | VAR5 |
---|---|---|---|---|
A1 | A3XXXX | |||
A1 | A2 | |||
B1 | B5 | |||
B1 | B6XXX |
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.