BookmarkSubscribeRSS Feed
makarand
Obsidian | Level 7

Dear community members;

I have a dataset in which i have a some duplicate obs on ID variable 

I want to find the obs and var combination for which there is a unque value

like A3XXX (ie OBS2 and VAR2) and B6XXXX

 

 IDVAR2VAR3VAR4VAR5
OBS1A1A2A3A4A5
OBS2A1A3XXXXA3A4A5
OBS3B1B2B3B4B5
OBS4B1B2B3B4B6XXX
OBS5C1C2C3C4C5
OBS6C1C2C3C4C5
OBS7D1D2D3D4D5
OBS8D1D2D3D4D5
OBS9E1E2E3E4E5
OBS10E1E2E3E4E5

 

 

how should i approach this problem any suggestion?

thanks a ton in advance 🙂

6 REPLIES 6
PGStats
Opal | Level 21

Here is a way to spot non uniform replicates:

 

data have;
length i ID VAR2 VAR3 VAR4 VAR5 $8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1	A1	A2	A3	A4	A5
OBS2	A1	A3XXXX	A3	A4	A5
OBS3	B1	B2	B3	B4	B5
OBS4	B1	B2	B3	B4	B6XXX
OBS5	C1	C2	C3	C4	C5
OBS6	C1	C2	C3	C4	C5
OBS7	D1	D2	D3	D4	D5
OBS8	D1	D2	D3	D4	D5
OBS9	E1	E2	E3	E4	E5
OBS10	E1	E2	E3	E4	E5
;

proc sql;
create table dups as 
select * 
from have
group by ID
having count(distinct catx("@", VAR2, VAR3, VAR4, VAR5)) > 1
order by i;
quit;

proc print data=dups; by id; id id; run;
PG
RahulG
Barite | Level 11

Please elaborate and share what do you want in output dataset.

makarand
Obsidian | Level 7

Hi Rahul the output I want is like mentioned here

 

 VAR1VAR2VAR3VAR4VAR5
OBS1 A2   
OBS2 A3XXXX   
OBS3    B5
OBS4    B6XXX

 

i

makarand
Obsidian | Level 7
or even as PGStat replied grouping is also good
PGStats
Opal | Level 21

The same idea as before, but nulling out identical entries:

 

data have;
length i ID VAR2 VAR3 VAR4 VAR5 $8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1	A1	A2	A3	A4	A5
OBS2	A1	A3XXXX	A3	A4	A5
OBS3	B1	B2	B3	B4	B5
OBS4	B1	B2	B3	B4	B6XXX
OBS5	C1	C2	C3	C4	C5
OBS6	C1	C2	C3	C4	C5
OBS7	D1	D2	D3	D4	D5
OBS8	D1	D2	D3	D4	D5
OBS9	E1	E2	E3	E4	E5
OBS10	E1	E2	E3	E4	E5
;

proc sql;
create table dups as 
select *,
    count(distinct VAR2) as nVar2,
    count(distinct VAR3) as nVar3,
    count(distinct VAR4) as nVar4,
    count(distinct VAR5) as nVar5 
from have
group by ID
having max(nVar2, nVar3, nVar4, nVar5) > 1
order by i;
quit;

data nulDups;
set dups;
array _v var2-var5;
array _n nVar2-nVar5;
do j = 1 to dim(_v);
    if _n{j} = 1 then call missing(_v{j});
    end;
drop j nVar:;
run;

proc print data=nulDups; by id; id id; run;
PG
ChrisNZ
Tourmaline | Level 20

Like this?

 

proc sql;

select ID

,ifc(count(unique VAR2) =1,' ', VAR2) as VAR2

,ifc(count(unique VAR3) =1,' ', VAR3) as VAR3

,ifc(count(unique VAR4) =1,' ', VAR4) as VAR4

,ifc(count(unique VAR5) =1,' ', VAR5) as VAR5

from HAVE

group by ID

having lengthn(cats(VAR2,VAR3,VAR4,VAR5));

quit;

 

ID VAR2 VAR3 VAR4 VAR5
A1 A3XXXX      
A1 A2      
B1       B5
B1       B6XXX

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1219 views
  • 5 likes
  • 4 in conversation