BookmarkSubscribeRSS Feed
juju_p
Obsidian | Level 7

Dear all,

 

I have the below dataset and I would like to make sure that the values (viallab1 to viallab10) are unique within my entire dataset

 

So for example the values in red would be marked as duplicates

 

Can you please give a suggestion on how to solve that?

 

Thank you

 

SUBJIDVISITTVISITDVIALLAB1VIALLAB2VIALLAB3VIALLAB4VIALLAB5VIALLAB6VIALLAB7VIALLAB8VIALLAB9VIALLAB10 
DUM-002-001Cycle 11117003170068172472....... 
DUM-002-001Unscheduled visit during combination treatment 1201.......... 
DUM-002-001Unscheduled visit during combination treatment 2202236547236548236549....... 
CAM-002-001Unscheduled visit during combination treatment 3203.......... 
CAM-002-001Unscheduled visit during combination treatment 4204

117003

.170068...... 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Please clarify ... when you determine if something is a duplicate, are you just looking in one column, or is something a duplicate if it appears once in column VIALLAB1 and it appears again in column VIALLAB2? Okay, I see that the value can be in different columns 

 

When we do find duplicates, then what? Remove them? Or flag them? What action should be taken when duplicates are found?


Can you please provide a portion of your data as SAS data step code (instructions) instead of as a screen capture, which we can't really work with.

--
Paige Miller
Astounding
PROC Star

One approach, hope the result would be satisfactory:

data vials;
   set have;
   array viallab {10};
   do _n_=1 to 10;
      if viallab{_n_} ne . then do;
         vial = viallab{_n_};
         output;
      end;
   end;
   keep vial subjid;
run;

proc freq data=vials noprint;
   tables vial / out = counts;
run;

proc print data=counts;
   where count > 1;
   var vial count;
run;

The report gives you a list of the vials that were recorded more than once, and the data set VIALS tells you which subjects were matched with which vials.

juju_p
Obsidian | Level 7

This prints the duplicate vials but I cannot see on which subjects the duplicate values occur

see below the result


ObsvialCOUNT
7991082912
10671110112
12641129682
13491137942
13621139092
16041161952
Astounding
PROC Star

Looking at the updated description for the output, I would keep the duplicates all in one data set:

data vials;
   set have;
   array viallab {10};
   do whichtest=1 to 10;
      if viallab{whichtest} ne . then do;
         vial = viallab{whichtest};
         output;
      end;
   end;
   keep vial subjid visitt visitd whichtest;
run;

proc sort data=vials;
   by vial subjid;
run;

data want;
   set vials;
   by vial;
   if first.vial=0 or last.vial=0;
run;

/* optionally */
proc print data=want;
   by vial;
   id vial;
   var subjid visitt visitd whichtest;
run;

It's untested code, but should do the trick.

ballardw
Super User

Also, provide an example of what your output data set should look like. If we a flagging values in multiple variables there has to be something that indicates which what is a duplicate and potentially you have have 10 variables per record that can be duplicates, so are you expecting to have 10 additional flag variables added?

 

Transpose the data so you have one record per VIALLAB and it is easy to count those to see which are duplicated anywhere in the set. The fun part will be getting whatever the required output is.

juju_p
Obsidian | Level 7

This is how my output should look like for the example given below

 

So I will have  2 separate tables in my result.

 

SUBJID VISITTVISITDVIALLAB1 
DUM-002-001Cycle 11117003 
CAM-002-001Unscheduled visit during combination treatment 4204117003 
     
SUBJIDVISITTVISITDVIALLAB2VIALLAB3
DUM-002-001Cycle 11170068 
CAM-002-001Unscheduled visit during combination treatment 4204.170068
Kurt_Bremser
Super User
proc transpose
  data=have
  out=long (where=(col1 ne ""))
;
by subjid visitd visitt;
var viallab:;
run;

proc sort
  data=long
  out=nondup
  dupout=dup
  nodupkey
;
by _name_;
run;

Untested; for tested code, please provide data in a data step with datalines, posted in a code box ("little running man" button).

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1578 views
  • 3 likes
  • 5 in conversation