How to deal with duplicates?

Occasional Contributor DrJ
Posts: 17

After averaging observations, the result has one fewer observation than before. When I merge this data set with the original, which contains other variables that I want to include in my analysis, I end up with a lot of duplicated results. I am working in SAS Enterprise Guide 4.3. How can I be sure that these duplicates will not hinder my further analysis? Is it possible to get rid of them after merging without compromising my data set?

Super Contributor
Posts: 333

Re: How to deal with duplicates?

We probably need a little more to go on to help, such as the type of merge you are performing and the structure of the data. How the averaging was done in the previous step may also be needed.

EJ

Occasional Contributor DrJ
Posts: 17

Re: How to deal with duplicates?

I used the Query Builder to merge the data sets. My data set contains observations derived from an assay, grouped by uid, assay and visit number. The averaging was done using the code below:

/* Mean of detail3 for each uid / visino / runorkit combination */
Proc Summary data=Work.filter_query_for_v065_UID_SA nway;
    where repeat > 0;
    class uid visino runorkit;
    var detail3;
    output out=detail3summary mean=detail3mean;
run;

Occasional Contributor DrJ
Posts: 17

Re: How to deal with duplicates?

Help would definitely be appreciated. Please contact me via messenger if need be. These duplicates only occur when I merge via SAS Enterprise Guide.

Super Contributor
Posts: 333

Re: How to deal with duplicates?

What do you mean by your last statement?

"These duplicates only occur when I merge via SAS Enterprise Guide"

What other ways have you tried to do the merge?

What type of join and what variables did you join on in the query builder?

Occasional Contributor DrJ
Posts: 17

Re: How to deal with duplicates?

The only merge I have attempted was in SAS Enterprise Guide, via the Query Builder. This is the code I obtained:

PROC SQL;
    /* Generated by the Query Builder; note that the join is on uid only */
    CREATE TABLE WORK.detail3summary AS
    SELECT DISTINCT t1.uid,
           t1.visitno,
           t1.detail3mean,
           t2.uid AS uid1,
           t2.visitno AS visitno1,
           t2.detail1mean
    FROM WORK.FILTER_FOR_DETAIL3SUMMARY t1
         INNER JOIN WORK.QUERY_FOR_DETAIL1SUMMARY t2 ON (t1.uid = t2.uid);
QUIT;

Occasional Contributor DrJ
Posts: 17

Re: How to deal with duplicates?

There are a lot of variables to include, but perhaps I should just stick to the most important ones. I just do not understand how I can get rid of the duplicates within SAS Enterprise Guide while still keeping them grouped with the correct assay, individual, visit number and treatment arm.

Super Contributor
Posts: 333

Re: How to deal with duplicates?

It looks to me like you computed your means by three variables (uid, visino, runorkit) but then joined the tables on uid only.

You will get multiple rows per uid, since the data set from proc summary repeats each uid (one row for every combination of the class variables).

It looks like you are merging two summaries together, so it depends on how both are structured. If both use the same class variables, then you want to merge on those class variables (and select all the class variables in the merge) to get the correct result. You will still have repeated uids, but the rows will be correctly matched.
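A minimal sketch of that kind of join, reusing the table names from the Query Builder code earlier in the thread. It assumes both summary tables contain uid, visitno and runorkit; the presence of runorkit in the detail1 summary and the output name work.detail_means are assumptions, not something stated in the thread.

```sas
/* Sketch: join the two summaries on every class variable, not uid alone. */
/* Assumes both tables hold uid, visitno and runorkit (hypothetical).     */
proc sql;
    create table work.detail_means as   /* hypothetical output name */
    select t1.uid,
           t1.visitno,
           t1.runorkit,
           t1.detail3mean,
           t2.detail1mean
    from work.filter_for_detail3summary as t1
         inner join work.query_for_detail1summary as t2
         on  t1.uid      = t2.uid
         and t1.visitno  = t2.visitno
         and t1.runorkit = t2.runorkit;
quit;
```

Because each row is matched on the full key, a uid can still appear several times (once per visit and run), but each detail3 mean is paired only with the detail1 mean from the same visit and run.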

I am not sure what is appropriate in your situation, but I would start by looking at the two summary data sets side by side to see where the issue is.
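One way to do that inspection is to count how many rows each table holds per join key. The sketch below assumes the table and variable names from the Query Builder code earlier in the thread; any key listed by either query will multiply rows when the tables are joined on that key.

```sas
/* Sketch: list keys that occur more than once in each summary table. */
proc sql;
    title "detail3 summary: uid/visitno combinations with more than one row";
    select uid, visitno, count(*) as n_rows
    from work.filter_for_detail3summary
    group by uid, visitno
    having count(*) > 1;

    title "detail1 summary: uid/visitno combinations with more than one row";
    select uid, visitno, count(*) as n_rows
    from work.query_for_detail1summary
    group by uid, visitno
    having count(*) > 1;
quit;
title;
```

If both queries return no rows, uid and visitno together identify a row in each table and a join on those two variables should not duplicate anything.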

EJ

Occasional Contributor DrJ
Posts: 17

Re: How to deal with duplicates?

I think you are right. I think there will be duplicates even when I merge the data sets correctly. The summary data set is fine, I believe. I suppose the problems arise when I merge two summary data sets and then attempt to merge the result with the original. So you are suggesting that I merge only on the class variables found in all the data sets. Is there a way to run the analysis while taking these duplicates into account in SAS Enterprise Guide 4.2? Thanks!

Super User
Posts: 10,466

Re: How to deal with duplicates?

You may only need to create one summary data set, which might simplify things, unless the grouping variables are different.

/* One pass: means of detail2 and detail3 for the same class variables */
Proc Summary data=Work.filter_query_for_v065_UID_SA nway;
    where repeat > 0;
    class uid visino runorkit;
    var detail2 detail3;
    output out=detailsummary mean(detail2 detail3)=detail2mean detail3mean;
run;

Super Contributor
Posts: 333

Re: How to deal with duplicates?

If the objective is to get means at the uid level, then you might have to transpose the output from proc summary to get a variable for each type of mean, then join that to your original data set.
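A rough sketch of that transpose-then-join idea, assuming the detail3summary data set produced by the proc summary step earlier in the thread. The data set names detail3sorted, detail3wide, original_sorted and analysis, and the d3mean_ prefix, are hypothetical, and listing two variables on the ID statement with DELIMITER= assumes SAS 9.2 or later.

```sas
/* Sketch: reshape the summary to one row per uid, then attach it */
/* to the original data by uid.                                    */
proc sort data=work.detail3summary out=work.detail3sorted;
    by uid visino runorkit;
run;

/* One column per visit/run combination, e.g. d3mean_<visino>_<runorkit> */
proc transpose data=work.detail3sorted
               out=work.detail3wide (drop=_name_)
               prefix=d3mean_ delimiter=_;
    by uid;
    id visino runorkit;
    var detail3mean;
run;

proc sort data=work.filter_query_for_v065_UID_SA out=work.original_sorted;
    by uid;
run;

/* detail3wide has one row per uid, so this merge adds columns only */
data work.analysis;
    merge work.original_sorted (in=in_orig)
          work.detail3wide;
    by uid;
    if in_orig;
run;
```

Since the wide table holds a single row per uid, merging it onto the original keeps the original row count and only adds the mean columns.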

You are asking quite broad questions, so they are hard to answer. In general, most SAS procedures can account for groupings, but how they do so depends on the specifics of the analysis.

If you have a question about a specific analytic procedure, it may be better to start a new thread on the specific thing you are trying to do.

EJ
