Desktop productivity for business analysts and programmers

Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

Reply
Occasional Contributor
Posts: 8

Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

[ Edited ]

I have a data set than consists of testday milk, butterfat (%) and protein (%) yield records in dairy cattle.  To clarify, my data (variables)look as follows:

 

animal          days_in_milk     milk_yield     butterfat_%     protein_%

1234567       35                        30.8               3.51                  3.11

1234567       65                        39.2               3.32                  3.09

1234567       95                        38.5               3.21                  3.02

1234567       125                      32.7               3.15                  3.06

1234567       125                      32.7               3.13                  3.06

1234567       155                      30.2               3.05                  3.10

1234567       185                      28.2               3.08                  3.12

 

Where, animal is a unique id number of the cow, days_in_milk (dim) was calcated from test date minus calving date, milk_yield is the yield on the applicable test day, butterfat_% is the percentage of butterfat of the milk sample on the applicable test day, protein_% is the protein percentage of the milk sample on the applicable test day.

 

Marked in red is my problem.  In my data a cow have muliple records signifying records on different test dates (hence, days in milk or dim).  But there are mistakes, as marked in red.  Clicking on "select distinct rows only" when using the query builder does not solve the problem, because it sees the two records marked in red as different (while it is actually a duplicate), because only the butterfat % differs slightly (maybe because of a mistake?). 

 

How do I remove ONE of these records using SAS enterprise guide 7.1 (or 6.1)?  Because unfortunately, I am out of my depth with programming.

 

Please help! :-)

Contributor
Posts: 34

Re: Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

[ Edited ]

Could you type the follwing in the programming window ?

 

data mydata2 duplicates; set mydata;

by days_in_milk;

if first.days_in_milk then flag=1; else flag=0;

if flag=1 then output mydata2;

if flag=0 then output dupicates; run;

Occasional Contributor
Posts: 8

Re: Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

Thank you nketata.  I have been away for a few days, will try it today.  I appreciate your help!

Occasional Contributor
Posts: 8

Re: Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

Maybe this is a stupid question.  Where you state "mydata2" and "mydata", am I correct in saying that is the name of my data file?  

Occasional Contributor
Posts: 8

Re: Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

It also tells me that my data file does not exist?  EG saves it as a .sas7bdat file.  Is this the problem?  It looks as if it is looking for a file "WORK.DATA_FILE_NAME.DATA"?  

Respected Advisor
Posts: 4,131

Re: Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

.sas7bdat is the file extension for SAS tables.

Occasional Contributor
Posts: 8

Re: Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

What does "ERROR: Libref <file_name> is not assigned" mean?  Can anybody help please?

Respected Advisor
Posts: 4,131

Re: Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#p1janotyd8q0qrn1jh...

 

In general: You need to post your code and SAS log when asking such questions and you need also to add a bit of explanation what you're doing. 

Super User
Posts: 11,113

Re: Removing duplicate records in ENTERPRISE GUIDE when duplicates are not distinct

If the records are considered distinct because of non-displayed decimal bits it may be that you could round the values to the shown 1 or decimals before trying the distinct rows. Depending on how your are getting the DIM you may need to round that as well.

Ask a Question
Discussion stats
  • 8 replies
  • 160 views
  • 2 likes
  • 4 in conversation