BookmarkSubscribeRSS Feed
vanniekerkm2
Fluorite | Level 6

I have a data set than consists of testday milk, butterfat (%) and protein (%) yield records in dairy cattle.  To clarify, my data (variables)look as follows:

 

animal          days_in_milk     milk_yield     butterfat_%     protein_%

1234567       35                        30.8               3.51                  3.11

1234567       65                        39.2               3.32                  3.09

1234567       95                        38.5               3.21                  3.02

1234567       125                      32.7               3.15                  3.06

1234567       125                      32.7               3.13                  3.06

1234567       155                      30.2               3.05                  3.10

1234567       185                      28.2               3.08                  3.12

 

Where, animal is a unique id number of the cow, days_in_milk (dim) was calcated from test date minus calving date, milk_yield is the yield on the applicable test day, butterfat_% is the percentage of butterfat of the milk sample on the applicable test day, protein_% is the protein percentage of the milk sample on the applicable test day.

 

Marked in red is my problem.  In my data a cow have muliple records signifying records on different test dates (hence, days in milk or dim).  But there are mistakes, as marked in red.  Clicking on "select distinct rows only" when using the query builder does not solve the problem, because it sees the two records marked in red as different (while it is actually a duplicate), because only the butterfat % differs slightly (maybe because of a mistake?). 

 

How do I remove ONE of these records using SAS enterprise guide 7.1 (or 6.1)?  Because unfortunately, I am out of my depth with programming.

 

Please help! 🙂

8 REPLIES 8
nketata
Obsidian | Level 7

Could you type the follwing in the programming window ?

 

data mydata2 duplicates; set mydata;

by days_in_milk;

if first.days_in_milk then flag=1; else flag=0;

if flag=1 then output mydata2;

if flag=0 then output dupicates; run;

vanniekerkm2
Fluorite | Level 6

Thank you nketata.  I have been away for a few days, will try it today.  I appreciate your help!

vanniekerkm2
Fluorite | Level 6

Maybe this is a stupid question.  Where you state "mydata2" and "mydata", am I correct in saying that is the name of my data file?  

vanniekerkm2
Fluorite | Level 6

It also tells me that my data file does not exist?  EG saves it as a .sas7bdat file.  Is this the problem?  It looks as if it is looking for a file "WORK.DATA_FILE_NAME.DATA"?  

Patrick
Opal | Level 21

.sas7bdat is the file extension for SAS tables.

vanniekerkm2
Fluorite | Level 6

What does "ERROR: Libref <file_name> is not assigned" mean?  Can anybody help please?

Patrick
Opal | Level 21

http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#p1janotyd8q0qrn1jh...

 

In general: You need to post your code and SAS log when asking such questions and you need also to add a bit of explanation what you're doing. 

ballardw
Super User

If the records are considered distinct because of non-displayed decimal bits it may be that you could round the values to the shown 1 or decimals before trying the distinct rows. Depending on how your are getting the DIM you may need to round that as well.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 6606 views
  • 2 likes
  • 4 in conversation