I have a data set than consists of testday milk, butterfat (%) and protein (%) yield records in dairy cattle. To clarify, my data (variables)look as follows:
animal days_in_milk milk_yield butterfat_% protein_%
1234567 35 30.8 3.51 3.11
1234567 65 39.2 3.32 3.09
1234567 95 38.5 3.21 3.02
1234567 125 32.7 3.15 3.06
1234567 125 32.7 3.13 3.06
1234567 155 30.2 3.05 3.10
1234567 185 28.2 3.08 3.12
Where, animal is a unique id number of the cow, days_in_milk (dim) was calcated from test date minus calving date, milk_yield is the yield on the applicable test day, butterfat_% is the percentage of butterfat of the milk sample on the applicable test day, protein_% is the protein percentage of the milk sample on the applicable test day.
Marked in red is my problem. In my data a cow have muliple records signifying records on different test dates (hence, days in milk or dim). But there are mistakes, as marked in red. Clicking on "select distinct rows only" when using the query builder does not solve the problem, because it sees the two records marked in red as different (while it is actually a duplicate), because only the butterfat % differs slightly (maybe because of a mistake?).
How do I remove ONE of these records using SAS enterprise guide 7.1 (or 6.1)? Because unfortunately, I am out of my depth with programming.
Please help! 🙂
Could you type the follwing in the programming window ?
data mydata2 duplicates; set mydata;
by days_in_milk;
if first.days_in_milk then flag=1; else flag=0;
if flag=1 then output mydata2;
if flag=0 then output dupicates; run;
Thank you nketata. I have been away for a few days, will try it today. I appreciate your help!
Maybe this is a stupid question. Where you state "mydata2" and "mydata", am I correct in saying that is the name of my data file?
It also tells me that my data file does not exist? EG saves it as a .sas7bdat file. Is this the problem? It looks as if it is looking for a file "WORK.DATA_FILE_NAME.DATA"?
.sas7bdat is the file extension for SAS tables.
What does "ERROR: Libref <file_name> is not assigned" mean? Can anybody help please?
In general: You need to post your code and SAS log when asking such questions and you need also to add a bit of explanation what you're doing.
If the records are considered distinct because of non-displayed decimal bits it may be that you could round the values to the shown 1 or decimals before trying the distinct rows. Depending on how your are getting the DIM you may need to round that as well.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.