I have a data set than consists of testday milk, butterfat (%) and protein (%) yield records in dairy cattle. To clarify, my data (variables)look as follows:
animal days_in_milk milk_yield butterfat_% protein_%
1234567 35 30.8 3.51 3.11
1234567 65 39.2 3.32 3.09
1234567 95 38.5 3.21 3.02
1234567 125 32.7 3.15 3.06
1234567 125 32.7 3.13 3.06
1234567 155 30.2 3.05 3.10
1234567 185 28.2 3.08 3.12
Where, animal is a unique id number of the cow, days_in_milk (dim) was calcated from test date minus calving date, milk_yield is the yield on the applicable test day, butterfat_% is the percentage of butterfat of the milk sample on the applicable test day, protein_% is the protein percentage of the milk sample on the applicable test day.
Marked in red is my problem. In my data a cow have muliple records signifying records on different test dates (hence, days in milk or dim). But there are mistakes, as marked in red. Clicking on "select distinct rows only" when using the query builder does not solve the problem, because it sees the two records marked in red as different (while it is actually a duplicate), because only the butterfat % differs slightly (maybe because of a mistake?).
How do I remove ONE of these records using SAS enterprise guide 7.1 (or 6.1)? Because unfortunately, I am out of my depth with programming.
Please help! 🙂
Could you type the follwing in the programming window ?
data mydata2 duplicates; set mydata;
by days_in_milk;
if first.days_in_milk then flag=1; else flag=0;
if flag=1 then output mydata2;
if flag=0 then output dupicates; run;
Thank you nketata. I have been away for a few days, will try it today. I appreciate your help!
Maybe this is a stupid question. Where you state "mydata2" and "mydata", am I correct in saying that is the name of my data file?
It also tells me that my data file does not exist? EG saves it as a .sas7bdat file. Is this the problem? It looks as if it is looking for a file "WORK.DATA_FILE_NAME.DATA"?
.sas7bdat is the file extension for SAS tables.
What does "ERROR: Libref <file_name> is not assigned" mean? Can anybody help please?
In general: You need to post your code and SAS log when asking such questions and you need also to add a bit of explanation what you're doing.
If the records are considered distinct because of non-displayed decimal bits it may be that you could round the values to the shown 1 or decimals before trying the distinct rows. Depending on how your are getting the DIM you may need to round that as well.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.