04-02-2017 07:53 AM - edited 04-02-2017 07:54 AM
I have a data set than consists of testday milk, butterfat (%) and protein (%) yield records in dairy cattle. To clarify, my data (variables)look as follows:
animal days_in_milk milk_yield butterfat_% protein_%
1234567 35 30.8 3.51 3.11
1234567 65 39.2 3.32 3.09
1234567 95 38.5 3.21 3.02
1234567 125 32.7 3.15 3.06
1234567 125 32.7 3.13 3.06
1234567 155 30.2 3.05 3.10
1234567 185 28.2 3.08 3.12
Where, animal is a unique id number of the cow, days_in_milk (dim) was calcated from test date minus calving date, milk_yield is the yield on the applicable test day, butterfat_% is the percentage of butterfat of the milk sample on the applicable test day, protein_% is the protein percentage of the milk sample on the applicable test day.
Marked in red is my problem. In my data a cow have muliple records signifying records on different test dates (hence, days in milk or dim). But there are mistakes, as marked in red. Clicking on "select distinct rows only" when using the query builder does not solve the problem, because it sees the two records marked in red as different (while it is actually a duplicate), because only the butterfat % differs slightly (maybe because of a mistake?).
How do I remove ONE of these records using SAS enterprise guide 7.1 (or 6.1)? Because unfortunately, I am out of my depth with programming.
Please help! :-)
04-03-2017 11:45 AM - edited 04-03-2017 11:46 AM
Could you type the follwing in the programming window ?
data mydata2 duplicates; set mydata;
if first.days_in_milk then flag=1; else flag=0;
if flag=1 then output mydata2;
if flag=0 then output dupicates; run;
04-10-2017 05:44 AM
Maybe this is a stupid question. Where you state "mydata2" and "mydata", am I correct in saying that is the name of my data file?
04-10-2017 06:07 AM
It also tells me that my data file does not exist? EG saves it as a .sas7bdat file. Is this the problem? It looks as if it is looking for a file "WORK.DATA_FILE_NAME.DATA"?
04-11-2017 09:10 AM
In general: You need to post your code and SAS log when asking such questions and you need also to add a bit of explanation what you're doing.
04-03-2017 06:08 PM
If the records are considered distinct because of non-displayed decimal bits it may be that you could round the values to the shown 1 or decimals before trying the distinct rows. Depending on how your are getting the DIM you may need to round that as well.