Hello all. I have data (dairy cattle) that look as follows:
animal days_in_milk milk_yield butterfat_% protein_%
1234567 35 30.8 3.51 3.11
1234567 65 39.2 3.32 3.09
1234567 95 38.5 3.21 3.02
1234568 15 32.7 3.15 3.06
1234568 45 36.7 3.13 3.06
1234568 75 34.4 3.20 3.01
1234568 106 28.3 3.07 2.56
1234569 6 30.2 3.05 3.10
1234569 40 41.2 3.08 3.12
1234569 70 37.5 3.51 2.98
1234569 99 32.4 3.01 2.99
1234569 131 26.3 3.21 2.98
Each cow (cows are coloured differently) has more than one observation; take milk yield as an example. Each observation is recorded at a given number of days in milk (variable days_in_milk). Cows can have 3, 4, 5 or more records.
I have two questions:
How do I index each animal's observations from 1 to n in Enterprise Guide?
How do I create a column that shows the number of observations (N_obs) for each animal, as shown below:
animal days_in_milk milk_yield butterfat_% protein_% Index N_obs
1234567 35 30.8 3.51 3.11 1 3
1234567 65 39.2 3.32 3.09 2 3
1234567 95 38.5 3.21 3.02 3 3
1234568 15 32.7 3.15 3.06 1 4
1234568 45 36.7 3.13 3.06 2 4
1234568 75 34.4 3.20 3.01 3 4
1234568 106 28.3 3.07 2.56 4 4
1234569 6 30.2 3.05 3.10 1 5
1234569 40 41.2 3.08 3.12 2 5
1234569 70 37.5 3.51 2.98 3 5
1234569 99 32.4 3.01 2.99 4 5
1234569 131 26.3 3.21 2.98 5 5
Please help. 🙂
Accepted Solutions
Assuming your data set is already in sorted order:
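data want;
   /* First pass: count the records for the current animal */
   N_obs = 0;
   do until (last.animal);
      set have;
      by animal;
      n_obs + 1;
   end;
   /* Second pass: re-read the same records, number them, and write them out */
   index = 0;
   do until (last.animal);
      set have;
      by animal;
      index + 1;
      output;
   end;
run;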
Regarding your added comment above, the OUTPUT statement is flexible. It could, for example, be changed to:
if n_obs > 8 then output;
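Put together, the filtered version might look like the sketch below. The macro variable min_obs and the output name want_filtered are illustrative assumptions, not part of the original answer. Note that `>=` keeps animals with exactly the minimum number of records, whereas the `> 8` shown above would drop them; pick whichever matches the intended cut-off.

```sas
%let min_obs = 8;   /* hypothetical threshold: minimum records per animal */

data want_filtered;
   /* First pass: count the records for the current animal */
   n_obs = 0;
   do until (last.animal);
      set have;
      by animal;
      n_obs + 1;
   end;
   /* Second pass: number the records, keep only animals with enough of them */
   index = 0;
   do until (last.animal);
      set have;
      by animal;
      index + 1;
      if n_obs >= &min_obs then output;
   end;
run;
```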
Let me post it again: Post test data in the form of a datastep!!!
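For reference, the sample rows from the question could be supplied as a data step along these lines. The names butterfat_pct and protein_pct are assumptions, since a % sign is not allowed in a standard SAS variable name; the data set name have matches the code used in the replies.

```sas
/* Sample rows copied from the question; column names with _pct are assumed */
data have;
   input animal days_in_milk milk_yield butterfat_pct protein_pct;
   datalines;
1234567 35 30.8 3.51 3.11
1234567 65 39.2 3.32 3.09
1234567 95 38.5 3.21 3.02
1234568 15 32.7 3.15 3.06
1234568 45 36.7 3.13 3.06
1234568 75 34.4 3.20 3.01
1234568 106 28.3 3.07 2.56
1234569 6 30.2 3.05 3.10
1234569 40 41.2 3.08 3.12
1234569 70 37.5 3.51 2.98
1234569 99 32.4 3.01 2.99
1234569 131 26.3 3.21 2.98
;
run;
```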
At a guess:
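data want;
   set have;
   by animal;
   /* The sum statement (index + 1) retains index across rows; a plain
      index=index+1 would reset to missing on every iteration */
   if first.animal then index = 1;
   else index + 1;
run;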
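Then, if you want the maximum number of observations, there are various ways of getting it, for example with proc sort:
/* Sort so that the highest index comes first within each animal */
proc sort data=want;
   by animal descending index;
run;
data want;
   set want;
   by animal descending index;
   retain n_obs;
   /* The first row per animal now carries the largest index, i.e. the count */
   if first.animal then n_obs = index;
run;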
You could also do it in SQL, you could use a hash, etc. Loads of ways. What is your actual question?
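As one illustration of the SQL route, the sketch below adds the per-animal count by letting PROC SQL remerge the summary statistic back onto the detail rows (SAS writes a note about remerging to the log). The output name want_sql is an assumption, and this version only produces n_obs; the running index is easier to build in a data step.

```sas
proc sql;
   create table want_sql as
   select *,
          count(*) as n_obs          /* number of records for this animal */
   from have
   group by animal
   order by animal, days_in_milk;    /* restore the original row order */
quit;
```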
Thank you very much. I will try that.
What I actually want to achieve is to discard animals that have fewer than a minimum number of records, say for instance 8 records.
Thank you very much in advance. Very much appreciated!