Dear SAS community, I want to submit to your thoughts one problem. I have 2 datasets one from a cross-sectional study and another from a prospective cohort. I am more used with data analysis than internal sampling of my data… The 1st study is data obtained from calves between 1 to 21 days old. Each calf has only 1 data line (cross-sectional study 1 visit) The dataset I have is on the form: DATASET1 calfID FarmID Date_birth Date_visit Age Gender X1 X2 X3 111 FARM1 Male or female Where CalfD is the eartag number of the calf (unique for a specific calf), FarmID is farm identification code, Date_birth the date of calf_birth, Date_visit the day we measured X1,X2 and X2 which are continuous numeric data. Age is the difference between the 2 dates (which represents calf’s age). We’ve also collected gender information. The 2nd dataset is coming from different farms/ animal. The same data are collected but the same calf can be repeated 2 up to 3 times (extra column visit which indicate the visit number ofr a specific calf) as below (the interval between the visit is the same: 1 week). Calves have the same age range in the 2 datasets (from 1 day to 21 days). DATASET2 (calves are replicated during 2 to 3 visits one week apart) calfID FarmID Visit Date_birth Date_visit Age Gender X1 X2 X3 222 FARM2 1 222 FARM2 2 222 FARM2 3 223 FARM2 1 223 FARM2 2 333 FARM3 1 My objectives are to perform a logistic regression for predicting calves probability of being younger than X days (different age cut-off would be used) based on covariates Gender, X1, X2, X3 using farm as a random effect. I want to use information from both database. I want to sample the DATASET 2 to have only 1 sample per calf but also being able to select calves from the database based on the age distribution I want to have. For example if I have 115 calves from dataset 1 and 200 calves from dataset 2. I want to select calves (1 visit calf only) conditional on age characteristics (ex: having a median distribution of the age of calf sampled that I can specified). I therefore want to know if you have any clues on how sampling the DATASET2 to achieve my goals. I hope that this problem is clearly defined and can be solved with your expertise. If possible, in a second step I would be interested to make internal validation of my models using bootstrap samples of my 2 datasets (respecting 1 sample per calf). But I want to start by a more simple approach.
... View more