<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Resampling one data per subject where multiple observation are available in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/504085#M15639</link>
    <description>&lt;P&gt;Thank you very much, it works perfectly well.&lt;/P&gt;</description>
    <pubDate>Sun, 14 Oct 2018 11:03:13 GMT</pubDate>
    <dc:creator>SBuc</dc:creator>
    <dc:date>2018-10-14T11:03:13Z</dc:date>
    <item>
      <title>Resampling one data per subject where multiple observation are available</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/503672#M15623</link>
      <description>&lt;P&gt;Dear SAS community, I would like to submit a problem for your thoughts. I have 2 datasets, one from a cross-sectional study and another from a prospective cohort. I am more used to data analysis than to internal sampling of my data…&lt;/P&gt;&lt;P&gt;The 1st study consists of data obtained from calves between 1 and 21 days old. Each calf has only 1 data line (cross-sectional study, 1 visit).&lt;/P&gt;&lt;P&gt;The dataset has the form:&lt;/P&gt;&lt;P&gt;DATASET1&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;calfID&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;FarmID&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Date_birth&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Date_visit&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Age&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Gender&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;X1&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;X2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;X3&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;111&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;FARM1&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Male or female&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;where calfID is the ear tag number of the calf (unique to each calf), FarmID is the farm identification code, Date_birth is the calf’s date of birth, and Date_visit is the day we measured X1, X2 and X3, which are continuous numeric variables. Age is the difference between the 2 dates (i.e. the calf’s age at the visit). We also collected gender information.&lt;/P&gt;&lt;P&gt;The 2nd dataset comes from different farms and animals. 
The same variables are collected, but the same calf can appear 2 or 3 times (an extra column, Visit, indicates the visit number for a specific calf), as shown below (the interval between visits is always 1 week). Calves have the same age range in the 2 datasets (from 1 day to 21 days).&lt;/P&gt;&lt;P&gt;DATASET2 (calves are measured repeatedly over 2 to 3 visits, one week apart)&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;calfID&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;FarmID&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Visit&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Date_birth&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Date_visit&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Age&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Gender&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;X1&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;X2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;X3&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;222&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;FARM2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;222&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;FARM2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;222&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;FARM2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;
3&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;223&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;FARM2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;223&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;FARM2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;333&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;FARM3&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My objectives are to perform a logistic regression 
for predicting a calf’s probability of being younger than X days (different age cut-offs will be used) based on the covariates Gender, X1, X2 and X3, using farm as a random effect. I want to use information from both datasets.&lt;/P&gt;&lt;P&gt;I want to sample DATASET2 so that each calf contributes only 1 observation, while also being able to select calves according to the age distribution I want. For example, suppose I have 115 calves in dataset 1 and 200 calves in dataset 2. I want to select calves (1 visit per calf only) conditional on age characteristics (e.g. so that the median age of the sampled calves matches a value I can specify).&lt;/P&gt;&lt;P&gt;I would therefore like to know if you have any suggestions on how to sample DATASET2 to achieve these goals. I hope the problem is clearly defined and can be solved with your expertise.&lt;/P&gt;&lt;P&gt;If possible, as a second step I would like to validate my models internally using bootstrap samples of my 2 datasets (keeping 1 sample per calf), but I want to start with a simpler approach.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 12 Oct 2018 10:34:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/503672#M15623</guid>
      <dc:creator>SBuc</dc:creator>
      <dc:date>2018-10-12T10:34:35Z</dc:date>
    </item>
    <item>
      <title>Re: Resampling one data per subject where multiple observation are available</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/503934#M15631</link>
      <description>&lt;P&gt;There may not be an algorithm to find an optimal solution to this subsampling problem. But here is a way to find a pretty good solution.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Mock data */
data have;
call streaminit(7897);
do id = 1 to 200;
    /* Age at first visit */
    age = rand("integer", 1, 14);
    do visit = 1 by 1 while(age &amp;lt;= 21);
        output;
        /* visits are one week apart */
        age = age + 7;
        end;
    end;
keep id visit age;
run; 

/* Required age median */
%let targetAgeMed = 16;

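/* Pick, for each animal, the visit that best balances the sample around the target */
data want;
/* First pass over the animal's visits: collect the candidate ages */
do until(last.id);
    set have; by id;
    ageMin = min(ageMin, age);
    ageMax = max(ageMax, age);
    /* Largest age at or below the target median */
    if age &amp;lt;= &amp;amp;targetAgeMed then infMedMax = max(infMedMax, age);
    /* Smallest age at or above the target median */
    if age &amp;gt;= &amp;amp;targetAgeMed then supMedMin = min(supMedMin, age);
    end;

/* infMed and supMed count the picks made so far below and above the target;
   pick from the side that is not currently ahead */
if infMed &amp;lt;= supMed 
    then do;
        /* No visit at or below the target: fall back to the youngest age */
        if missing(infMedMax) then pickAge = ageMin;
        else pickAge = infMedMax;
        end;
    else do;
        /* No visit at or above the target: fall back to the oldest age */
        if missing(supMedMin) then pickAge = ageMax;
        else pickAge = supMedMin;
        end;

/* Sum statements: the counters are retained across animals and start at 0 */
if pickAge &amp;gt; &amp;amp;targetAgeMed then supMed + 1;
if pickAge &amp;lt; &amp;amp;targetAgeMed then infMed + 1;

/* Second pass over the same animal: output the chosen visit */
do until(last.id);
    set have; by id;
    if age = pickAge then output;
    end;
drop ageMin ageMax infMedMax supMedMin infMed supMed pickAge;
run;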

proc sql;
select * from
(select "Available Age median" as Statistic, median(age) as medianAge from have)
union
(select "Selected Age median", median(age) from want);
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;                          Statistic             medianAge
                          -------------------------------
                          Available Age median         13
                          Selected Age median          16
&lt;/PRE&gt;
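&lt;P&gt;For comparison, a plain random pick of one visit per animal, with no age target, could be sketched as follows. This is only an illustration and not part of the method above: it assumes the mock dataset have is sorted by id, and the output name randomPick and the seed are arbitrary choices.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Baseline sketch: one visit per animal, chosen at random */
proc surveyselect data=have out=randomPick
    method=srs   /* simple random sampling without replacement */
    sampsize=1   /* one observation per stratum */
    seed=2018;   /* arbitrary seed, for reproducibility */
strata id;       /* each animal is its own stratum */
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The median age of randomPick can then be compared with the two medians reported above.&lt;/P&gt;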
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
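&lt;P&gt;Since the visits are picked animal by animal in data order, one way to vary the solution is to shuffle the animals first. A minimal sketch, assuming the mock dataset have from the code above; the names shuffled and key and the seed are illustrative only:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Give every animal one random key, shared by all its visits */
data shuffled;
if _n_ = 1 then call streaminit(2018);
do until(last.id);
    set have; by id;
    if first.id then key = rand("uniform");
    output;
    end;
run;

/* Random order of animals; the visits of an animal stay together */
proc sort data=shuffled; by key id visit; run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The selection step would then read shuffled instead of have, with "by key id;" in both SET statements so that last.id still marks the end of each animal.&lt;/P&gt;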
&lt;P&gt;Note: the visits are picked one animal at a time, in data order, so you can get a different solution by changing the order of the animals in the data. It might be a good idea to randomize the order before subsetting.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Oct 2018 22:21:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/503934#M15631</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-10-12T22:21:26Z</dc:date>
    </item>
    <item>
      <title>Re: Resampling one data per subject where multiple observation are available</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/504004#M15634</link>
      <description>&lt;P&gt;Thanks for this answer. I don't have access to the database this weekend, which is why I directly tried the code that generates the mock data.&lt;/P&gt;&lt;P&gt;Unfortunately, my 9.4 version gives an error message for the "integer" argument of the rand function.&lt;/P&gt;&lt;P&gt;Looking this argument up in the SAS book, I see that rand is generally followed by a distribution type?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 13 Oct 2018 11:29:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/504004#M15634</guid>
      <dc:creator>SBuc</dc:creator>
      <dc:date>2018-10-13T11:29:51Z</dc:date>
    </item>
    <item>
      <title>Re: Resampling one data per subject where multiple observation are available</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/504036#M15635</link>
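      <description>&lt;P&gt;The "integer" distribution was only added to RAND in a later maintenance release of SAS 9.4 (9.4M5, as far as I know). On an earlier release you can replace&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;age = rand("integer", 1, 14);

/* by */

/* rand("uniform") lies strictly between 0 and 1, so this yields the integers 1 to 14 with equal probability */
age = ceil(14*rand("uniform"));&lt;/CODE&gt;&lt;/PRE&gt;</description>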
      <pubDate>Sat, 13 Oct 2018 18:44:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/504036#M15635</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-10-13T18:44:18Z</dc:date>
    </item>
    <item>
      <title>Re: Resampling one data per subject where multiple observation are available</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/504085#M15639</link>
      <description>&lt;P&gt;Thank you very much, it works perfectly well.&lt;/P&gt;</description>
      <pubDate>Sun, 14 Oct 2018 11:03:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Resampling-one-data-per-subject-where-multiple-observation-are/m-p/504085#M15639</guid>
      <dc:creator>SBuc</dc:creator>
      <dc:date>2018-10-14T11:03:13Z</dc:date>
    </item>
  </channel>
</rss>

