<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Independence of observations for classification models in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Independence-of-observations-for-classification-models/m-p/793169#M9046</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am currently putting together a dataset for a classification model (with a standard binary outcome), and I have a general question about the independence of observations.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The data I am working with is aggregated to the sales_id level and joined to contract data from a different data source. A single contract (contract_id) can appear in multiple sales_ids. The stakeholder would like me to create one column indicating whether the contract_id is found in other sales_ids, and another indicating whether the contract was executed within 10 days of a previous record of the same contract in another sales_id. The goal is to generate predictions at the sales_id level.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;sales_id&lt;/TD&gt;&lt;TD&gt;contract_id&lt;/TD&gt;&lt;TD&gt;contract_date&lt;/TD&gt;&lt;TD&gt;contract_amount&lt;/TD&gt;&lt;TD&gt;contract_in_other_order&lt;/TD&gt;&lt;TD&gt;dup_contract_within_10days&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;651423&lt;/TD&gt;&lt;TD&gt;2456&lt;/TD&gt;&lt;TD&gt;12/1/2021&lt;/TD&gt;&lt;TD&gt;500&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;561486&lt;/TD&gt;&lt;TD&gt;2456&lt;/TD&gt;&lt;TD&gt;12/5/2021&lt;/TD&gt;&lt;TD&gt;500&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;618234&lt;/TD&gt;&lt;TD&gt;2456&lt;/TD&gt;&lt;TD&gt;12/31/2021&lt;/TD&gt;&lt;TD&gt;500&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Would engineering additional columns that look at other observations violate the assumption of independent observations? If so, is this an issue for a classification model (as it is for linear regression models)?
If this would cause problems, what remedies are available?&lt;/P&gt;</description>
    <pubDate>Fri, 28 Jan 2022 17:38:51 GMT</pubDate>
    <dc:creator>GuyTreepwood</dc:creator>
    <dc:date>2022-01-28T17:38:51Z</dc:date>
    <item>
      <title>Independence of observations for classification models</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Independence-of-observations-for-classification-models/m-p/793169#M9046</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am currently putting together a dataset for a classification model (with a standard binary outcome), and I have a general question about the independence of observations.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The data I am working with is aggregated to the sales_id level and joined to contract data from a different data source. A single contract (contract_id) can appear in multiple sales_ids. The stakeholder would like me to create one column indicating whether the contract_id is found in other sales_ids, and another indicating whether the contract was executed within 10 days of a previous record of the same contract in another sales_id. The goal is to generate predictions at the sales_id level.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;sales_id&lt;/TD&gt;&lt;TD&gt;contract_id&lt;/TD&gt;&lt;TD&gt;contract_date&lt;/TD&gt;&lt;TD&gt;contract_amount&lt;/TD&gt;&lt;TD&gt;contract_in_other_order&lt;/TD&gt;&lt;TD&gt;dup_contract_within_10days&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;651423&lt;/TD&gt;&lt;TD&gt;2456&lt;/TD&gt;&lt;TD&gt;12/1/2021&lt;/TD&gt;&lt;TD&gt;500&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;561486&lt;/TD&gt;&lt;TD&gt;2456&lt;/TD&gt;&lt;TD&gt;12/5/2021&lt;/TD&gt;&lt;TD&gt;500&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;618234&lt;/TD&gt;&lt;TD&gt;2456&lt;/TD&gt;&lt;TD&gt;12/31/2021&lt;/TD&gt;&lt;TD&gt;500&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Would engineering additional columns that look at other observations violate the assumption of independent observations? If so, is this an issue for a classification model (as it is for linear regression models)?
If this would cause problems, what remedies are available?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jan 2022 17:38:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Independence-of-observations-for-classification-models/m-p/793169#M9046</guid>
      <dc:creator>GuyTreepwood</dc:creator>
      <dc:date>2022-01-28T17:38:51Z</dc:date>
    </item>
    <item>
      <title>Re: Independence of observations for classification models</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Independence-of-observations-for-classification-models/m-p/793217#M9047</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To check whether the independent-observations assumption is violated, you can &lt;STRONG&gt;plot residuals against any variables &lt;/STRONG&gt;&lt;SPAN&gt;used in the model (e.g., factors, regressors). A non-random pattern suggests a lack of independence.&lt;/SPAN&gt;&lt;/P&gt;
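&lt;P&gt;As a rough illustration of the residual check above, here is a minimal Python sketch (Python rather than SAS; all y and p values, and contract_id 7781, are hypothetical). For a binary outcome, raw residuals y - p are hard to read, so it uses Pearson residuals, (y - p) / sqrt(p(1 - p)); in SAS, PROC LOGISTIC can write these out through the RESCHI= option of its OUTPUT statement. Averaging them within each contract_id is one way to look for a non-random pattern: if rows that share a contract tend to have residuals of the same sign, they are probably not independent.&lt;/P&gt;
&lt;PRE&gt;
```python
import math
from collections import defaultdict


def pearson_residuals(y, p):
    """Pearson residuals for a binary outcome: (y - p) / sqrt(p * (1 - p)).

    Assumes every predicted probability p is strictly between 0 and 1."""
    return [(yi - pi) / math.sqrt(pi * (1.0 - pi)) for yi, pi in zip(y, p)]


def mean_residual_by(groups, residuals):
    """Average residual within each level of a grouping variable.

    If observations are independent and the model is adequate, these
    group means should scatter around zero with no systematic pattern."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, r in zip(groups, residuals):
        sums[g] += r
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


# Hypothetical outcomes, predicted probabilities and contract_ids
# (contract 7781 and all y/p values are made up for illustration).
y = [1, 1, 1, 0, 0, 1]
p = [0.5, 0.8, 0.5, 0.5, 0.2, 0.8]
contract_id = [2456, 2456, 2456, 7781, 7781, 7781]

res = pearson_residuals(y, p)
for cid, m in sorted(mean_residual_by(contract_id, res).items()):
    print(cid, round(m, 3))
```
&lt;/PRE&gt;
&lt;P&gt;With real data you would plot these residuals (or their group means) against each predictor rather than print them. With only a few rows per contract the group means are noisy, so treat this as a visual check, not a formal test.&lt;/P&gt;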
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Also, if you have plenty of observations, split the data:&lt;BR /&gt;make a training, a validation, and a test set.&lt;BR /&gt;If your model holds up on independent out-of-sample observations (never seen by the model), then I think you are OK.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
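&lt;P&gt;One caveat for this particular dataset: because several sales_ids can share the same contract_id, a purely random row-level split can put related rows in both the training and the test set, so the test rows are not truly "never seen". A common remedy is to split by group instead, keeping every row of a contract_id in the same partition. The Python sketch below is a hypothetical illustration, not a SAS procedure: it assumes each row carries a single contract_id (as in the example table), and the 60/20/20 proportions and the SHA-256 bucketing are arbitrary choices.&lt;/P&gt;
&lt;PRE&gt;
```python
import hashlib


def unit_interval(key):
    """Map a key to a stable pseudo-random number in [0, 1) via SHA-256."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 2**32


def assign_split(group_key, train=0.6, valid=0.2):
    """Put a whole group (e.g. every row of one contract_id) in one split."""
    u = unit_interval(group_key)
    if u >= train + valid:
        return "test"
    if u >= train:
        return "validation"
    return "train"


# Rows from the example table: three sales_ids that share contract 2456.
rows = [
    {"sales_id": 651423, "contract_id": 2456},
    {"sales_id": 561486, "contract_id": 2456},
    {"sales_id": 618234, "contract_id": 2456},
]
for row in rows:
    # The split depends only on contract_id, so all three rows agree.
    row["split"] = assign_split(row["contract_id"])
    print(row["sales_id"], row["split"])
```
&lt;/PRE&gt;
&lt;P&gt;Hashing the key, rather than drawing random numbers, makes the assignment reproducible across runs and keeps any new rows for an existing contract in the same partition.&lt;/P&gt;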
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Cheers,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Koen&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jan 2022 20:26:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Independence-of-observations-for-classification-models/m-p/793217#M9047</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2022-01-28T20:26:47Z</dc:date>
    </item>
  </channel>
</rss>

