<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How does Proc SurveySelect Deal with Missing values? in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/How-does-Proc-SurveySelect-Deal-with-Missing-values/m-p/690620#M33291</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/310922"&gt;@Daisy2&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Thanks for the quick reply. After thinking about your question, I think it's most appropriate to do the former and not the latter - not use the records at all.&lt;BR /&gt;Where do I put your suggested code? Before Proc SurveySelect? (I'm pretty new to SAS) such as&lt;BR /&gt;Where not missing(nce);&lt;BR /&gt;proc surveyselect data=....??&lt;BR /&gt;&lt;BR /&gt;or within the procedure?&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In the procedure:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Proc surveyselect data=&amp;nbsp;&amp;nbsp; &amp;lt;other proc options&amp;gt;;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; where not missing(nce);&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;lt;remaining surveyselect statements&amp;gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The actual order is not critical, but placing it near the start of the procedure makes it easier, IMHO, to remember that I was filtering the data when I come back to the code later. In fact, I would likely insert a comment such as:&lt;/P&gt;
&lt;P&gt;/* Do not want the missing values in this set to be considered as a Cluster for selection*/&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sat, 10 Oct 2020 01:05:34 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2020-10-10T01:05:34Z</dc:date>
    <item>
      <title>How does Proc SurveySelect Deal with Missing values?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-does-Proc-SurveySelect-Deal-with-Missing-values/m-p/690585#M33281</link>
      <description>&lt;P&gt;I'm trying to split my data into Train (70%) and Test data sets.&amp;nbsp; My file has 508 rows, but the column I want to split only has about 350 cells filled.&amp;nbsp; The rest are empty.&amp;nbsp; I'm splitting the data through strata (3 cols) with simple random selection, which seems to work, but it appears to be computing the proportions from all 508 rows rather than the ~350 rows with actual data.&amp;nbsp; How do I get it to ignore the missing values in my data column "nce" for the split?&amp;nbsp; Here's my code. Thanks for the help.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/*partitioning data into 2 sets at 70% split with stratification*/
/* new column Selected = 0 is the 30% Test set, Selected = 1 is the 70% Train set */
proc surveyselect data=phrs2cut method=srs samprate= 0.7 out=StratSRS seed=12345 outall;
	samplingunit nce;
	strata loc2 ntrt2 pd2 notsorted / alloc=proportional stats;
	title "After Split";
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 09 Oct 2020 21:21:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-does-Proc-SurveySelect-Deal-with-Missing-values/m-p/690585#M33281</guid>
      <dc:creator>Daisy2</dc:creator>
      <dc:date>2020-10-09T21:21:05Z</dc:date>
    </item>
    <item>
      <title>Re: How does Proc SurveySelect Deal with Missing values?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-does-Proc-SurveySelect-Deal-with-Missing-values/m-p/690592#M33286</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/310922"&gt;@Daisy2&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I'm trying to split my data into Train (70%) and Test data sets.&amp;nbsp; My file has 508 rows, but the column I want to split only has about 350 cells filled.&amp;nbsp; The rest are empty.&amp;nbsp; I'm splitting the data through strata (3 cols) with simple random selection, which seems to work, but it appears to be computing the proportions from all 508 rows rather than the ~350 rows with actual data.&amp;nbsp; How do I get it to ignore the missing values in my data column "nce" for the split?&amp;nbsp; Here's my code. Thanks for the help.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/*partitioning data into 2 sets at 70% split with stratification*/
/* new column Selected = 0 is the 30% Test set, Selected = 1 is the 70% Train set */
proc surveyselect data=phrs2cut method=srs samprate= 0.7 out=StratSRS seed=12345 outall;
	samplingunit nce;
	strata loc2 ntrt2 pd2 notsorted / alloc=proportional stats;
	title "After Split";
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Please describe what you mean by "ignore the missing values".&lt;/P&gt;
&lt;P&gt;Do you mean not using those records at all, i.e. restricting the source data to the ~350 observations that have a value for the variable? That could be done with a WHERE statement such as: where not missing(nce);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or do you mean not treating the missing value as a level, but still including those records for selection?&lt;/P&gt;
&lt;P&gt;From the documentation for Surveyselect:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;DIV class="xis-refProc"&gt;
&lt;DIV id="statug.surveyselect.selectmissing" class="AAsection"&gt;
&lt;P&gt;PROC SURVEYSELECT treats missing values of STRATA and SAMPLINGUNIT variables like any other STRATA or SAMPLINGUNIT variable value. The missing values form a separate, valid variable level.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/BLOCKQUOTE&gt;
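&lt;P&gt;If it helps, a quick way to see how many records would fall into that missing level is to count them first. This is only a sketch: it assumes the data set name (phrs2cut) and variable (nce) from your post, and the column aliases are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: compare total rows with rows that have a non-missing nce.  */
/* COUNT(*) counts every row; COUNT(nce) counts only non-missing nce. */
proc sql;
	select count(*)              as total_rows,
	       count(nce)            as nonmissing_nce,
	       count(*) - count(nce) as missing_nce
	from phrs2cut;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;From your description I would expect roughly 508 total rows and about 350 with a non-missing nce, but check the actual numbers.&lt;/P&gt;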
&lt;P&gt;It looks like you may need to create an additional variable that has the properties for clustering that you want.&lt;/P&gt;</description>
      <pubDate>Fri, 09 Oct 2020 21:45:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-does-Proc-SurveySelect-Deal-with-Missing-values/m-p/690592#M33286</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-10-09T21:45:18Z</dc:date>
    </item>
    <item>
      <title>Re: How does Proc SurveySelect Deal with Missing values?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-does-Proc-SurveySelect-Deal-with-Missing-values/m-p/690608#M33289</link>
      <description>Thanks for the quick reply. After thinking about your question, I think it's most appropriate to do the former and not the latter - not use the records at all.&lt;BR /&gt;Where do I put your suggested code? Before Proc SurveySelect? (I'm pretty new to SAS) such as&lt;BR /&gt;Where not missing(nce);&lt;BR /&gt;proc surveyselect data=....??&lt;BR /&gt;&lt;BR /&gt;or within the procedure?</description>
      <pubDate>Fri, 09 Oct 2020 22:56:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-does-Proc-SurveySelect-Deal-with-Missing-values/m-p/690608#M33289</guid>
      <dc:creator>Daisy2</dc:creator>
      <dc:date>2020-10-09T22:56:05Z</dc:date>
    </item>
    <item>
      <title>Re: How does Proc SurveySelect Deal with Missing values?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-does-Proc-SurveySelect-Deal-with-Missing-values/m-p/690620#M33291</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/310922"&gt;@Daisy2&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Thanks for the quick reply. After thinking about your question, I think it's most appropriate to do the former and not the latter - not use the records at all.&lt;BR /&gt;Where do I put your suggested code? Before Proc SurveySelect? (I'm pretty new to SAS) such as&lt;BR /&gt;Where not missing(nce);&lt;BR /&gt;proc surveyselect data=....??&lt;BR /&gt;&lt;BR /&gt;or within the procedure?&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In the procedure:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Proc surveyselect data=&amp;nbsp;&amp;nbsp; &amp;lt;other proc options&amp;gt;;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; where not missing(nce);&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;lt;remaining surveyselect statements&amp;gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The actual order is not critical, but placing it near the start of the procedure makes it easier, IMHO, to remember that I was filtering the data when I come back to the code later. In fact, I would likely insert a comment such as:&lt;/P&gt;
&lt;P&gt;/* Do not want the missing values in this set to be considered as a Cluster for selection*/&lt;/P&gt;
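&lt;P&gt;Put together, a minimal sketch might look like the following. It assumes the data set (phrs2cut) and the options from your original post, unchanged apart from the added WHERE statement and comment. I have not run it against your data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: original call from the question with a WHERE filter added. */
proc surveyselect data=phrs2cut method=srs samprate=0.7 out=StratSRS seed=12345 outall;
	/* Do not want the missing values in this set to be considered as a Cluster for selection */
	where not missing(nce);
	samplingunit nce;
	strata loc2 ntrt2 pd2 notsorted / alloc=proportional stats;
	title "After Split";
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because the WHERE statement filters the input, the records with a missing nce are never read by the procedure, so they should not appear in the OUT= data set either, even with OUTALL.&lt;/P&gt;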
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 10 Oct 2020 01:05:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-does-Proc-SurveySelect-Deal-with-Missing-values/m-p/690620#M33291</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-10-10T01:05:34Z</dc:date>
    </item>
  </channel>
</rss>

