<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Proc Surveyselect in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Proc-Surveyselect/m-p/118442#M32691</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks Steve!&amp;nbsp; Your example was very intuitive and straightforward.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For anyone else reading this, I am looking for a SAS book that focuses entirely on sampling as I will be involved a lot more moving forward on this type of work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 08 Jan 2013 20:48:39 GMT</pubDate>
    <dc:creator>Data_Detective_23219</dc:creator>
    <dc:date>2013-01-08T20:48:39Z</dc:date>
    <item>
      <title>Proc Surveyselect</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-Surveyselect/m-p/118440#M32689</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;BR /&gt;I've been reading through the documentation on this procedure and I cannot find (or perhaps identify) the appropriate syntax to accommodate a specific sampling objective.&amp;nbsp; I am looking to generate a random sample without replacement from a dataset that contains claims adjudicated by different users.&amp;nbsp; Because some users process more claims than others, I want the sampling methodology to assign each user an equal probability of being chosen in the random sample&amp;nbsp; (I do not need a specific number [or proportion] of observations in the random sample from each user, as many of the documentation examples show).&amp;nbsp; This will remove the bias whereby claims examiners who process more claims have a higher chance of being "randomly" chosen because more of their claim lines exist in the dataset.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I will provide a couple of arbitrary parameters so someone can illustrate the appropriate syntax.&lt;/P&gt;&lt;P&gt;Claim lines: 1,000,000&lt;/P&gt;&lt;P&gt;Random Sample Size: 100&lt;/P&gt;&lt;P&gt;Variable identifying the claims examiner: User&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 08 Jan 2013 14:59:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-Surveyselect/m-p/118440#M32689</guid>
      <dc:creator>Data_Detective_23219</dc:creator>
      <dc:date>2013-01-08T14:59:23Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-Surveyselect/m-p/118441#M32690</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you want equal probability of being selected based on User, why not do something like:&lt;/P&gt;&lt;P&gt;/* Make up some data */&lt;/P&gt;&lt;P&gt;data have;&lt;BR /&gt;do user=1 to 1000;&lt;BR /&gt;do claim=1 to 2000;&lt;BR /&gt;flag=mod(user,2);&lt;BR /&gt;output;&lt;BR /&gt;end;&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* Make it so that some of the users have more claims than others, so that this is a little more general */&lt;BR /&gt;data have;&lt;BR /&gt;set have;&lt;BR /&gt;if flag=0 and claim&amp;gt;1000 then delete;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* Give every record its own random number */&lt;/P&gt;&lt;P&gt;/* This is the point where you can use your own "have" dataset */&lt;/P&gt;&lt;P&gt;data firstpass;&lt;/P&gt;&lt;P&gt;call streaminit(111);&lt;/P&gt;&lt;P&gt;set have;&lt;/P&gt;&lt;P&gt;ranno=rand('UNIFORM');&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* Sort now so that records are randomized within each user */&lt;/P&gt;&lt;P&gt;proc sort data=firstpass out=secondpass;&lt;/P&gt;&lt;P&gt;by user ranno;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* Select the record for each user with the lowest random value */&lt;/P&gt;&lt;P&gt;/* Then draw a fresh random number per user: re-using ranno would favor users with more claims, because the minimum of many uniform draws tends to be smaller */&lt;/P&gt;&lt;P&gt;data thirdpass;&lt;/P&gt;&lt;P&gt;call streaminit(112);&lt;/P&gt;&lt;P&gt;set secondpass;&lt;/P&gt;&lt;P&gt;by user ranno;&lt;/P&gt;&lt;P&gt;if first.user;&lt;/P&gt;&lt;P&gt;userrand=rand('UNIFORM');&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* Now re-sort across all users on the per-user random number, so that every user has an equal chance of selection */&lt;/P&gt;&lt;P&gt;proc sort data=thirdpass out=fourthpass;&lt;/P&gt;&lt;P&gt;by userrand;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* Select 100 random users, each with one of their records still attached */&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;set fourthpass;&lt;/P&gt;&lt;P&gt;if 
_n_&amp;lt;=100;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you want all or multiple records for the selected users, you can then merge this back against the original data set.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;(And I am sure there is a way to do this in PROC SURVEYSELECT, but this is pretty fast, and pretty straightforward.&amp;nbsp; I think&amp;nbsp; &lt;A __default_attr="2746" __jive_macro_name="user" class="jive_macro jive_macro_user" href="https://communities.sas.com/"&gt;&lt;/A&gt; can come up with something better than this.&amp;nbsp; Despite all the sorting and subsetting, this took about 1.7 seconds cpu time, and a little less real time due to multithreading).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 08 Jan 2013 18:32:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-Surveyselect/m-p/118441#M32690</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2013-01-08T18:32:59Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-Surveyselect/m-p/118442#M32691</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks Steve!&amp;nbsp; Your example was very intuitive and straightforward.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For anyone else reading this, I am looking for a SAS book that focuses entirely on sampling as I will be involved a lot more moving forward on this type of work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 08 Jan 2013 20:48:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-Surveyselect/m-p/118442#M32691</guid>
      <dc:creator>Data_Detective_23219</dc:creator>
      <dc:date>2013-01-08T20:48:39Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-Surveyselect/m-p/118443#M32692</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi! If you are going to be involved in a lot of sampling then you should definitely get familiar with SURVEYSELECT. I don't have full knowledge of the procedure but, anyway, here is how I would accomplish your sampling:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;data test;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;call streaminit(3746);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;do user = 1 to 30;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do j = 1 to 1 + rand("POISSON",3);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; claim = user * 100 + j;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;drop j;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, 
sans-serif;"&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* Select some users' claims as clusters */&lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;proc surveyselect data=test out=temp n=10;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;samplingunit user;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* Select one claim per selected user */&lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;proc surveyselect data=temp out=want n=1;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;strata user;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: calibri, verdana, arial, sans-serif;"&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 09 Jan 2013 02:20:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-Surveyselect/m-p/118443#M32692</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2013-01-09T02:20:10Z</dc:date>
    </item>
  </channel>
</rss>

