<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: PROC SURVEYSELECT When Strata Overlap in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/320794#M70705</link>
    <description></description>
    <pubDate>Thu, 22 Dec 2016 18:38:59 GMT</pubDate>
    <dc:creator>doesper</dc:creator>
    <dc:date>2016-12-22T18:38:59Z</dc:date>
    <item>
      <title>PROC SURVEYSELECT When Strata Overlap</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/319668#M70268</link>
      <description>&lt;P&gt;Let's say you have a group of individuals, each uniquely identified by the variable CID (customer ID). &amp;nbsp;Now, you want to assign each customer to one or more groups based on several attributes (TRAITs). &amp;nbsp;Since these groups are not mutually exclusive, how can you tell PROC SURVEYSELECT to not select the same CID twice when generating a random sample for each TRAIT?&amp;nbsp; In the simple example program I've included here, you'll see that CIDs 2 and 4 have both TRAITs A and B, and sometimes the luck of the draw is that CID 2 and/or 4 are included in the samples for both traits A and B. &amp;nbsp;I don't want that to happen. &amp;nbsp;I know this is an easy problem to solve with data step programming and multiple passes through PROC SURVEYSELECT, but I was hoping to do this with a single pass through PROC SURVEYSELECT.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Dave&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data groups;
   input cid trait $ orders;
datalines;
1 A 2
2 A 4
3 A 6
4 A 8
5 A 10
2 B 4
4 B 8
6 B 16
7 B 18
run;

proc print data=groups;
   title 'groups';
run;

proc surveyselect data=groups out=groups_sample sampsize=2 selectall;
   strata trait;
run;

proc print data=groups_sample;
   title 'groups_sample';
run;&lt;/CODE&gt;&lt;/PRE&gt;
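&lt;P&gt;For reference, the multi-pass version I'd like to avoid looks roughly like the sketch below. The data set names sample_a, pool_b and sample_b and the seed values are made up for illustration. Note that drawing trait A first shrinks the pool for trait B, so the order of the passes matters.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Pass 1: sample trait A on its own (seed value is arbitrary) */
proc surveyselect data=groups(where=(trait='A')) out=sample_a
     sampsize=2 selectall seed=1234;
run;

/* Remove the CIDs already chosen for A from the trait B pool */
proc sql;
   create table pool_b as
   select *
   from groups
   where trait = 'B'
     and cid not in (select cid from sample_a);
quit;

/* Pass 2: sample trait B from what is left */
proc surveyselect data=pool_b out=sample_b
     sampsize=2 selectall seed=5678;
run;

/* Stack the two samples; no CID can appear twice */
data groups_sample;
   set sample_a sample_b;
run;&lt;/CODE&gt;&lt;/PRE&gt;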
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2016 21:22:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/319668#M70268</guid>
      <dc:creator>doesper</dc:creator>
      <dc:date>2016-12-16T21:22:26Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SURVEYSELECT When Strata Overlap</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/319670#M70270</link>
      <description>&lt;P&gt;Use multiple strata variables so that the strata no longer overlap; the STRATA statement accepts more than one variable. You may be better off with indicator variables, such as TraitA = 1 when the customer has trait=A and 0 otherwise, TraitB, and so on. Then use STRATA TraitA TraitB TraitC....&lt;/P&gt;
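&lt;P&gt;A minimal sketch of that idea, assuming the groups data set from the original post; the names cid_flags, traitA, traitB and flags_sample are only placeholders. Each CID ends up in exactly one stratum (A only, B only, or both), so it cannot be drawn twice.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* One row per CID with a 0/1 flag for each trait */
proc sql;
   create table cid_flags as
   select cid,
          max(trait = 'A') as traitA,
          max(trait = 'B') as traitB,
          max(orders) as orders   /* orders is constant within a CID */
   from groups
   group by cid
   order by traitA, traitB;       /* STRATA expects sorted input */
quit;

/* Strata are now the non-overlapping flag combinations */
proc surveyselect data=cid_flags out=flags_sample sampsize=2 selectall;
   strata traitA traitB;
run;&lt;/CODE&gt;&lt;/PRE&gt;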
&lt;P&gt;If you want different proportions from each stratum, though, you may need to spend some time building either a SAMPSIZE= or SAMPRATE= data set, or the value list for the SAMPSIZE= or SAMPRATE= option.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2016 21:31:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/319670#M70270</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-12-16T21:31:32Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SURVEYSELECT When Strata Overlap</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/319779#M70333</link>
      <description>&lt;P&gt;Since a client cannot appear more than once in your sample, pick one trait at random for each cid, then draw a stratified sample.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data groups2;
set groups;
rnd = rand("uniform");
run;

proc sort data=groups2; by cid rnd; run;

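data groups3;
set groups2; by cid;
if first.cid;   /* keep one randomly chosen trait per cid */
drop rnd;
run;

/* STRATA expects the input sorted by the strata variable */
proc sort data=groups3; by trait cid; run;

proc surveyselect data=groups3 out=groups_sample sampsize=2 selectall;
   strata trait;
run;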
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 18 Dec 2016 04:44:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/319779#M70333</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-12-18T04:44:43Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SURVEYSELECT When Strata Overlap</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/320794#M70705</link>
      <description>&lt;P&gt;Thanks, PGStats and ballardw. &amp;nbsp;Using PGStats' approach gets me closer to the solution, but since CIDs 2 and 4 have two traits and the other CIDs only one, I want CIDs 2 and 4 to be twice as likely to be selected. &amp;nbsp;In other words, I need sampling weights. &amp;nbsp;I was hoping I could use FREQ cid_weight&amp;nbsp;in PROC SURVEYSELECT to do this, but, alas, it cannot be used in this manner.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Dave&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data groups;
   input cid trait $ orders;
   rnd = rand('uniform');
datalines;
1 A 2
2 A 4
3 A 6
4 A 8
5 A 10
2 B 4
4 B 8
6 B 16
7 B 18
run;

proc sql;
   create table groups_weights as
    select cid
          ,count(*) as cid_weight
    from groups
    group by cid
;quit;

proc sort data=groups out=groups2;
   by cid rnd;
run;

data groups3;
   merge groups2 (in=a) groups_weights (in=b);
   by cid;
   if first.cid;
   drop rnd;
run;

proc sort data=groups3;
   by trait cid;
run;

proc print data=groups3;
   title 'groups3';
run;

proc surveyselect data=groups3 out=groups_sample sampsize=2 selectall;
   strata trait;
run;

proc print data=groups_sample;
   title 'groups_sample';
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2016 18:38:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/320794#M70705</guid>
      <dc:creator>doesper</dc:creator>
      <dc:date>2016-12-22T18:38:59Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SURVEYSELECT When Strata Overlap</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/320863#M70728</link>
      <description>&lt;P&gt;Make the sampling probability proportional to the number of traits then:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data groups;
   input cid trait $ orders;
datalines;
1 A 2
2 A 4
3 A 6
4 A 8
5 A 10
2 B 4
4 B 8
6 B 16
7 B 18
;

data groups2;
set groups;
rnd = rand("uniform");
run;

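proc sql;
/* one random row per cid; n = number of traits the cid has */
create table groups3 as
select cid, trait, orders, count(*) as n
from groups2
group by cid
having rnd = min(rnd)
order by trait, cid; /* STRATA expects input sorted by trait */
quit;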

proc surveyselect data=groups3 out=groups_sample 
    method=pps sampsize=2 selectall;
strata trait;
size n;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
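&lt;P&gt;As a quick sanity check (a sketch, not part of the method itself): groups3 keeps one row per cid, so no cid should show up twice in groups_sample, and the query below should return no rows. The column name times_selected is just a placeholder.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* List any cid selected more than once; expected to be empty */
proc sql;
select cid, count(*) as times_selected
from groups_sample
group by cid
having count(*) gt 1;
quit;&lt;/CODE&gt;&lt;/PRE&gt;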
&lt;P&gt;I think there might be an equivalent way of doing this with cluster sampling.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Dec 2016 04:32:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SURVEYSELECT-When-Strata-Overlap/m-p/320863#M70728</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-12-23T04:32:21Z</dc:date>
    </item>
  </channel>
</rss>

