<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sampling to meet reference characteristics in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742702#M232369</link>
    <description>&lt;P&gt;Random sampling? or otherwise?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does the sample have to match EXACTLY the original data mean of variable 1 and match EXACTLY the percent of variable 2, or can it be "close"? If "close", how do you determine what is close?&lt;/P&gt;</description>
    <pubDate>Thu, 20 May 2021 16:32:27 GMT</pubDate>
    <dc:creator>PaigeMiller</dc:creator>
    <dc:date>2021-05-20T16:32:27Z</dc:date>
    <item>
      <title>Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742692#M232363</link>
      <description>&lt;P&gt;Hello SAS users,&lt;/P&gt;&lt;P&gt;I am trying to draw a sample from one dataset (data1) that meets a few reference points of another dataset (data2). The reference points are 1) the mean of variable 1 and 2) the proportion of variable 2 (e.g., 37% for a 0/1 variable). Variables 1 and 2 both exist in data1. I would like to draw a sample from data1 whose mean of variable 1 and proportion of variable 2 equal the corresponding values in data2. Is there a way to do this?&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2021 15:56:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742692#M232363</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-20T15:56:10Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742702#M232369</link>
      <description>&lt;P&gt;Random sampling? or otherwise?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does the sample have to match EXACTLY the original data mean of variable 1 and match EXACTLY the percent of variable 2, or can it be "close"? If "close", how do you determine what is close?&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2021 16:32:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742702#M232369</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2021-05-20T16:32:27Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742706#M232372</link>
      <description>&lt;P&gt;Hi Paige,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Random sampling is preferable, but if I cannot meet the targets with random sampling, non-random sampling is OK. The sample values need to be close to the targets, within a standard error (an exact match is of course preferable).&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2021 16:38:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742706#M232372</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-20T16:38:45Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742722#M232382</link>
      <description>&lt;P&gt;How many records are there in your original data? How many records do you want in your sample?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you only have 30 records to start with and want 5 in the sample it will likely be a bit difficult to get very close.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;On the other hand if your data has 100,000 and you want 1000 in the sample you have a better chance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2021 17:30:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742722#M232382</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-05-20T17:30:07Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742738#M232392</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/382778"&gt;@leex1514&lt;/a&gt;&amp;nbsp;and welcome to the SAS Support Communities!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's relatively easy to get the &lt;SPAN&gt;proportions of variable 2 in the sample as close as possible to the original proportions: Use ALLOC=PROP in the STRATA statement of PROC SURVEYSELECT.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;With a sufficient number of random samples to choose from (if needed), the mean of variable 1 will be reasonably close to the original value, too.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Here's an example using SASHELP.HEART (featuring &lt;FONT face="courier new,courier"&gt;Systolic&lt;/FONT&gt; as variable 1 [&lt;FONT face="courier new,courier"&gt;var1&lt;/FONT&gt;] and &lt;FONT face="courier new,courier"&gt;Status='Dead'&lt;/FONT&gt; as variable 2 [&lt;FONT face="courier new,courier"&gt;var2&lt;/FONT&gt;]):&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Create example data for demonstration */

data have;
set sashelp.heart(rename=(systolic=var1));
var2=(status='Dead');
run;

proc sort data=have;
by var2;
run; /* 5209 obs. */

/* Draw 500 stratified random samples of size n=1000 */

proc surveyselect data=have rep=500
method=srs n=1000
seed=2718 out=samp(drop=total--samplingweight);
strata var2 / alloc=prop;
run; /* 500,000 obs. */

/* Compute sample means of VAR1 */

proc summary data=samp nway;
class replicate;
var var1;
output out=sampmeans(drop=_:) mean=m;
run;

/* Determine the REPLICATE number of the optimum sample */

proc sql noprint;
select replicate into :repno
from (select *, abs(m-(select mean(var1) from have)) as _d from sampmeans
      having _d=min(_d));
quit;

/* Select the optimum sample */

data want(drop=replicate);
set samp;
where replicate=&amp;amp;repno;
run; /* 1000 obs. */

/* Compare means of VAR1 and VAR2 between the full dataset and the sample */

title 'Full dataset';
proc means data=have n mean;
var var1 var2;
run;

title 'Sample';
proc means data=want n mean;
var var1 var2;
run;
title;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;PRE&gt;Full dataset

Variable       N            Mean
--------------------------------
var1        5209     136.9095796
var2        5209       0.3822231
--------------------------------

Sample

Variable       N            Mean
--------------------------------
var1        1000     136.9100000
var2        1000       0.3820000
--------------------------------&lt;/PRE&gt;
&lt;P&gt;(Note that the mean of &lt;FONT face="courier new,courier"&gt;var2&lt;/FONT&gt; is the proportion of &lt;FONT face="courier new,courier"&gt;var2=1&lt;/FONT&gt;.)&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2021 17:59:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742738#M232392</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2021-05-20T17:59:23Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742750#M232404</link>
      <description>&lt;P&gt;Thank you, FreelanceReinhard. This looks like a very promising solution. One thing: var1 and var2 are in both data 1 and data 2. I need to sample from data 2 to meet the targets given by the mean of var1 and the proportion of var2 in data 1. Could you tweak your code to reflect that?&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2021 18:25:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742750#M232404</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-20T18:25:27Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742758#M232408</link>
      <description>&lt;P&gt;Data 1 has N=99829 and data 2 N=129071. There are a lot of data points to sample from.&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2021 18:56:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742758#M232408</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-20T18:56:08Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742774#M232412</link>
      <description>&lt;P&gt;Here's an example that reflects your situation with &lt;EM&gt;two&lt;/EM&gt; given datasets more closely, using arbitrary subsets HAVE1 and HAVE2 of SASHELP.HEART:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Create example data for demonstration */

data have1 have2;
set sashelp.heart(rename=(systolic=var1));
var2=(status='Dead');
if ~mod(_n_,3) then output have1; /* 1736 obs. */
else output have2; /* 3473 obs. */
run; 

proc sort data=have2;
by var2;
run;

/* Determine target proportions of VAR2 values (0, 1) */

proc freq data=have1 noprint;
tables var2 / out=targetprops(keep=var2 percent rename=(percent=_alloc_));
run;

/* Draw 500 stratified random samples of size n=1000 from HAVE2 */

proc surveyselect data=have2 rep=500
method=srs n=1000
seed=2718 out=samp(drop=total--samplingweight);
strata var2 / alloc=targetprops;
run; /* 500,000 obs. */

/* Compute sample means of VAR1 */

proc summary data=samp nway;
class replicate;
var var1;
output out=sampmeans(drop=_:) mean=m;
run;

/* Determine the REPLICATE number of the optimum sample */

proc sql noprint;
select replicate into :repno
from (select *, abs(m-(select mean(var1) from have1)) as d from sampmeans
      having d=min(d));
quit;

/* Select the optimum sample */

data want(drop=replicate);
set samp;
where replicate=&amp;amp;repno;
run; /* 1000 obs. */

/* Compare means of VAR1 and VAR2 between dataset HAVE1 and the sample */

title 'Dataset HAVE1';
proc means data=have1 n mean;
var var1 var2;
run;

title 'Sample from HAVE2';
proc means data=want n mean;
var var1 var2;
run;
title;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;PRE&gt;Dataset HAVE1

Variable       N            Mean
--------------------------------
var1        1736     136.8110599
var2        1736       0.3652074
--------------------------------

Sample from HAVE2

Variable       N            Mean
--------------------------------
var1        1000     136.8140000
var2        1000       0.3650000
--------------------------------&lt;/PRE&gt;
&lt;P&gt;Caveat: If the means of &lt;FONT face="courier new,courier"&gt;var1&lt;/FONT&gt; (and the frequency distributions of &lt;FONT face="courier new,courier"&gt;var2&lt;/FONT&gt;) differ too much between datasets HAVE1 and HAVE2, the "optimum" sample mean of &lt;FONT face="courier new,courier"&gt;var1&lt;/FONT&gt; selected from 500 (or any feasible number of) samples from HAVE2 might not be "close enough" to the mean in HAVE1. In this situation a different sampling technique would need to be applied.&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2021 19:32:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742774#M232412</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2021-05-20T19:32:30Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742965#M232510</link>
      <description>&lt;P&gt;The issue is that when we use SRS (simple random sampling) with replication, the mean of each replicated sample is very similar to the mean of data 2. I wonder whether other sampling methods exist to mitigate this problem. In my original data, data 1 has a mean of 357 and data 2 a mean of 350. All my replicated samples had means close to 350 due to SRS. They did meet var2's proportion, though. Perhaps a 0/1 variable is easier to match to a target than a continuous variable.&lt;/P&gt;</description>
      <pubDate>Fri, 21 May 2021 16:55:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742965#M232510</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-21T16:55:51Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742986#M232523</link>
      <description>&lt;P&gt;In this case it should be possible (under fairly mild assumptions) to replace the discrete uniform distribution (equal selection probabilities) governing the simple random sampling with a distribution such that (ideally) the &lt;EM&gt;expected value&lt;/EM&gt; of the sample mean matches the target value, i.e., the mean of &lt;FONT face="courier new,courier"&gt;data1&lt;/FONT&gt;. This could be accomplished by assigning suitable &lt;EM&gt;weights&lt;/EM&gt; to each element of &lt;FONT face="courier new,courier"&gt;data2&lt;/FONT&gt;&amp;nbsp;and using an appropriate sampling method. Obviously, there's a wide range of possible distributions, but some of them are likely to have undesirable properties (e.g., distributions with most of the probability mass concentrated in a narrow interval around the target value).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I will think about this over the weekend (CEST time zone) and hopefully find a satisfactory solution to this interesting problem. Maybe other forum experts who are more familiar with the non-default sampling methods of PROC SURVEYSELECT will also contribute their ideas.&lt;/P&gt;</description>
      <pubDate>Fri, 21 May 2021 17:37:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742986#M232523</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2021-05-21T17:37:46Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742997#M232529</link>
      <description>&lt;P&gt;That sounds great! Thank you so much. I am concerned about truncation of the data distribution in the sample, so that it does not resemble data 1's properties (mean, std, and shape of distribution) when data 2 is somewhat different from data 1. I also wonder whether this new method can sample so as to follow data 1's shape closely and thin out areas of data 2 that have excess frequencies compared with data 1.&lt;/P&gt;</description>
      <pubDate>Fri, 21 May 2021 18:03:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/742997#M232529</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-21T18:03:34Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/743320#M232702</link>
      <description>&lt;P&gt;What an interesting weekend that was! Many thanks,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/382778"&gt;@leex1514&lt;/a&gt;, for starting this inspiring thread.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;I've executed the plan that I outlined in my previous post.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's again use subsets of SASHELP.HEART for demonstration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1 have2;
set sashelp.heart(rename=(systolic=var1));
var2=(status='Dead');
if _n_&amp;lt;=2000 then output have1; /* 2000 obs. */
else output have2; /* 3209 obs. */
run; 

proc sort data=have2;
by var2;
run;

proc summary data=have1;
var var1;
output out=stats1(drop=_:) mean=m;
run; /* Mean 133.9 */

proc means data=have2;
var var1;
run; /* Mean 138.785... */

%let r=1000; /* intended size of the random sample from HAVE2 */&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This is a case similar to what you described: Simple, stratified random sampling from HAVE2, even with thousands of replications, doesn't produce sample means close to the target value &lt;FONT face="courier new,courier"&gt;m=133.9&lt;/FONT&gt;. They are all too large (above 136).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So, how can sampling weights be defined in order to move the sample mean towards the target? For the time being I went for computational simplicity and set out to define weights as &lt;EM&gt;linear&lt;/EM&gt; functions of VAR1, i.e., weights of the form &lt;FONT face="courier new,courier"&gt;a*VAR1+b&lt;/FONT&gt; with constants &lt;FONT face="courier new,courier"&gt;a&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;b&lt;/FONT&gt;. The simple random sampling applied so far is actually a special case of this: &lt;FONT face="courier new,courier"&gt;a=&lt;STRONG&gt;0&lt;/STRONG&gt;&lt;/FONT&gt;, &lt;FONT face="courier new,courier"&gt;b&amp;gt;0&lt;/FONT&gt; (arbitrary constant).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In a preliminary approach I disregarded the stratification in the definition of the weights. This simplifies the calculation of "optimum" coefficients &lt;FONT face="courier new,courier"&gt;a&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;b&lt;/FONT&gt;.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Set the weight for the target value &lt;FONT face="courier new,courier"&gt;m=133.9&lt;/FONT&gt; to the arbitrary positive value 1, i.e., &lt;FONT face="courier new,courier"&gt;a*m+b=1&lt;/FONT&gt;.&lt;BR /&gt;The weight for a general VAR1 value is then &lt;FONT face="courier new,courier"&gt;a*VAR1+b = a*VAR1+1-a*m = 1+(VAR1-m)*a&lt;/FONT&gt;.&lt;BR /&gt;The requirement that weights must be positive imposes one of two restrictions on VAR1:&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;VAR1 &amp;gt; m-1/a&lt;/FONT&gt; if &lt;FONT face="courier new,courier"&gt;a&amp;gt;0&lt;/FONT&gt;,&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;VAR1 &amp;lt; m-1/a&lt;/FONT&gt; if &lt;FONT face="courier new,courier"&gt;a&amp;lt;0&lt;/FONT&gt;&amp;nbsp;(this applies to our example).&lt;BR /&gt;This is plausible when you imagine the impact of outliers (to be "weighted down") on a linear weighting scheme. Nonlinear (e.g., sigmoid-shaped), everywhere positive weight functions could avoid negative weights by definition. I will give those a try if negative weights abound with your real data. If there are only a few of them, I think it should be acceptable to simply exclude these observations.&lt;/LI&gt;
&lt;LI&gt;The requirement that the weighted mean of the &lt;FONT face="courier new,courier"&gt;n&lt;/FONT&gt; VAR1 values of HAVE2 be equal to the target value &lt;FONT face="courier new,courier"&gt;m&lt;/FONT&gt; yields (by a simple calculation):&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;a=(n*m-sum(VAR1))/(n*m**2-2*sum(VAR1)*m+uss(VAR1))&lt;/FONT&gt;.&lt;/LI&gt;
&lt;/OL&gt;
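&lt;P&gt;For readers who want the "simple calculation" in step 2 spelled out, here is a sketch in LaTeX notation. The symbols are shorthand introduced only for this note: x_i are the n VAR1 values of HAVE2, S stands for sum(VAR1), Q for uss(VAR1), and w_i is the weight of observation i from step 1.&lt;/P&gt;
&lt;PRE&gt;w_i = 1 + a\,(x_i - m), \qquad S = \sum_i x_i, \qquad Q = \sum_i x_i^2

\frac{\sum_i w_i x_i}{\sum_i w_i} = m
\;\Longleftrightarrow\; S + a\,(Q - mS) = m\,\bigl(n + a\,(S - nm)\bigr)

a\,(Q - 2mS + nm^2) = nm - S
\;\Longrightarrow\; a = \frac{nm - S}{nm^2 - 2Sm + Q}&lt;/PRE&gt;
&lt;P&gt;The denominator equals \sum_i (x_i - m)^2, so it is positive unless all VAR1 values coincide with m. The sign of a is therefore the sign of nm - S: a is negative whenever the unweighted mean of VAR1 in HAVE2 exceeds the target, as in the example.&lt;/P&gt;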
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Compute the coefficients a and b and the limit for VAR1 */

proc summary data=have2;
var var1;
output out=stats2(drop=_:) n=n sum=s uss=q;
run;

data coeff;
set stats1;
set stats2;
a=(n*m-s)/(n*m**2-2*s*m+q);
b=1-a*m;
limit=m-1/a;
run; /* a=-0.007818..., b=2.04689..., limit=261.802... */

/* Compute the weights */

data have2_wgt;
if _n_=1 then set coeff(keep=a b);
set have2;
w=a*var1+b;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Seven VAR1 values in HAVE2 get negative weights because they exceed the (here:) upper limit: &lt;FONT face="courier new,courier"&gt;max(VAR1)=300 &amp;gt; 261.802&lt;/FONT&gt;.&lt;/P&gt;
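&lt;P&gt;A quick way to see how many observations are affected in a given dataset is to count them directly. This is only a sketch; the column name N_NEGATIVE is made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Count the observations of HAVE2_WGT whose linear weight is negative */

proc sql;
select count(*) as n_negative
from have2_wgt
where w lt 0;
quit;&lt;/CODE&gt;&lt;/PRE&gt;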
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Check that the weighted mean of VAR1 equals the target value */

proc sql;
select sum(w*var1)/sum(w) as wmean
from have2_wgt;
quit; /* indeed: 133.9 */

proc means data=have2_wgt;
weight w;
var var1;
run; /* 133.964... -- the minor deviation is due to the replacement of negative weights with zero by PROC MEANS.
        It's small enough so that the sample replication technique can deal with it. Another option would be to
        re-adjust the coefficients a and b after exclusion of the observations with negative weights. */

/* Determine target proportions of VAR2 values (0, 1) */

proc freq data=have1 noprint;
tables var2 / out=targetprops(keep=var2 percent rename=(percent=_alloc_));
run;

/* Draw 500 stratified weighted random samples of size r=1000 from HAVE2_WGT */

proc surveyselect data=have2_wgt rep=500
method=pps n=&amp;amp;r
seed=2718 out=samp;
size w;
strata var2 / alloc=targetprops;
run; /* 500,000 obs. The 7 observations (0.2%) with negative weights are automatically excluded
        by PROC SURVEYSELECT as well. */&lt;/CODE&gt;&lt;/PRE&gt;
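&lt;P&gt;The comment on the PROC MEANS step above mentions re-adjusting the coefficients after excluding the observations with negative weights. A minimal sketch of one such pass could look as follows. The dataset names STATS2_POS, COEFF_POS and HAVE2_WGT2 are hypothetical, and the sketch assumes that a single pass is sufficient; since the limit changes with the new coefficients, the new weights should be checked for negative values again.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Recompute the summary statistics without the observations
   that did not receive a positive weight */

proc summary data=have2_wgt;
where w gt 0;
var var1;
output out=stats2_pos(drop=_:) n=n sum=s uss=q;
run;

/* Apply the same formulas as before to the reduced data */

data coeff_pos;
set stats1;
set stats2_pos;
a=(n*m-s)/(n*m**2-2*s*m+q);
b=1-a*m;
limit=m-1/a;
run;

/* Recompute the weights for the retained observations */

data have2_wgt2;
if _n_=1 then set coeff_pos(keep=a b);
set have2_wgt(where=(w gt 0) drop=a b);
w=a*var1+b; /* overwrites the old weight */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;HAVE2_WGT2 would then take the place of HAVE2_WGT in the sampling step.&lt;/P&gt;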
&lt;P&gt;I used the basic &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_surveyselect_details13.htm" target="_blank" rel="noopener"&gt;PPS method&lt;/A&gt; to create the weighted random samples. There are other comparable methods available in PROC SURVEYSELECT which differ from PPS, e.g., in the joint probabilities of selection.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Compute sample means of VAR1 */

proc summary data=samp nway;
class replicate;
var var1;
output out=sampmeans(drop=_:) mean=ms;
run;

/* Determine the REPLICATE number of the optimum sample */

proc sql noprint;
select replicate into :repno
from (select *, abs(ms-(select m from stats1)) as d from sampmeans
      having d=min(d));
quit;

/* Select the optimum sample */

data want(drop=replicate);
set samp;
where replicate=&amp;amp;repno;
run; /* 1000 obs. */

/* Compare means of VAR1 and VAR2 between dataset HAVE1 and the sample */

title 'Dataset HAVE1';
proc means data=have1 n mean;
var var1 var2;
run;

title 'Sample from HAVE2';
proc means data=want n mean;
var var1 var2;
run;
title;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;PRE&gt;Dataset HAVE1

Variable       N            Mean
--------------------------------
var1        2000     133.9000000
var2        2000       0.3685000
--------------------------------

Sample from HAVE2

Variable       N            Mean
--------------------------------
var1        1000     133.8930000
var2        1000       0.3680000
--------------------------------&lt;/PRE&gt;
&lt;P&gt;For the sample data from SASHELP.HEART this already works quite well.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The next step is to take the stratification into account in the definition of the weights.&amp;nbsp;I'm going to present that improved approach in a separate post.&lt;/P&gt;</description>
      <pubDate>Mon, 24 May 2021 09:46:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/743320#M232702</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2021-05-24T09:46:29Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/743333#M232709</link>
      <description>&lt;P&gt;Now for the improved approach&amp;nbsp;that takes the stratification into account in the definition of the sampling weights.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Create sample data (same as before) */

data have1 have2;
set sashelp.heart(rename=(systolic=var1));
var2=(status='Dead');
if _n_&amp;lt;=2000 then output have1; /* 2000 obs. */
else output have2; /* 3209 obs. */
run; 

proc sort data=have2;
by var2;
run;

proc summary data=have1;
var var1;
output out=stats1(drop=_:) mean=m;
run; /* Mean 133.9 */

%let r=1000; /* intended size of the random sample from HAVE2 */

/* Determine target proportions of VAR2 values (0, 1) */

proc freq data=have1 noprint;
tables var2 / out=targetprops(keep=var2 percent rename=(percent=_alloc_));
run;

/* Determine sample sizes r0, r1 for the two strata (VAR2=0, VAR2=1) */

data sample_sizes(keep=r:);
set targetprops;
retain r &amp;amp;r r0;
if var2=0 then r0=round(r*_alloc_/100);
if var2=1;
r1=r-r0;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The computation of "optimum" coefficients &lt;FONT face="courier new,courier"&gt;a&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;b&lt;/FONT&gt; for the weights &lt;FONT face="courier new,courier"&gt;a*VAR1+b&lt;/FONT&gt; is now a bit more complicated in the case of &lt;FONT face="courier new,courier"&gt;a&lt;/FONT&gt;. While the first requirement &lt;FONT face="courier new,courier"&gt;a*m+b=1&lt;/FONT&gt; is the same as before, we now want a weighted mean of weighted means (!) to be equal to &lt;FONT face="courier new,courier"&gt;m&lt;/FONT&gt;: &lt;FONT face="courier new,courier"&gt;(r0*m0 + r1*m1)/r = m&lt;/FONT&gt;, where &lt;FONT face="courier new,courier"&gt;m0&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;m1&lt;/FONT&gt; are the weighted means of the VAR1 values in the two strata (VAR2=0, VAR2=1) of HAVE2. This boils down to solving a quadratic equation in &lt;FONT face="courier new,courier"&gt;a&lt;/FONT&gt;, involving summary statistics of VAR1 in the strata.&lt;/P&gt;
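&lt;P&gt;Spelled out in LaTeX notation, the quadratic equation can be sketched as follows. The symbols t_k, s_k and q_k denote the number of observations, the sum and the uncorrected sum of squares of VAR1 in stratum k of HAVE2; A, B and C are names introduced only for this note. The derivation assumes that the denominators of m_0 and m_1 are nonzero.&lt;/P&gt;
&lt;PRE&gt;m_k = \frac{s_k + a\,(q_k - m s_k)}{t_k + a\,(s_k - m t_k)}, \qquad k = 0, 1

r_0 m_0 + r_1 m_1 = r m
\;\Longleftrightarrow\; A a^2 + B a + C = 0

A = r_0 (q_0 - m s_0)(s_1 - m t_1) + r_1 (q_1 - m s_1)(s_0 - m t_0) - r m (s_0 - m t_0)(s_1 - m t_1)

B = r_0 [t_1 (q_0 - m s_0) + s_0 (s_1 - m t_1)] + r_1 [t_0 (q_1 - m s_1) + s_1 (s_0 - m t_0)] - r m [t_0 (s_1 - m t_1) + t_1 (s_0 - m t_0)]

C = r_0 s_0 t_1 + r_1 s_1 t_0 - r m t_0 t_1

a = -\frac{B}{2A} \pm \sqrt{\frac{B^2}{4A^2} - \frac{C}{A}}&lt;/PRE&gt;
&lt;P&gt;The first line is the weighted mean of VAR1 in stratum k with weights 1+(VAR1-m)*a. Multiplying the requirement by both denominators and collecting powers of a gives the three coefficients.&lt;/P&gt;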
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Compute the coefficients a and b and the limit for VAR1 */

proc summary data=have2 nway;
where var2=0;
var var1;
output out=stats2_0(drop=_:) n=t0 sum=s0 uss=q0;
run;

proc summary data=have2 nway;
where var2=1;
var var1;
output out=stats2_1(drop=_:) n=t1 sum=s1 uss=q1;
run;

data coeff;
set stats1;
set sample_sizes;
set stats2_0;
set stats2_1;
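/* The requirement leads to a quadratic equation in a.
   d first holds the coefficient of a**2. */
d =  r0*(q0-m*s0)*(s1-m*t1)
   + r1*(q1-m*s1)*(s0-m*t0)
   -r*m*(s0-m*t0)*(s1-m*t1);
/* c: coefficient of a, divided by the coefficient of a**2 */
c=(  r0*(t1*(q0-m*s0) + s0*(s1-m*t1))
   + r1*(t0*(q1-m*s1) + s1*(s0-m*t0))
   -r*m*(t0*(s1-m*t1) + t1*(s0-m*t0))) / d;
/* d is now overwritten with the constant term, divided by the
   coefficient of a**2, so that a**2 + c*a + d = 0 */
d=(r0*s0*t1 + r1*s1*t0 - r*m*t0*t1)/d;
a=-c/2 + sqrt(c**2/4-d); /* Check the result to decide between +/- sqrt(...). */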
b=1-a*m;
limit=m-1/a;
run; /* a=-0.007965..., b=2.06659..., limit=259.439... */

/* Compute the weights */

data have2_wgt;
if _n_=1 then set coeff(keep=a b);
set have2;
w=a*var1+b;
run;&lt;/CODE&gt;&lt;/PRE&gt;
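&lt;P&gt;The comment on the assignment of &lt;FONT face="courier new,courier"&gt;a&lt;/FONT&gt; asks for a check of which root to use. One possible way to carry out that check is to compute both roots and the VAR1 limit that each of them implies. This is a sketch under the assumption that COEFF still contains &lt;FONT face="courier new,courier"&gt;m&lt;/FONT&gt;, &lt;FONT face="courier new,courier"&gt;c&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;d&lt;/FONT&gt; as created by the step above; the names ROOTS, DISC, A_PLUS, A_MINUS, LIMIT_PLUS and LIMIT_MINUS are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Compute both roots of a**2 + c*a + d = 0 and the implied limits */

data roots;
set coeff(keep=m c d);
disc=c**2/4-d; /* discriminant */
if disc ge 0 then do;
  a_plus =-c/2+sqrt(disc);
  a_minus=-c/2-sqrt(disc);
  limit_plus =m-1/a_plus;  /* VAR1 limit implied by each root */
  limit_minus=m-1/a_minus;
end;
run;

proc print data=roots;
run;

/* Range of VAR1 in HAVE2 to compare the limits against */

proc means data=have2 min max;
var var1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A root whose limit lies well inside the range of VAR1 would assign negative weights to many observations, so the other root is probably the better candidate.&lt;/P&gt;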
&lt;P&gt;This time 9 of the 3209 VAR1 values (0.3%) in HAVE2 get negative weights because they exceed the upper limit of 259.439.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Check that the weighted mean of weighted means of VAR1 equals the target value */

proc sql;
select ( r0*(select sum(w*var1)/sum(w)
             from have2_wgt
             where var2=0)
        +r1*(select sum(w*var1)/sum(w)
             from have2_wgt
             where var2=1)
       )/r
from sample_sizes;
quit; /* indeed: 133.9 */

/* Draw 500 stratified weighted random samples of size r=1000 from HAVE2_WGT */

proc surveyselect data=have2_wgt rep=500
method=pps n=&amp;amp;r
seed=2718 out=samp;
size w;
strata var2 / alloc=targetprops;
run; /* 500,000 obs. (9 obs. of HAVE2_WGT were excluded due to negative weights) */

/* Compute sample means of VAR1 */

proc summary data=samp nway;
class replicate;
var var1;
output out=sampmeans(drop=_:) mean=ms;
run;

/* Determine the REPLICATE number of the optimum sample */

proc sql noprint;
select replicate into :repno
from (select *, abs(ms-(select m from stats1)) as d from sampmeans
      having d=min(d));
quit;

/* Select the optimum sample */

data want(drop=replicate);
set samp;
where replicate=&amp;amp;repno;
run; /* 1000 obs. */

/* Compare means of VAR1 and VAR2 between dataset HAVE1 and the sample */

title 'Dataset HAVE1';
proc means data=have1 n mean;
var var1 var2;
run;

title 'Sample from HAVE2';
proc means data=want n mean;
var var1 var2;
run;
title;

/* Compare shapes of distributions of VAR1 in HAVE1 and the sample */

data cmp(keep=source var1);
set have1 want(in=s);
if s then source='Sample from HAVE2';
else source='HAVE1';
run;

ods graphics on;

proc univariate data=cmp;
class source;
var var1;
histogram;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;PRE&gt;Dataset HAVE1

Variable       N            Mean
--------------------------------
var1        2000     133.9000000
var2        2000       0.3685000
--------------------------------

Sample from HAVE2

Variable       N            Mean
--------------------------------
var1        1000     133.9000000
var2        1000       0.3680000
--------------------------------
&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="cmp_histogram.png" style="width: 640px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/59704iB043B474E7AD0DCD/image-size/large?v=v2&amp;amp;px=999" role="button" title="cmp_histogram.png" alt="cmp_histogram.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;So the refined approach performed even better on the SASHELP.HEART sample data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="2"&gt;Edit: Added comment to assignment statement &lt;FONT face="courier new,courier"&gt;a=...&lt;/FONT&gt;.&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 24 May 2021 15:05:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/743333#M232709</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2021-05-24T15:05:43Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744314#M233184</link>
      <description>&lt;P&gt;Thank you so much for your code! I have tried it but my final sample's means are not meeting targets (mean of var1 is way too high and mean of var2=0) . I found "samp" does not contain var2=1 (only contain var2=0). Maybe the proc surveyselect step is not working properly? Or my data won't work no matter what I do?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p4"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 May 2021 23:09:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744314#M233184</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-27T23:09:11Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744341#M233193</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/382778"&gt;@leex1514&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Thank you so much for your code! I have tried it but my final sample's means are not meeting targets (mean of var1 is way too high and mean of var2=0) . I found "samp" does not contain var2=1 (only contain var2=0). Maybe the proc surveyselect step is not working properly? Or my data won't work no matter what I do?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/382778"&gt;@leex1514&lt;/a&gt;: I'm sorry and surprised to read this. Intuitively, I think the large deviations from the expected results suggest that a relatively small (!) change to your code, e.g., an adaptation to your data structure, might be sufficient to resolve the issue.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's start our investigation with the simple part, i.e. &lt;FONT face="courier new,courier"&gt;var2&lt;/FONT&gt;, and examine the&amp;nbsp;datasets TARGETPROPS, SAMPLE_SIZES and SAMP created with the most recent approach.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc print data=targetprops;
run;

proc print data=sample_sizes;
run;

proc freq data=samp noprint;
tables replicate*var2 / out=cnt;
run;

proc print data=cnt(obs=10);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Using the&amp;nbsp;SASHELP.HEART sample data, the three PROC PRINT steps above yield:&lt;/P&gt;
&lt;PRE&gt;Obs    var2    _alloc_

 1       0      63.15
 2       1      36.85&lt;/PRE&gt;
&lt;PRE&gt;Obs      r      r0     r1

 1     1000    632    368&lt;/PRE&gt;
&lt;PRE&gt; Obs    Replicate    var2    COUNT    PERCENT

   1        1          0      632      0.1264
   2        1          1      368      0.0736
   3        2          0      632      0.1264
   4        2          1      368      0.0736
   5        3          0      632      0.1264
   6        3          1      368      0.0736
   7        4          0      632      0.1264
   8        4          1      368      0.0736
   9        5          0      632      0.1264
  10        5          1      368      0.0736&lt;/PRE&gt;
&lt;P&gt;The proportions of {var2=0} and {var2=1} are necessarily constant&amp;nbsp;in the replicates in SAMP and identical (up to rounding) to the proportions found in dataset HAVE1 (stored in TARGETPROPS).&lt;/P&gt;
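&lt;P&gt;As an optional extra check, here is a minimal sketch (assuming the dataset and variable names SAMP, &lt;FONT face="courier new,courier"&gt;Replicate&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;var2&lt;/FONT&gt; used above; the column name &lt;FONT face="courier new,courier"&gt;n_levels&lt;/FONT&gt; is only illustrative) that lists every replicate in which one of the two &lt;FONT face="courier new,courier"&gt;var2&lt;/FONT&gt; values does not occur at all:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: list replicates in SAMP that contain fewer than two   */
/* distinct values of var2, i.e., where a whole stratum is absent */
proc sql;
select replicate, count(distinct var2) as n_levels
from samp
group by replicate
having count(distinct var2) lt 2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If both strata are present in every replicate, this query should select no rows. Any replicate it does list has lost one stratum completely.&lt;/P&gt;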
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please run these PROC PRINT steps on your data and post the results.&lt;/P&gt;</description>
      <pubDate>Fri, 28 May 2021 08:02:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744341#M233193</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2021-05-28T08:02:40Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744394#M233217</link>
      <description>&lt;P&gt;I still have no 1s of var2 in samp. I don't know why PROC SURVEYSELECT does not pick both 0s and 1s for samp.&lt;/P&gt;</description>
      <pubDate>Fri, 28 May 2021 13:41:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744394#M233217</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-28T13:41:22Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744395#M233218</link>
      <description>&lt;P&gt;Here are the outputs (var2=w).&lt;/P&gt;
&lt;TABLE cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Obs&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;W&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;_alloc_&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;1&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;62.8524&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;2&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;37.1476&lt;/TD&gt;&lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Obs&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;r&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;r0&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;r1&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;1&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;5000&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;1857&lt;/TD&gt;&lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Obs&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Replicate&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;W&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;COUNT&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;PERCENT&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;1&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;2&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;3&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;4&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;5&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;6&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;7&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;8&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;9&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;10&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3143&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
      <pubDate>Fri, 28 May 2021 13:49:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744395#M233218</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-28T13:49:28Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744413#M233222</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/382778"&gt;@leex1514&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Here are the outputs (&lt;FONT size="5" color="#FF0000"&gt;var2=w&lt;/FONT&gt;).&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;So, the real name of your "&lt;FONT face="courier new,courier"&gt;var2&lt;/FONT&gt;" happens to be &lt;FONT face="courier new,courier"&gt;w&lt;/FONT&gt; -- which is the name I arbitrarily chose for the &lt;EM&gt;w&lt;/EM&gt;eight variable? This trivial name conflict might explain the nonsensical results. Can you replace the&amp;nbsp;&lt;FONT face="courier new,courier"&gt;w&lt;/FONT&gt; in the code I provided with a different name, say, &lt;FONT face="courier new,courier"&gt;_w&lt;/FONT&gt; or any other name that does not occur elsewhere in your data?&lt;/P&gt;</description>
      <pubDate>Fri, 28 May 2021 14:06:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744413#M233222</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2021-05-28T14:06:33Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744427#M233225</link>
      <description>&lt;P&gt;For that reason, I had changed the weight variable w to w1 and left var2 as w. But I will change the var2 name to something completely different to avoid confusion.&lt;/P&gt;</description>
      <pubDate>Fri, 28 May 2021 14:38:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744427#M233225</guid>
      <dc:creator>leex1514</dc:creator>
      <dc:date>2021-05-28T14:38:28Z</dc:date>
    </item>
    <item>
      <title>Re: Sampling to meet reference characteristics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744446#M233228</link>
      <description>&lt;P&gt;Thanks. With similar but different variable names for "VAR2" and the weight variable, no name conflict should occur. If your check of the variable names does not solve the problem, the next step will be to examine the log of the PROC SURVEYSELECT step, i.e., the section which looks like this:&lt;/P&gt;
&lt;PRE&gt;67   proc surveyselect data=have2_wgt rep=500
68   method=pps n=&amp;amp;r
69   seed=2718 out=samp;
70   size w;
71   strata var2 / alloc=targetprops;
72   run;

NOTE: 9 sampling units were omitted due to missing or nonpositive size measures.
NOTE: The above message was for the following stratum:
      var2=1.
NOTE: The data set WORK.SAMP has 500000 observations and 28 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
      real time           1.09 seconds
      cpu time            1.10 seconds&lt;/PRE&gt;
&lt;P&gt;(It's important to use the "&amp;lt;/&amp;gt;" (Insert Code) button to post the log in order to preserve formatting.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As shown in the example, not only warnings and errors, but also notes in the log can indicate certain issues such as the omission of sampling units.&lt;/P&gt;
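&lt;P&gt;To see in advance how many sampling units would be omitted for this reason, a query along the following lines might help. This is only a sketch: it assumes the names HAVE2_WGT, &lt;FONT face="courier new,courier"&gt;var2&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;w&lt;/FONT&gt; from the log above (please substitute your own names), and the column names &lt;FONT face="courier new,courier"&gt;n_total&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;n_unusable&lt;/FONT&gt; are just illustrative.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: per stratum, count all units and those whose size   */
/* measure w is missing or nonpositive (assumed names, see text) */
proc sql;
select var2,
       count(*) as n_total,
       sum(missing(w) or w le 0) as n_unusable
from have2_wgt
group by var2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If &lt;FONT face="courier new,courier"&gt;n_unusable&lt;/FONT&gt; equals &lt;FONT face="courier new,courier"&gt;n_total&lt;/FONT&gt; in a stratum, none of its units has a usable size measure, which I would suspect as a possible cause of a stratum missing from the sample.&lt;/P&gt;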
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With &lt;FONT face="courier new,courier"&gt;rep=500&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;&amp;amp;r=5000&lt;/FONT&gt; the number of observations in WORK.SAMP should be 2500000. But your PROC FREQ output dataset already revealed that this is not the case: It appears that your&amp;nbsp;WORK.SAMP has only about 1571500 (=500*3143) observations.&lt;/P&gt;</description>
      <pubDate>Fri, 28 May 2021 15:28:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sampling-to-meet-reference-characteristics/m-p/744446#M233228</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2021-05-28T15:28:49Z</dc:date>
    </item>
  </channel>
</rss>

