The general method would be to assign a random number to each record, then sort by customer number and that random number, then select the first COUNT records of each Customer number.
Something liek this:
data both;
merge a b;
by customer_no;
random_no=rand('uniform');
run;
proc sort data=both;
by customer_no random_no;
run;
data want;
set both;
by customer_no;
if first.customer_no then seq=0;
seq+1;
if seq<=count;
run;
So in this case, you want to randomly pick 3 obs from group Customer=A and 2 obs from Customer=B?
The general method would be to assign a random number to each record, then sort by customer number and that random number, then select the first COUNT records of each Customer number.
Something liek this:
data both;
merge a b;
by customer_no;
random_no=rand('uniform');
run;
proc sort data=both;
by customer_no random_no;
run;
data want;
set both;
by customer_no;
if first.customer_no then seq=0;
seq+1;
if seq<=count;
run;
Hi @Hhh111
One way to do it is by sorting sales data in random order for each customer and then keep only the first n observarions from each customer, where n is the individual sample size for each customer:
* Get test data 1 - Sample size for each customer;
data sample_no;
input Customer$ customer_no count;
cards;
A 111 3
B 222 2
;
run;
* Get test data 2 - Sales data set;
data main;
input Customer$ customer_no sales;
cards;
A 111 12000
A 111 2000
A 111 10000
A 111 7000
B 222 1000
B 222 5000
B 222 6500
;
run;
* Add random number to sales data set to use for sorting;
data m2; set main;
shuffle_id = ranuni(3);
run;
* Join sample size on all records in main and sort on customer - random number;
proc sql;
create table m3 as
select a.Customer, a.customer_no, a.sales, a.shuffle_id, b.count
from m2 as a inner join sample_no as b
on a.Customer = b.Customer and a.customer_no = b.customer_no
order by a.Customer, a.customer_no, a.shuffle_id;
quit;
* Keep only wanted count of records for each customer;
data want (drop=shuffle_id count rcnt);
set m3;
retain rcnt;
by Customer customer_no;
if first.customer_no then rcnt = 0;
rcnt = rcnt + 1;
if rcnt <= count then output;
run;
Data A;
input Customer $ customerno count;
datalines;
A 111 3
B 222 2
;
data B;
input Customer $ customerno sales;
datalines;
A 111 12000
A 111 2000
A 111 10000
A 111 7000
B 222 1000
B 222 5000
B 222 6500
;
data want(keep=Customer customerno sales);
if _N_=1 then do;
declare hash h();
h.definekey('p');
h.definedone();
end;
do _N_=1 by 1 until (last.Customer);
set B;
by Customer;
end;
set A;
do until (c=count);
p=rand('integer', 1, _N_);
if h.check() ne 0 then do;
n=_iorc_+p;
set B point=n;
output;
c+1;h.ref();
end;
end;
h.clear();
_iorc_+count+1;
c=0;
run;
This is the simplest though
proc surveyselect data=B sampsize=A(rename=(count=_NSIZE_))
out=want(keep=Customer customerno sales) method=srs noprint;
strata Customer;
run;
@PeterClemmensen wrote:
This is the simplest though
proc surveyselect data=B sampsize=A(rename=(count=_NSIZE_)) out=want(keep=Customer customerno sales) method=srs noprint; strata Customer; run;
And if the Customer value might repeat then add the Customerno to the strata statement.
Both data sets need to be sorted by the variables on the strata statement.
Thanks All for the response...i get the solution already
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.