Solved: How to Randomly Pick a Value from Other Table

Balli · Posted 06-29-2025 10:24 PM

Hi Team,

Looking for some help here :

I've got two SQL tables

Table A: It contains a state and Phone numbers. Each state has got around 20 different numbers.

Table B : This table got accounts, customers and states . It has around 40K accounts.

I want to randomly assign a phone number from table A to each account in table B, based on their state.

I think this can be achieved using first and last variables but unable to come up with a solution. Will be great help if someone can please provide me some guidance here.

Here is a rough idea of what the tables contain.

Table A:

Table B:

mkeintz · Posted 06-30-2025 08:49 PM

You have a small dataset of phone numbers by state, and a "large" dataset of accounts. I understand that you are fine with randomly assigning a given phone number to multiple qualifying accounts. Here's a program that avoids the need to sort the large dataset just to facilitate a merge:

Using @Ksharp's sample data:

data phone_arrays (keep=state _nphones _col:);
  do _nphones=1 by 1 until (last.state);
    set tablea  (where=(not missing(phone)));
    by state;
    array _col {50};
    _col{_nphones}=phone;
  end;
run;

data want (drop=_:);
  set tableb;
  if _n_=1 then do;
    if 0 then set phone_arrays ;
    declare hash h (dataset:'phone_arrays');
      h.definekey('state'); 
      h.definedata(all:'Y');
      h.definedone();
  end;
  array col {*} _col: ;
  
  if h.find()=0 then phnum=col{ceil(_nphones*ranuni(1508915))};
run;

The _COL array is given an arbitrary size -- large enough to account for the largest group of available phone numbers.

This assumes the TABLEA dataset is already sorted by state.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Tom · Posted 06-29-2025 10:39 PM

You could just add a random number to the phone number dataset and sort by STATE and the new variable with the random number to get the phone numbers in random order.

For a more detailed answer we probably need more information.

Do you want each account and each customer to have their own phone number? Or do you need to use both account and customer together, like a combination key, to assign a phone number?

Do you want each account/customer pair to get a different number?

Are there more accounts/customers for any state than the number of phone numbers you have for that state? If so what do you want to do then? Is it ok if two account/customer get the same number?

Balli · Posted 06-29-2025 11:23 PM

Hi Tom, thanks for your response. the problem is not about randomising the phone numbers, it is more about joining the two datasets. AccountIDs are unique and as I mentioned, there are around 40,00 accounts which means there will be many accounts per state and multiple accounts would have the same phone number.

Patrick · Posted 06-30-2025 03:42 AM

The approach Tom proposes is not that hard to implement. Please provide usable sample data if you're after code (=instead of screenshots two data steps that create the sample data).

andreas_lds · Posted 06-30-2025 01:31 AM

Please show the expected result making it easier to understand what you really want.

Ksharp · Posted 06-30-2025 05:27 AM

That would be a lot of help if you could post your data as a Data Step or plain text, not just a picture.

Nobody would like to type it for you and you would miss the solution from someone.

data tablea;
input state $ phone;
cards;
NSW 111
NSW 112
NSW 123
NSW 114
QLD 121
QLD 211
QLD 141
QLD 411
;
data tableb;
input accountid state $;
cards;
1 NSW
2 QLD
3 VIC
;

proc surveyselect data=tablea out=phone seed=123 sampsize=1;
strata state;
run;
proc sort data=phone;by state;run;
proc sort data=tableb;by state;run;
data want;
 merge tableb(in=ina) phone;
 by state;
 if ina;
run;

Ksharp · Posted 06-30-2025 05:50 AM

If it is a replaced sampling ,and each accountid from the same state could have different phone.

Try this one :

data tablea;
input state $ phone;
cards;
NSW 111
NSW 112
NSW 123
NSW 114
QLD 121
QLD 211
QLD 141
QLD 411
;
data tableb;
input accountid state $;
cards;
1 NSW
8 NSW
2 QLD
4 QLD
5 QLD
3 VIC
;

proc sql;
create table SampleSize as
select state,count(*) as SampleSize
 from tableb
  group by state;
quit;

proc surveyselect data=tablea out=phone seed=123 sampsize=SampleSize selectall;
strata state;
run;
proc sort data=phone;by state;run;
proc sort data=tableb;by state;run;
data want;
 merge tableb(in=ina) phone;
 by state;
 if ina;
run;

FreelanceReinh · Posted 06-30-2025 06:30 AM

@Ksharp wrote:

proc surveyselect data=tablea out=phone seed=123 sampsize=SampleSize selectall;
strata state;
run;

Great idea, @Ksharp, to use PROC SURVEYSELECT, but I'm pretty sure you actually wanted to use the options

method=urs outhits outrandom

instead of selectall.

FreelanceReinh · Posted 06-30-2025 08:07 AM

Hi @Balli,

@Balli wrote:

(...)

I think this can be achieved using first and last variables ...

Here is an approach using the FIRST.STATE and LAST.STATE variables: The observation numbers of the first and last observation of each state in dataset HAVE_A (assuming they are grouped) are stored in a temporary lookup table (hash object). While reading dataset HAVE_B they are retrieved and a (uniformly distributed) random number between them is selected using the RAND function. This random number, in turn, is used in the POINT= option of the SET statement retrieving a phone number from dataset HAVE_A.

/* Create sample data for demonstration */

%let states='NSW','QLD','VIC','WA','ACT','NT','SA','TAS';

data have_A;
do State=&states;
  do _n_=1 to 23-mod(rank(md5(State)),7);
    Phone=ranuni(2718)*(2**31-1);
    output;
  end;
end;
run;

data have_B;
call streaminit(27182818);
do _n_=1 to 40000;
  AccountID=ranuni(3142)*(2**31-1);
  length State $3;
  State=choosec(rand('integer',8),&states);
  output;
end;
run;

proc sql;
insert into have_B
values(123456789,'XYZ');
quit;


/* Assign a randomly selected phone number from have_A to each account in have_B, based on their state */

data want(drop=n1 n2);
call streaminit(31415927);
if _n_=1 then do;
  if 0 then set have_B have_A;
  dcl hash h(); /* lookup table for the first and last obs. number per state in have_A */
  h.definekey('State');
  h.definedata('n1','n2');
  h.definedone();
  do _n_=1 by 1 until(last);
    set have_A end=last;
    by state notsorted; /* Dataset have_A must be grouped by State. */
    if first.state then n1=_n_;
    if last.state then do;
      n2=_n_;
      h.add();
    end;
  end;
end;
set have_b;
if h.find()=0 then do;
  _n_=rand('integer',n1,n2);
  set have_A(keep=phone) point=_n_;
end;
else call missing(phone);
run;

Alternatively, you could store the phone numbers in the hash object and retrieve them from there (e.g., using a sequence number as a second key item).

But I think a solution using PROC SURVEYSELECT (@Ksharp's idea) is easier to understand.

Edit: Inserted the ELSE statement at the end of the program to avoid retaining a phone number from the previous observation if an unexpected value of State occurs.

mkeintz · Posted 06-30-2025 08:49 PM

You have a small dataset of phone numbers by state, and a "large" dataset of accounts. I understand that you are fine with randomly assigning a given phone number to multiple qualifying accounts. Here's a program that avoids the need to sort the large dataset just to facilitate a merge:

Using @Ksharp's sample data:

data phone_arrays (keep=state _nphones _col:);
  do _nphones=1 by 1 until (last.state);
    set tablea  (where=(not missing(phone)));
    by state;
    array _col {50};
    _col{_nphones}=phone;
  end;
run;

data want (drop=_:);
  set tableb;
  if _n_=1 then do;
    if 0 then set phone_arrays ;
    declare hash h (dataset:'phone_arrays');
      h.definekey('state'); 
      h.definedata(all:'Y');
      h.definedone();
  end;
  array col {*} _col: ;
  
  if h.find()=0 then phnum=col{ceil(_nphones*ranuni(1508915))};
run;

The _COL array is given an arbitrary size -- large enough to account for the largest group of available phone numbers.

This assumes the TABLEA dataset is already sorted by state.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Balli · Posted 07-01-2025 09:37 PM

Thanks. appreciate your response. This is very quick and elegant solution. Exactly what I was after 🙂

Balli · Posted 07-01-2025 09:41 PM

Thanks everyone for the response and sorry for not including the sample data and the expected results. I will try to include more details next time.

How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Re: How to Randomly Pick a Value from Other Table

Registration is open

Registration is open

SAS Training: Just a Click Away