Randomly select a dataset observation and insert it into a second dataset

Reply
Contributor kbk
Contributor
Posts: 29

Randomly select a dataset observation and insert it into a second dataset

Hi All,

Hopefully this makes sense. I have two datasets, one of which has some null values for a specific variable. For those null variables, I want to randomly select an observation from a second dataset and insert it into the first dataset. I do a bit of other processing which I include below but which isn't critical to the core question. Here is code that does what I want it to do *except* I am hardcoding the variable - 'swap' - to be swapped in. How can I randomly select the swap variable used in the third data step below?

data work.swapIn;

    input swap $1-50;

    datalines;

    [sampleID]^1^^3^Street^Town^[sampleID prefix]

    [sampleID]^8^2^3^Street^Town^[sampleID prefix]

    [sampleID]^1^3^43^Street^Town^[sampleID prefix]

    ;

proc print;

run;

data work.input;

    length sampleID $11 Type 3 dataString $50;

    input sampleID $1-11 Type dataString $;

    datalines;

MS-1xxx-003 1 MS-1xxx-003^1x^^5^Street^Town^MS

MS-1yyy-411 2 .

MS-4xxx-002 3 .

GC-6xxx-002 3 GC-6xxx-002^1^3^43^Street^Town^GC

MS-2yyy-002 1 .

;

proc print;

run;

data swapOut (drop=swap samplePrefix);

    set input;

    if dataString = '' then

        do;

            /*instead of hard coding swap, I want to randomly select a swap value from work.swapIn*/

            swap = '[sampleID]^1^^3^Street^Town^[sampleID prefix]';

            dataString = tranwrd(swap,'[sampleID]', sampleID); *insert sampleID ;

            samplePrefix = scan(sampleID,1,'-'); *retrieve first part of sampleID;

            dataString = tranwrd(dataString,'[sampleID prefix]', samplePrefix); *insert samplePrefix;

        end;

run;

proc print data=swapOut;

run;

I tried reading the values of swapIn into macro variables:

data _null_;

    set swapIn;

    call symputx('swap'||put(_n_,z2.),swap,'G');

run;

%put &swap01;

With the thought that I would dynamically generate the macro variable name (where the counter suffix was randomly generated within the datastep). However, I ran into deadends when I could not concatenate a string variable to a macro variable to create a callable macro variable. I also tried storing the randomly generated suffix into a macro variable but discovered you cannot evaluate functions or equations when setting a macro variable. Here is the equation I was using to generate the random suffix:

swapSuffix = put(int(3*RAND('UNIFORM'))+1, z2.);

Super Contributor
Posts: 578

Re: Randomly select a dataset observation and insert it into a second dataset

If you add some variables to hold random values at the outset, then you can join based on them.

data work.swapIn;

    input rnd_start rnd_end swap & $50.;

    datalines;

0 .33 [sampleID]^1^^3^Street^Town^[sampleID prefix]

.33 .66 [sampleID]^8^2^3^Street^Town^[sampleID prefix]

.66 1 [sampleID]^1^3^43^Street^Town^[sampleID prefix]

    ;

data work.input;

    length sampleID $11 Type 3 dataString $50;

    input sampleID $1-11 Type dataString $;

    rndval=ranuni(0);

    datalines;

MS-1xxx-003 1 MS-1xxx-003^1x^^5^Street^Town^MS

MS-1yyy-411 2 .

MS-4xxx-002 3 .

GC-6xxx-002 3 GC-6xxx-002^1^3^43^Street^Town^GC

MS-2yyy-002 1 .

;

run;

proc sql;

create table want as

select   

    t1.sampleid

    ,t1.type

    ,coalesce(t1.datastring,t2.swap) as datastring

from

    input t1

    inner join swapin t2

        on t1.rndval >= t2.rnd_start

            and t1.rndval < t2.rnd_end;

quit;

Contributor kbk
Contributor
Posts: 29

Re: Randomly select a dataset observation and insert it into a second dataset

Thanks! I only slightly modified the first data step to automatically generate the rnd_start and rnd_end intervals.

data work.swapIn;

    input swap & $50.;

    datalines;

[sampleID]^1^^3^Street^Town^[sampleID prefix]

[sampleID]^8^2^3^Street^Town^[sampleID prefix]

[sampleID]^1^3^43^Street^Town^[sampleID prefix]

;

%let dsid=%sysfunc(open(swapIn));

%let num=%sysfunc(attrn(&dsid,nlobs)); /*number of observations in swapIn*/

%let rc=%sysfunc(close(&dsid));

data work.swapIn;

    set swapIn;

    if _N_ = 1 then rnd_start = 0;

    else rnd_start = rnd_end;

    rnd_end = 1/&num + rnd_start;

    retain rnd_end;

run;

Super Contributor
Posts: 578

Re: Randomly select a dataset observation and insert it into a second dataset

If you'll mark the question as answered, then others might not try to answer it.

Contributor kbk
Contributor
Posts: 29

Re: Randomly select a dataset observation and insert it into a second dataset

Yeah. I was going to leave it open for a bit longer to see if there are any other approaches.  I'll close it before end of business today.

Super User
Posts: 5,083

Re: Randomly select a dataset observation and insert it into a second dataset

OK, one more approach.  Assuming you already created the data set SWAPLN containing just the variable SWAP:

data want;

   set input;

   if dataString=' ' then do;

      recno = ceil(n_swaplines * ranuni(12345));

      set swapln point=recno nobs=n_swaplines;

   end;

   drop recno;

run;

If you want to do further processing with the randomly-read line, I leave that part to you.

Good luck.

Contributor kbk
Contributor
Posts: 29

Re: Randomly select a dataset observation and insert it into a second dataset

This looks promising but when I run the code I get unexpected output.

data work.swapIn;

    input swap $1-50;

    datalines;

[sampleID]^1^^3^Street^Town^[sampleID prefix]

[sampleID]^8^2^3^Street^Town^[sampleID prefix]

[sampleID]^1^3^43^Street^Town^[sampleID prefix]

    ;

proc print;

run;

data work.input;

    length sampleID $11 Type 3 dataString $50;

    input sampleID $1-11 Type dataString $;

    datalines;

MS-1xxx-003 1 MS-1xxx-003^1x^^5^Street^Town^MS

MS-1yyy-411 2 .

MS-4xxx-002 3 .

GC-6xxx-002 3 GC-6xxx-002^1^3^43^Street^Town^GC

MS-2yyy-002 1 .

;

proc print;

run;

data want ;

   set input;

   if dataString='' then

    do;

      recno = ceil(n_swaplines * ranuni(545));

      set swapIn point=recno nobs=n_swaplines;

     end;

   drop recno;

run;

proc print data=want;

run;

obssampleIDTypedataStringswap
1 MS-1xxx-003  1 MS-1xxx-003^1x^^5^Street^Town^MS
2 MS-1yyy-411  2

[sampleID]^1^3^43^Street^Town^[sampleID prefix]

3 MS-4xxx-002  3

[sampleID]^1^3^43^Street^Town^[sampleID prefix]

4 GC-6xxx-002  3 GC-6xxx-002^1^3^43^Street^Town^GC

[sampleID]^1^3^43^Street^Town^[sampleID prefix]

5 MS-2yyy-002  1

[sampleID]^1^3^43^Street^Town^[sampleID prefix]

It's almost like the if statement isn't obeyed. Which the debugger confirms it is but Obs 4 shouldn't get any values.

Super User
Posts: 5,083

Re: Randomly select a dataset observation and insert it into a second dataset

Ah, an oversight.  Variables that come from a SAS data set (such as SWAP from SWAPLN) are automatically retained.  So, add a line:

...

end;

else swap=' ';

drop recno;

Ask a Question
Discussion stats
  • 7 replies
  • 359 views
  • 6 likes
  • 3 in conversation