Hi,
I am trying to assign a randomly selected date to observations in data set A from another list of possible dates in data set B. However, for some observations in data set A, there is another date variable and I want the randomly selected date (with replacement) from data set B to be at least 30 days before the observed data in data set A. Below is how the two data sets (A & B) look and the want data set C:
Data set A
ID date1
1 .
2 02JAN2018
3 .
4 31JUL2000
5 .
Data set B
date2
30JUL2019
03MAR1999
09SEP2016
01JAN2017
Want Data set C
ID date1 date2
1 . 01JAN2017
2 02JAN2018 03MAR1999
3 . 30JUL2019
4 31JUL2000 03MAR1999
5 . 01JAN2017
How could I do this?
Greatly appreciate your help. Regards, -Raj
If you want all obs from dataset A, with missing values for date2 when there is no candidate from dataset B, change the code to:
proc sql;
create table want(drop=rnd) as
select ID, date1, date2, ranuni(767667) as rnd
from A left join B
on -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
select * from want;
quit;
Dealing with random numbers in SQL is always a bit tricky, but here goes:
proc sql;
create table want(drop=rnd) as
select ID, date1, date2, rand("uniform") as rnd
from A, B
where A.date1 <= intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
quit;
Note: I used the fact that missing values are always inferior to non-missing values in SAS (including SAS/SQL).
Hi PG,
Thank you for a quick response.
I ran your code for my data, but it is not working as per my need. The observations where there is "date1" in data set A, I would like to have the randomly selected date (from data set B) to be at least 30 days before the "date1."
I am also loosing observations from data set A, which should not be the case if the random selection is with replacement.
Could you provide a representative example of those problem cases?
Here is the screenshot of the data set I get after running the code.
ID is ID from data set A
DOD is date1 from data set A
KC_Dx_Dt is date2 which should be randomly selected from data set B.
For ex., for ID=3 or 4, the condition that date2 (KC_Dx_Dt) should be at least 30 day before date1 (DOD) is not meeting.
Also, I had 100 observations in data set A, however, here in the want data set, I have 98 observations.
So, I had gotten it backwards... Try this instead:
proc sql;
create table want(drop=rnd) as
select ID, date1, date2, rand("uniform") as rnd
from A, B
where -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
quit;
(here again, handling comparisons involving missing values requires a bit of gymnastics. I use the fact that -Missing = Missing < Non-missing)
Thank you. This code is picking/assigning a date2 which meets the condition, however, I am still loosing observations. Now have 95 observations instead of 100.
Revised Response:
This is code is working fine. I have also figured out the reason why I was having lesser number of observations in my "want" data set. It was because there was no "date" from the "data set B" which could have satisfied the condition for a date value in data set A (date1).
Thank you, PG!
If you want all obs from dataset A, with missing values for date2 when there is no candidate from dataset B, change the code to:
proc sql;
create table want(drop=rnd) as
select ID, date1, date2, ranuni(767667) as rnd
from A left join B
on -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
select * from want;
quit;
How could I add a "seed" number?
As far as I can tell, you can't give a seed to the rand function in SQL. You could replace rand("uniform") with ranuni(767667) where 767667 is the seed.
Note that ranuni is a deprecated function. From the SAS documentation:
"The [RANUNI] function is suitable for small samples and for applications that do not require a sophisticated random-number generator."
Thank you for the info and your help!
Regards, -Raj
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.