Hi,
I am trying to assign a randomly selected date to observations in data set A from another list of possible dates in data set B. However, for some observations in data set A, there is another date variable and I want the randomly selected date (with replacement) from data set B to be at least 30 days before the observed data in data set A. Below is how the two data sets (A & B) look and the want data set C:
Data set A
ID date1
1 .
2 02JAN2018
3 .
4 31JUL2000
5 .
Data set B
date2
30JUL2019
03MAR1999
09SEP2016
01JAN2017
Want Data set C
ID date1 date2
1 . 01JAN2017
2 02JAN2018 03MAR1999
3 . 30JUL2019
4 31JUL2000 03MAR1999
5 . 01JAN2017
How could I do this?
Greatly appreciate your help. Regards, -Raj
If you want all obs from dataset A, with missing values for date2 when there is no candidate from dataset B, change the code to:
proc sql;
create table want(drop=rnd) as
select ID, date1, date2, ranuni(767667) as rnd
from A left join B
on -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
select * from want;
quit;
Dealing with random numbers in SQL is always a bit tricky, but here goes:
proc sql;
create table want(drop=rnd) as
select ID, date1, date2, rand("uniform") as rnd
from A, B
where A.date1 <= intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
quit;
Note: I used the fact that missing values are always inferior to non-missing values in SAS (including SAS/SQL).
Hi PG,
Thank you for a quick response.
I ran your code for my data, but it is not working as per my need. The observations where there is "date1" in data set A, I would like to have the randomly selected date (from data set B) to be at least 30 days before the "date1."
I am also loosing observations from data set A, which should not be the case if the random selection is with replacement.
Could you provide a representative example of those problem cases?
Here is the screenshot of the data set I get after running the code.
ID is ID from data set A
DOD is date1 from data set A
KC_Dx_Dt is date2 which should be randomly selected from data set B.
For ex., for ID=3 or 4, the condition that date2 (KC_Dx_Dt) should be at least 30 day before date1 (DOD) is not meeting.
Also, I had 100 observations in data set A, however, here in the want data set, I have 98 observations.
So, I had gotten it backwards... Try this instead:
proc sql;
create table want(drop=rnd) as
select ID, date1, date2, rand("uniform") as rnd
from A, B
where -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
quit;
(here again, handling comparisons involving missing values requires a bit of gymnastics. I use the fact that -Missing = Missing < Non-missing)
Thank you. This code is picking/assigning a date2 which meets the condition, however, I am still loosing observations. Now have 95 observations instead of 100.
Revised Response:
This is code is working fine. I have also figured out the reason why I was having lesser number of observations in my "want" data set. It was because there was no "date" from the "data set B" which could have satisfied the condition for a date value in data set A (date1).
Thank you, PG!
If you want all obs from dataset A, with missing values for date2 when there is no candidate from dataset B, change the code to:
proc sql;
create table want(drop=rnd) as
select ID, date1, date2, ranuni(767667) as rnd
from A left join B
on -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
select * from want;
quit;
How could I add a "seed" number?
As far as I can tell, you can't give a seed to the rand function in SQL. You could replace rand("uniform") with ranuni(767667) where 767667 is the seed.
Note that ranuni is a deprecated function. From the SAS documentation:
"The [RANUNI] function is suitable for small samples and for applications that do not require a sophisticated random-number generator."
Thank you for the info and your help!
Regards, -Raj
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.