BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
raj00728
Fluorite | Level 6

Hi, 

 

I am trying to assign a randomly selected date to observations in data set A from another list of possible dates in data set B. However, for some observations in data set A, there is another date variable and I want the randomly selected date (with replacement) from data set B to be at least 30 days before the observed data in data set A. Below is how the two data sets (A & B) look and the want data set C:

 

Data set A

ID     date1

1         .

2        02JAN2018

3        .

4        31JUL2000

5        .

 

Data set B

date2

30JUL2019

03MAR1999

09SEP2016

01JAN2017

 

Want Data set C

ID     date1                 date2

1         .                         01JAN2017

2        02JAN2018        03MAR1999

3        .                          30JUL2019

4        31JUL2000         03MAR1999

5        .                          01JAN2017

 

How could I do this?

 

Greatly appreciate your help.  Regards, -Raj

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

If you want all obs from dataset A, with missing values for date2 when there is no candidate from dataset B, change the code to:

 

proc sql;
create table want(drop=rnd) as
select ID, date1, date2, ranuni(767667) as rnd
from A left join B
on -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
select * from want;
quit;
PG

View solution in original post

10 REPLIES 10
PGStats
Opal | Level 21

Dealing with random numbers in SQL is always a bit tricky, but here goes:

 

proc sql;
create table want(drop=rnd) as
select ID, date1, date2, rand("uniform") as rnd
from A, B
where A.date1 <= intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
quit;

Note: I used the fact that missing values are always inferior to non-missing values in SAS (including SAS/SQL).

PG
raj00728
Fluorite | Level 6

Hi PG,

 

Thank you for a quick response.

 

I ran your code for my data, but it is not working as per my need. The observations where there is "date1" in data set A, I would like to have the randomly selected date (from data set B) to be at least 30 days before the "date1."

I am also loosing observations from data set A, which should not be the case if the random selection is with replacement. 

PGStats
Opal | Level 21

Could you provide a representative example of those problem cases?

PG
raj00728
Fluorite | Level 6

Here is the screenshot of the data set I get after running the code.

 

ID is ID from data set A

DOD is date1 from data set A

KC_Dx_Dt is date2 which should be randomly selected from data set B.

 

For ex., for ID=3 or 4, the condition that date2 (KC_Dx_Dt) should be at least 30 day before date1 (DOD) is not meeting.

 

Also, I had 100 observations in data set A, however, here in the want data set, I have 98 observations. 

Picture1.jpg

 

PGStats
Opal | Level 21

So, I had gotten it backwards... Try this instead:

 

proc sql;
create table want(drop=rnd) as
select ID, date1, date2, rand("uniform") as rnd
from A, B
where -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
quit;

(here again, handling comparisons involving missing values requires a bit of gymnastics. I use the fact that -Missing = Missing < Non-missing)

PG
raj00728
Fluorite | Level 6

Thank you. This code is picking/assigning a date2 which meets the condition, however, I am still loosing observations. Now have 95 observations instead of 100.

 

Revised Response:

 

This is code is working fine. I have also figured out the reason why I was having lesser number of observations in my "want" data set. It was because there was no "date" from the "data set B" which could have satisfied the condition for a date value in data set A (date1).

 

Thank you, PG! 

PGStats
Opal | Level 21

If you want all obs from dataset A, with missing values for date2 when there is no candidate from dataset B, change the code to:

 

proc sql;
create table want(drop=rnd) as
select ID, date1, date2, ranuni(767667) as rnd
from A left join B
on -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
select * from want;
quit;
PG
PGStats
Opal | Level 21

As far as I can tell, you can't give a seed to the rand function in SQL. You could replace rand("uniform") with ranuni(767667) where 767667 is the seed.

 

Note that ranuni is a deprecated function. From the SAS documentation:

 

"The [RANUNI] function is suitable for small samples and for applications that do not require a sophisticated random-number generator."

PG
raj00728
Fluorite | Level 6

Thank you for the info and your help!

 

Regards, -Raj

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3394 views
  • 0 likes
  • 2 in conversation