BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
raj00728
Fluorite | Level 6

Hi, 

 

I am trying to assign a randomly selected date to observations in data set A from another list of possible dates in data set B. However, for some observations in data set A, there is another date variable and I want the randomly selected date (with replacement) from data set B to be at least 30 days before the observed data in data set A. Below is how the two data sets (A & B) look and the want data set C:

 

Data set A

ID     date1

1         .

2        02JAN2018

3        .

4        31JUL2000

5        .

 

Data set B

date2

30JUL2019

03MAR1999

09SEP2016

01JAN2017

 

Want Data set C

ID     date1                 date2

1         .                         01JAN2017

2        02JAN2018        03MAR1999

3        .                          30JUL2019

4        31JUL2000         03MAR1999

5        .                          01JAN2017

 

How could I do this?

 

Greatly appreciate your help.  Regards, -Raj

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

If you want all obs from dataset A, with missing values for date2 when there is no candidate from dataset B, change the code to:

 

proc sql;
create table want(drop=rnd) as
select ID, date1, date2, ranuni(767667) as rnd
from A left join B
on -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
select * from want;
quit;
PG

View solution in original post

10 REPLIES 10
PGStats
Opal | Level 21

Dealing with random numbers in SQL is always a bit tricky, but here goes:

 

proc sql;
create table want(drop=rnd) as
select ID, date1, date2, rand("uniform") as rnd
from A, B
where A.date1 <= intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
quit;

Note: I used the fact that missing values are always inferior to non-missing values in SAS (including SAS/SQL).

PG
raj00728
Fluorite | Level 6

Hi PG,

 

Thank you for a quick response.

 

I ran your code for my data, but it is not working as per my need. The observations where there is "date1" in data set A, I would like to have the randomly selected date (from data set B) to be at least 30 days before the "date1."

I am also loosing observations from data set A, which should not be the case if the random selection is with replacement. 

PGStats
Opal | Level 21

Could you provide a representative example of those problem cases?

PG
raj00728
Fluorite | Level 6

Here is the screenshot of the data set I get after running the code.

 

ID is ID from data set A

DOD is date1 from data set A

KC_Dx_Dt is date2 which should be randomly selected from data set B.

 

For ex., for ID=3 or 4, the condition that date2 (KC_Dx_Dt) should be at least 30 day before date1 (DOD) is not meeting.

 

Also, I had 100 observations in data set A, however, here in the want data set, I have 98 observations. 

Picture1.jpg

 

PGStats
Opal | Level 21

So, I had gotten it backwards... Try this instead:

 

proc sql;
create table want(drop=rnd) as
select ID, date1, date2, rand("uniform") as rnd
from A, B
where -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
quit;

(here again, handling comparisons involving missing values requires a bit of gymnastics. I use the fact that -Missing = Missing < Non-missing)

PG
raj00728
Fluorite | Level 6

Thank you. This code is picking/assigning a date2 which meets the condition, however, I am still loosing observations. Now have 95 observations instead of 100.

 

Revised Response:

 

This is code is working fine. I have also figured out the reason why I was having lesser number of observations in my "want" data set. It was because there was no "date" from the "data set B" which could have satisfied the condition for a date value in data set A (date1).

 

Thank you, PG! 

PGStats
Opal | Level 21

If you want all obs from dataset A, with missing values for date2 when there is no candidate from dataset B, change the code to:

 

proc sql;
create table want(drop=rnd) as
select ID, date1, date2, ranuni(767667) as rnd
from A left join B
on -date1 <= -intnx("day", date2, 30)
group by ID
having rnd = min(rnd);
select * from want;
quit;
PG
PGStats
Opal | Level 21

As far as I can tell, you can't give a seed to the rand function in SQL. You could replace rand("uniform") with ranuni(767667) where 767667 is the seed.

 

Note that ranuni is a deprecated function. From the SAS documentation:

 

"The [RANUNI] function is suitable for small samples and for applications that do not require a sophisticated random-number generator."

PG
raj00728
Fluorite | Level 6

Thank you for the info and your help!

 

Regards, -Raj

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2104 views
  • 0 likes
  • 2 in conversation