BookmarkSubscribeRSS Feed
mayasak
Quartz | Level 8

I have 2 data sets (below are partial datasets) for persons that are exposed to an "xyz" disease and want to find the proportion of those who were tested within 5-10 days from there exposure:

 

data Exp;
input Subject_ID Exp_Date : mmddyy10. Name $14-22 Case_ID Exposure $;
format exp_date date9.;
datalines;
1 12/20/2019 Joe Smith 122 Home
1 09/30/2019 Joe Smith 435 Home
2 01/05/2020 Mary Mill 432 Work
3 11/04/2019 John Doe 129 Work
3 11/04/2019 John Doe 655 School
3 01/25/2020 John Doe 721 Work
;;;
run;

 

data Test;
input Subject_ID Test_Date : mmddyy10. Name $14-23;
format test_date date9.;
datalines;
1 12/29/2019 Joe Smith
2 09/30/2019 Mary Mill
2 01/07/2020 Mary Mill
8 06/30/2019 Mark Roger
8 07/15/2019 Mark Roger
8 08/29/2019 Mark Roger
8 12/03/2019 Mark Roger
;;;;
run;

 

This is what I've done:

I merged the 2 datasets using the following code:

 

Proc SQL;
Create Table Merge AS
Select
E.Exp_Date,E.Case_ID,E.Exposure,
coalesce(E.Subject_ID, T.Subject_ID) as Subject_ID,
T.Test_Date
from exp E FULL JOIN test T on (E.Subject_ID = T.Subject_ID);
quit;

 

Then I found the date difference between exposure and test dates:

 

data HW;
set merge;
days_diff = intck('day', exp_date, test_date);
run;

 

The result was as below. It shows an issue that the dates were filled in for matched persons such as Joe Smith and Mary Mill:

 

mayasak_1-1658206023186.png

My questions are:

  • How to get rid of the dates filled for the matched data.
  • How to get the proportion of those tested in the 5-10 days difference window. -Specifically the numerator and denominator.
  • Is there a better approach for getting the proportion?

Thank you

 

4 REPLIES 4
Renoux
Obsidian | Level 7

Hi!

 

I am not sure to understand all the words you use, but I will try to answer. 

 

[Question 1]

 

If you want to get rid of people who are in both work.Exp and work.Test, you add to your proc sql the condition [where E.Subject_ID=. or T.Subject_ID=.] :

Proc SQL;
Create Table Merge2 AS
Select
E.Exp_Date,E.Case_ID,E.Exposure,
coalesce(E.Subject_ID, T.Subject_ID) as Subject_ID,
T.Test_Date
from exp E FULL JOIN test T on (E.Subject_ID = T.Subject_ID)
where E.Subject_ID=. or T.Subject_ID=.;
quit;

if you want the opposite, to keep only subject who are in both table, you shall do an inner join instead of a full join :

Proc SQL;
Create Table Merge3 AS
Select
E.Exp_Date,E.Case_ID,E.Exposure,
coalesce(E.Subject_ID, T.Subject_ID) as Subject_ID,
T.Test_Date
from exp E INNER JOIN test T on (E.Subject_ID = T.Subject_ID);
quit;

Or left join for only subjects in table work.Exp or rigth join for only subjects in table work.Test.

 

[Question 2]

 

Is this the proportion you want ?

Proc SQL;
Create Table proportion AS
Select
mean(((days_diff>=5)*(days_diff<=10))) as proportion_in_5_to_10_days
from HW;
quit;

[Question 3]

There are different approachs to do this. "Better" approach depends on your point of view.

 

Regards.

 

Axel Renoux

mayasak
Quartz | Level 8

Thank you Axel,

For question 1, sorry for not being clear. What happened is:

Joe smith had exp_date from table one as 20DEC2019 and 30SEP2019 and only one test_date from table two as 29DEC2019. When I did the merge, there were two test_date for this person, both 29DEC2019 so he had two date_diff 9 and 90, and 90 is not a true date_diff. 

Mary Mill, on the other hand, had only one exp_date and two tests, but after the merge, she had 2 exp_date and diff_date 2 and -97.

What I needed is to have only the 9 and 2 diff_date.

One thought I have is to deduplicate the data only when there's no corresponding test_date for the exp_date and vice versa but I have no clue ob how to do it.

Thank you

Renoux
Obsidian | Level 7

To answer, I would need to know what information you would like in the following fictional case :

 

data Exp;
input Subject_ID Exp_Date : mmddyy10. Name $14-22 Case_ID Exposure $;
format exp_date date9.;
datalines;
9 12/20/2019 Mike Avery 122 Home
9 09/30/2019 Mike Avery 435 Home
;;;
run;

data Test;
input Subject_ID Test_Date : mmddyy10. Name $14-23;
format test_date date9.;
datalines;
9 10/29/2019 Mike Avery
9 01/07/2020 Mike Avery
;;;;
run;

 

Matching only (exp 12/20/2019) with (test 01/07/2020)? Or matching also (exp 09/30/2019) with (test 10/29/2019)? Or matching none of those two?

 

 

mayasak
Quartz | Level 8

In your factual example, we should match both because each time the person was exposed, he/she had a corresponding test after this exposure. But if a person had exposure with no corresponding test such as below, only 09/30/2019 and 10/29/2019 should be matched:

 

data Exp;
input Subject_ID Exp_Date : mmddyy10. Name $14-22 Case_ID Exposure $;
format exp_date date9.;
datalines;
9 12/20/2019 Mike Avery 122 Home
9 09/30/2019 Mike Avery 435 Home
;;;
run;

data Test;
input Subject_ID Test_Date : mmddyy10. Name $14-23;
format test_date date9.;
datalines;
9 10/29/2019 Mike Avery
;;;;
run;

 The same thing for those who had one exposure and two tests such as below, only 09/30/2019 and 10/29/2019 should be matched:

data Exp;
input Subject_ID Exp_Date : mmddyy10. Name $14-22 Case_ID Exposure $;
format exp_date date9.;
datalines;
9 09/30/2019 Mike Avery 435 Home
;;;
run;

data Test;
input Subject_ID Test_Date : mmddyy10. Name $14-23;
format test_date date9.;
datalines;
9 10/29/2019 Mike Avery
9 01/07/2020 Mike Avery
;;;;
run;

 Thank you

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1041 views
  • 0 likes
  • 2 in conversation