BookmarkSubscribeRSS Feed
mayasak
Quartz | Level 8

I have 2 data sets (below are partial datasets) for persons that are exposed to an "xyz" disease and want to find the proportion of those who were tested within 5-10 days from there exposure:

 

data Exp;
input Subject_ID Exp_Date : mmddyy10. Name $14-22 Case_ID Exposure $;
format exp_date date9.;
datalines;
1 12/20/2019 Joe Smith 122 Home
1 09/30/2019 Joe Smith 435 Home
2 01/05/2020 Mary Mill 432 Work
3 11/04/2019 John Doe 129 Work
3 11/04/2019 John Doe 655 School
3 01/25/2020 John Doe 721 Work
;;;
run;

 

data Test;
input Subject_ID Test_Date : mmddyy10. Name $14-23;
format test_date date9.;
datalines;
1 12/29/2019 Joe Smith
2 09/30/2019 Mary Mill
2 01/07/2020 Mary Mill
8 06/30/2019 Mark Roger
8 07/15/2019 Mark Roger
8 08/29/2019 Mark Roger
8 12/03/2019 Mark Roger
;;;;
run;

 

This is what I've done:

I merged the 2 datasets using the following code:

 

Proc SQL;
Create Table Merge AS
Select
E.Exp_Date,E.Case_ID,E.Exposure,
coalesce(E.Subject_ID, T.Subject_ID) as Subject_ID,
T.Test_Date
from exp E FULL JOIN test T on (E.Subject_ID = T.Subject_ID);
quit;

 

Then I found the date difference between exposure and test dates:

 

data HW;
set merge;
days_diff = intck('day', exp_date, test_date);
run;

 

The result was as below. It shows an issue that the dates were filled in for matched persons such as Joe Smith and Mary Mill:

 

mayasak_1-1658206023186.png

My questions are:

  • How to get rid of the dates filled for the matched data.
  • How to get the proportion of those tested in the 5-10 days difference window. -Specifically the numerator and denominator.
  • Is there a better approach for getting the proportion?

Thank you

 

4 REPLIES 4
Renoux
Obsidian | Level 7

Hi!

 

I am not sure to understand all the words you use, but I will try to answer. 

 

[Question 1]

 

If you want to get rid of people who are in both work.Exp and work.Test, you add to your proc sql the condition [where E.Subject_ID=. or T.Subject_ID=.] :

Proc SQL;
Create Table Merge2 AS
Select
E.Exp_Date,E.Case_ID,E.Exposure,
coalesce(E.Subject_ID, T.Subject_ID) as Subject_ID,
T.Test_Date
from exp E FULL JOIN test T on (E.Subject_ID = T.Subject_ID)
where E.Subject_ID=. or T.Subject_ID=.;
quit;

if you want the opposite, to keep only subject who are in both table, you shall do an inner join instead of a full join :

Proc SQL;
Create Table Merge3 AS
Select
E.Exp_Date,E.Case_ID,E.Exposure,
coalesce(E.Subject_ID, T.Subject_ID) as Subject_ID,
T.Test_Date
from exp E INNER JOIN test T on (E.Subject_ID = T.Subject_ID);
quit;

Or left join for only subjects in table work.Exp or rigth join for only subjects in table work.Test.

 

[Question 2]

 

Is this the proportion you want ?

Proc SQL;
Create Table proportion AS
Select
mean(((days_diff>=5)*(days_diff<=10))) as proportion_in_5_to_10_days
from HW;
quit;

[Question 3]

There are different approachs to do this. "Better" approach depends on your point of view.

 

Regards.

 

Axel Renoux

mayasak
Quartz | Level 8

Thank you Axel,

For question 1, sorry for not being clear. What happened is:

Joe smith had exp_date from table one as 20DEC2019 and 30SEP2019 and only one test_date from table two as 29DEC2019. When I did the merge, there were two test_date for this person, both 29DEC2019 so he had two date_diff 9 and 90, and 90 is not a true date_diff. 

Mary Mill, on the other hand, had only one exp_date and two tests, but after the merge, she had 2 exp_date and diff_date 2 and -97.

What I needed is to have only the 9 and 2 diff_date.

One thought I have is to deduplicate the data only when there's no corresponding test_date for the exp_date and vice versa but I have no clue ob how to do it.

Thank you

Renoux
Obsidian | Level 7

To answer, I would need to know what information you would like in the following fictional case :

 

data Exp;
input Subject_ID Exp_Date : mmddyy10. Name $14-22 Case_ID Exposure $;
format exp_date date9.;
datalines;
9 12/20/2019 Mike Avery 122 Home
9 09/30/2019 Mike Avery 435 Home
;;;
run;

data Test;
input Subject_ID Test_Date : mmddyy10. Name $14-23;
format test_date date9.;
datalines;
9 10/29/2019 Mike Avery
9 01/07/2020 Mike Avery
;;;;
run;

 

Matching only (exp 12/20/2019) with (test 01/07/2020)? Or matching also (exp 09/30/2019) with (test 10/29/2019)? Or matching none of those two?

 

 

mayasak
Quartz | Level 8

In your factual example, we should match both because each time the person was exposed, he/she had a corresponding test after this exposure. But if a person had exposure with no corresponding test such as below, only 09/30/2019 and 10/29/2019 should be matched:

 

data Exp;
input Subject_ID Exp_Date : mmddyy10. Name $14-22 Case_ID Exposure $;
format exp_date date9.;
datalines;
9 12/20/2019 Mike Avery 122 Home
9 09/30/2019 Mike Avery 435 Home
;;;
run;

data Test;
input Subject_ID Test_Date : mmddyy10. Name $14-23;
format test_date date9.;
datalines;
9 10/29/2019 Mike Avery
;;;;
run;

 The same thing for those who had one exposure and two tests such as below, only 09/30/2019 and 10/29/2019 should be matched:

data Exp;
input Subject_ID Exp_Date : mmddyy10. Name $14-22 Case_ID Exposure $;
format exp_date date9.;
datalines;
9 09/30/2019 Mike Avery 435 Home
;;;
run;

data Test;
input Subject_ID Test_Date : mmddyy10. Name $14-23;
format test_date date9.;
datalines;
9 10/29/2019 Mike Avery
9 01/07/2020 Mike Avery
;;;;
run;

 Thank you

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 987 views
  • 0 likes
  • 2 in conversation