Solved: Re: How to find all the records of certain ID with a certain character...

Primavera · Posted 01-22-2020 05:22 PM

Hi,

I am using SAS 9.4 TS level 1M3. I have a data set where each ID can have multiple records. It looks like this:

ID	Epi_start	Epi_end	Epi-Number
1	21-Aug-10	25-Aug-10	1
2	27-Aug-10	10-Sep-10	1
3	13-Sep-10	18-Sep-10	1
3	01-Oct-10	11-Oct-10	2
3	14-Oct-10	19-Oct-10	3
3	15-Dec-12	18-Dec-10	1
4	18-Oct-10	22-Oct-10	1
4	18-Oct-10	22-Oct-10	2
5	19-Oct-10	29-Oct-10	1
5	13-Dec-10	01-Jan-11	2
5	13-Dec-10	01-Jan-11	3
5	16-Feb-11	28-Feb-11	4
6	01-Dec-10	09-Dec-10	1
7	11-Dec-10	17-Dec-10	1
7	15-Oct-11	22-Oct-11	1
7	01-Dec-11	10-Dec-11	2
7	20-Jun-12	29-Jun-12	1
7	29-Jul-12	08-Aug-12	2
7	07-Sep-12	16-Sep-12	3
8	19-Dec-10	25-Dec-10	1
8	19-Dec-10	25-Dec-10	2
8	19-Dec-10	25-Dec-10	3
8	17-Jan-13	25-Jan-13	1
9	01-Mar-11	10-Mar-11	1
9	25-Mar-11	30-Mar-11	2

I want to transfer all the records of individuals who have even one identical Epi_start and Epi_end to a new file. Basically I want to get this file:

ID	Epi_start	Epi_end	Epi-Number
4	18-Oct-10	22-Oct-10	1
4	18-Oct-10	22-Oct-10	2
5	19-Oct-10	29-Oct-10	1
5	13-Dec-10	01-Jan-11	2
5	13-Dec-10	01-Jan-11	3
5	16-Feb-11	28-Feb-11	4
8	19-Dec-10	25-Dec-10	1
8	19-Dec-10	25-Dec-10	2
8	19-Dec-10	25-Dec-10	3
8	17-Jan-13	25-Jan-13	1

I tried a number of ways myself but being new to SAS I did not get the correct results. Thanks for the help.

novinosrin · Posted 01-22-2020 06:54 PM

Slight tweak all that you need is

 and max(hospital));

Full version



data have;
input ID	(Epi_start	Epi_end) (:Date9.)	Epi_Number	Hospital;
cards;
1	21-Aug-10	25-Aug-10	1	.
2	27-Aug-10	10-Sep-10	1	1
3	13-Sep-10	18-Sep-10	1	.
3	01-Oct-10	11-Oct-10	2	.
3	14-Oct-10	19-Oct-10	3	.
3	15-Dec-12	18-Dec-10	1	1
4	18-Oct-10	22-Oct-10	1	.
4	18-Oct-10	22-Oct-10	2	.
5	19-Oct-10	29-Oct-10	1	.
5	13-Dec-10	01-Jan-11	2	.
5	13-Dec-10	01-Jan-11	3	1
5	16-Feb-11	28-Feb-11	4	.
6	01-Dec-10	09-Dec-10	1	.
7	11-Dec-10	17-Dec-10	1	.
7	15-Oct-11	22-Oct-11	1	.
7	01-Dec-11	10-Dec-11	2	1
7	20-Jun-12	29-Jun-12	1	1
7	29-Jul-12	08-Aug-12	2	.
7	07-Sep-12	16-Sep-12	3	.
8	19-Dec-10	25-Dec-10	1	.
8	19-Dec-10	25-Dec-10	2	1
8	19-Dec-10	25-Dec-10	3	.
8	17-Jan-13	25-Jan-13	1	.
9	01-Mar-11	10-Mar-11	1	1
9	25-Mar-11	30-Mar-11	2	1
;
 
proc sql;
create table want as
select *
from have
where id in (select id from have group by id,Epi_start,Epi_end having count(*)>1 and max(hospital));
quit;

SQL is so easy 🙂

View solution in original post

novinosrin · Posted 01-22-2020 05:31 PM

Easy in SQL


data have;
input ID	(Epi_start	Epi_end) (:date9.)	Epi_Number;
format Epi_start	Epi_end date9.;
cards;
1	21-Aug-10	25-Aug-10	1
2	27-Aug-10	10-Sep-10	1
3	13-Sep-10	18-Sep-10	1
3	01-Oct-10	11-Oct-10	2
3	14-Oct-10	19-Oct-10	3
3	15-Dec-12	18-Dec-10	1
4	18-Oct-10	22-Oct-10	1
4	18-Oct-10	22-Oct-10	2
5	19-Oct-10	29-Oct-10	1
5	13-Dec-10	01-Jan-11	2
5	13-Dec-10	01-Jan-11	3
5	16-Feb-11	28-Feb-11	4
6	01-Dec-10	09-Dec-10	1
7	11-Dec-10	17-Dec-10	1
7	15-Oct-11	22-Oct-11	1
7	01-Dec-11	10-Dec-11	2
7	20-Jun-12	29-Jun-12	1
7	29-Jul-12	08-Aug-12	2
7	07-Sep-12	16-Sep-12	3
8	19-Dec-10	25-Dec-10	1
8	19-Dec-10	25-Dec-10	2
8	19-Dec-10	25-Dec-10	3
8	17-Jan-13	25-Jan-13	1
9	01-Mar-11	10-Mar-11	1
9	25-Mar-11	30-Mar-11	2
;

proc sql;
create table want as
select *
from have
where id in (select id from have group by id,Epi_start,Epi_end having count(*)>1);
quit;

Primavera · Posted 01-22-2020 06:01 PM

Sorry, I failed to mention that the data set that I am working with is not identical to the one I posted. My original data set is a SAS data set with about 300000 records and about 20 variables. I am not sure if I can create the have table like your answer.

Also I am particularly bad at SQL sadly. Can you please explain the last line of your code (particularly count(*)>1 part of it)?

Many thanks,

novinosrin · Posted 01-22-2020 06:07 PM

The logic is very simple:

1. Subset ID's , whose records contains more than one occurrence of identical Epi_start Epi_end which has to make a count that is greater than one.

2. LooK up ID's from the full table to the ID's the resulting subset mentioned in 1

The output will have only those matching ID's . Does that explain enough to help you comprehend?

It doesn't matter how many records or variables you have as long as the variables of interest for the subsetting logic is ID, Epi_start Epi_end

Primavera · Posted 01-22-2020 06:44 PM

Thank you so much. It worked and I am beginning to understand it. May I ask one more question? Now assume that you have a hospital flag that is either 1 or missing. Like this:

ID	Epi_start	Epi_end	Epi-Number	Hospital
1	21-Aug-10	25-Aug-10	1	.
2	27-Aug-10	10-Sep-10	1	1
3	13-Sep-10	18-Sep-10	1	.
3	01-Oct-10	11-Oct-10	2	.
3	14-Oct-10	19-Oct-10	3	.
3	15-Dec-12	18-Dec-10	1	1
4	18-Oct-10	22-Oct-10	1	.
4	18-Oct-10	22-Oct-10	2	.
5	19-Oct-10	29-Oct-10	1	.
5	13-Dec-10	01-Jan-11	2	.
5	13-Dec-10	01-Jan-11	3	1
5	16-Feb-11	28-Feb-11	4	.
6	01-Dec-10	09-Dec-10	1	.
7	11-Dec-10	17-Dec-10	1	.
7	15-Oct-11	22-Oct-11	1	.
7	01-Dec-11	10-Dec-11	2	1
7	20-Jun-12	29-Jun-12	1	1
7	29-Jul-12	08-Aug-12	2	.
7	07-Sep-12	16-Sep-12	3	.
8	19-Dec-10	25-Dec-10	1	.
8	19-Dec-10	25-Dec-10	2	1
8	19-Dec-10	25-Dec-10	3	.
8	17-Jan-13	25-Jan-13	1	.
9	01-Mar-11	10-Mar-11	1	1
9	25-Mar-11	30-Mar-11	2	1

Now lets say I want the records where at least one of the identical epi_start and epi_end has a hospital flag of 1. Like table below:

ID	Epi_start	Epi_end	Epi-Number	Hospital
5	19-Oct-10	29-Oct-10	1	.
5	13-Dec-10	01-Jan-11	2	.
5	13-Dec-10	01-Jan-11	3	1
5	16-Feb-11	28-Feb-11	4	.
8	19-Dec-10	25-Dec-10	1	.
8	19-Dec-10	25-Dec-10	2	1
8	19-Dec-10	25-Dec-10	3	.
8	17-Jan-13	25-Jan-13	1	.

How would you add that to your code? Thank you so much again.

novinosrin · Posted 01-22-2020 06:54 PM

Slight tweak all that you need is

 and max(hospital));

Full version



data have;
input ID	(Epi_start	Epi_end) (:Date9.)	Epi_Number	Hospital;
cards;
1	21-Aug-10	25-Aug-10	1	.
2	27-Aug-10	10-Sep-10	1	1
3	13-Sep-10	18-Sep-10	1	.
3	01-Oct-10	11-Oct-10	2	.
3	14-Oct-10	19-Oct-10	3	.
3	15-Dec-12	18-Dec-10	1	1
4	18-Oct-10	22-Oct-10	1	.
4	18-Oct-10	22-Oct-10	2	.
5	19-Oct-10	29-Oct-10	1	.
5	13-Dec-10	01-Jan-11	2	.
5	13-Dec-10	01-Jan-11	3	1
5	16-Feb-11	28-Feb-11	4	.
6	01-Dec-10	09-Dec-10	1	.
7	11-Dec-10	17-Dec-10	1	.
7	15-Oct-11	22-Oct-11	1	.
7	01-Dec-11	10-Dec-11	2	1
7	20-Jun-12	29-Jun-12	1	1
7	29-Jul-12	08-Aug-12	2	.
7	07-Sep-12	16-Sep-12	3	.
8	19-Dec-10	25-Dec-10	1	.
8	19-Dec-10	25-Dec-10	2	1
8	19-Dec-10	25-Dec-10	3	.
8	17-Jan-13	25-Jan-13	1	.
9	01-Mar-11	10-Mar-11	1	1
9	25-Mar-11	30-Mar-11	2	1
;
 
proc sql;
create table want as
select *
from have
where id in (select id from have group by id,Epi_start,Epi_end having count(*)>1 and max(hospital));
quit;

SQL is so easy 🙂

Primavera · Posted 01-23-2020 02:21 PM

Thank you for replying again. I think there might be a problem with this additional code. If we have identical Epi_start and Epi_end that are not hospitalizations in one individual's records but then we also have one separate occasion of hospitalization (hospital=1) then wouldn't that set of records make it to our final list? I mean something like this:

ID	Epi_start	Epi_end	Epi-Number	Hospital
7	11-Dec-10	17-Dec-10	1	.
7	15-Oct-11	22-Oct-11	1	.
7	01-Dec-11	10-Dec-11	2	1
7	20-Jun-12	29-Jun-12	1	1
7	29-Jul-12	08-Aug-12	2	.
7	29-Jul-12	08-Aug-12	3	.

I might be wrong though. I have used SQL only once before (and it did not work that time). I am still not sure if I am getting your logic (both the count(*)>1 and max(hospital) sections). Thanks.

novinosrin · Posted 01-23-2020 02:34 PM

Hi @Primavera Yes, the current logic/logic so far will not pick ID7. The current logic approach is very simple:

1. Identify the number of identical occurrences of Epi_start and Epi_end for a given ID. If it is greater than one, it passes the 1st test

2. Use the result of 1 and determine if any of the records in the subset has hospitalization flag.

If the check passes both the tests, output all the records of those IDs. For ID7 in your latest post, though it passes the 1st test, it fails the 2nd, hence 7 gets dropped .

Please review your requirement thoroughly and let us know what you want in/as final. It's cool to go back and forth as long we hit the much needed target.

@Primavera wrote:

Thank you for replying again. I think there might be a problem with this additional code. If we have identical Epi_start and Epi_end that are not hospitalizations in one individual's records but then we also have one separate occasion of hospitalization (hospital=1) then wouldn't that set of records make it to our final list? I mean something like this:

ID Epi_start Epi_end Epi-Number Hospital

7 11-Dec-10 17-Dec-10 1 .

7 15-Oct-11 22-Oct-11 1 .

7 01-Dec-11 10-Dec-11 2 1

7 20-Jun-12 29-Jun-12 1 1

7 29-Jul-12 08-Aug-12 2 .

7 29-Jul-12 08-Aug-12 3 .

I might be wrong though. I have used SQL only once before (and it did not work that time). I am still not sure if I am getting your logic (both the count(*)>1 and max(hospital) sections). Thanks.

Primavera · Posted 02-06-2020 04:01 PM

Hi again,

May I take a bit of your time again?

I am trying to modify your code for another need. I want to gather all the records of an ID if that individual has had an Epi_start before a certain date (say 01Jan2011). I don't care about any of the conditions that I had previously. This time I only care about Epi_start date.

I have modified the last line to :

where ID in (select ID from have group by ID having count(*)>1 and Epi_start LT '01JAN2011'd );

But it does not work and giving me a table with 0 rows and this note in the log:

NOTE: The query requires remerging summary statistics back with the original data.

Can you please tell me where I have gone wrong?

Thank you,

novinosrin · Posted 02-06-2020 04:53 PM

HI @Primavera

Since the requirement is much too simple considering your notes

1. I want to gather all the records of an ID if that individual has had an Epi_start before a certain date (say 01Jan2011).

2. I don't care about any of the conditions that I had previously. This time I only care about Epi_start date.

The logic becomes merely a filter, and no need for a summary statistics. Therefore, essentially how you would build is

1. where Epi_start LT '01JAN2011'd - This will filter with ID's satisfying the condition

2. Now, get other corresponding records for the filtered ID's.

So in essence, the mind should now think 1 as a subquery( or in simple terms 1st filter query) and 2 An outer query or a full query to get all the corresponding records for those ID's. That's as simple or as terse the build logic would become.

Now, let's turn English words to SQL words

proc sql;
create table want as
select *
from have
where id in (select id from have where Epi_start LT '01JAN2011'd);
quit;

Hope the explanation gives that spark to help readers build the logic. Cheers! Have fun!

Primavera · Posted 02-10-2020 03:34 PM

Hi @novinosrin ,

Thank you very much for your help and I am very sorry for my belated message of gratitude. I was occupied with something else.

I tried something very very similar to your code when I was trying to modify your previous code and it did not work. Now I plugged in your simple code and voila! Worked like a charm. So frustrating when my own code does not work and a master comes in and solves it so easily.

Thank you again for coming to my help again and thank you for your perfect explanations.

Regards,

Primavera · Posted 05-01-2020 02:32 AM

Hi again,

I hope you are safe and well.

I am very sorry for taking your time once more. I have a problem that at first I thought I can code in less than 30 minutes but now 2 days later I have given up. It kind of follows the same trail as my previous questions. Let's say I have a data set with about 100 K participants. The records of one of these people could look like this:

ID	dispensation	Epi_start	Epi_end	Dispen-Number	Hospital	epi_duration
5	21-May-10	21-May-10	29-May-10	1	.	9
5	27-Aug-10	27-Aug-10	28-Aug-10	1	1	2
5	13-Sep-10	13-Sep-10	30-Sep-10	1	.	18
5	11-Oct-10	13-Sep-10	25-Oct-10	2	.	43
5	05-Nov-10	13-Sep-10	20-Dec-10	3	.	99
5	15-Dec-12	15-Dec-12	17-Dec-12	1	1	3
5	18-Oct-13	18-Oct-13	22-Oct-13	1	.	5
5	01-Nov-13	18-Oct-13	09-Nov-13	2	.	23
5	03-Mar-14	03-Mar-14	15-Mar-14	1	.	13

various people can have various records of course but it mainly looks like above. I want to gather all the records of drug dispensations in an episode if that episode has lasted more than 2 months. For example in the case of the subject with ID=5 that would be:

ID	dispensation	Epi_start	Epi_end	Dispen-Number	Hospital	epi_duration
5	13-Sep-10	13-Sep-10	30-Sep-10	1	.	18
5	11-Oct-10	13-Sep-10	25-Oct-10	2	.	43
5	05-Nov-10	13-Sep-10	20-Dec-10	3	.	99

Thank you once more,

novinosrin · Posted 05-01-2020 08:16 AM

Hi @Primavera We are okay. Thank you. I hope the same is the case where you are at. Okay, I vaguely recall you wanting Proc SQL based solutions? Is that correct or you have no preference?

Since our discussions have been proc sql thus far, I'd go with the same-


/*I want to gather all the records of drug dispensations in an episode if that episode has lasted more than 2 months.*/
data have;
input ID	(dispensation	Epi_start	Epi_end) (:date9.)	Dispen_Number	Hospital	epi_duration;
format dispensation Epi_start	Epi_end date9.;
cards;
5	21-May-10	21-May-10	29-May-10	1	.	9
5	27-Aug-10	27-Aug-10	28-Aug-10	1	1	2
5	13-Sep-10	13-Sep-10	30-Sep-10	1	.	18
5	11-Oct-10	13-Sep-10	25-Oct-10	2	.	43
5	05-Nov-10	13-Sep-10	20-Dec-10	3	.	99
5	15-Dec-12	15-Dec-12	17-Dec-12	1	1	3
5	18-Oct-13	18-Oct-13	22-Oct-13	1	.	5
5	01-Nov-13	18-Oct-13	09-Nov-13	2	.	23
5	03-Mar-14	03-Mar-14	15-Mar-14	1	.	13
;

proc sql;
create table want as
select *
from have
group by id,Epi_start
having intck('mon',Epi_start,max(Epi_end),'c')>2
order by id,dispensation,Epi_start,Epi_end;
quit;

Primavera · Posted 05-01-2020 03:49 PM

Hi,

Thank you very much once again. I tried your code and it works perfectly. I figured out most of the logic for the code. Just a few question:

1. Why did you group by both ID and Epi_start and not just by ID?

2. 'c' stands for 'continuous', right?

3. How would you modify this code to just grab the longest episode or the first qualifying episode of each participant (if some participants have more than 1 qualifying episodes)?

I actually don't prefer SQL as I am very bad at it. I am slowly reading and learning about it. Do you know how to code for this without using SQL? I tried several methods including reversing the data set and trying to flag the desired dispensations and none of them worked. I am very sorry for taking your time; it's just that your codes are very professional and educational for me.

Thanks again,

novinosrin · Posted 05-01-2020 05:23 PM

Need not be so formal. It's fun to be around here. I am about to go home. I might login from home and will offer an alternative tonight or tomorrow morning. It's been a long day.

How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

Re: How to find all the records of certain ID with a certain characteristics

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away