BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Primavera
Quartz | Level 8

Hi,

 

I am using SAS 9.4 TS level 1M3. I have a data set where each ID can have multiple records. It looks like this:

 

IDEpi_startEpi_endEpi-Number
121-Aug-1025-Aug-101
227-Aug-1010-Sep-101
313-Sep-1018-Sep-101
301-Oct-1011-Oct-102
314-Oct-1019-Oct-103
315-Dec-1218-Dec-101
418-Oct-1022-Oct-101
418-Oct-1022-Oct-102
519-Oct-1029-Oct-101
513-Dec-1001-Jan-112
513-Dec-1001-Jan-113
516-Feb-1128-Feb-114
601-Dec-1009-Dec-101
711-Dec-1017-Dec-101
715-Oct-1122-Oct-111
701-Dec-1110-Dec-112
720-Jun-1229-Jun-121
729-Jul-1208-Aug-122
707-Sep-1216-Sep-123
819-Dec-1025-Dec-101
819-Dec-1025-Dec-102
819-Dec-1025-Dec-103
817-Jan-1325-Jan-131
901-Mar-1110-Mar-111
925-Mar-1130-Mar-112

 

I want to transfer all the records of individuals who have even one identical Epi_start and Epi_end to a new file. Basically I want to get this file:

IDEpi_startEpi_endEpi-Number
418-Oct-1022-Oct-101
418-Oct-1022-Oct-102
519-Oct-1029-Oct-101
513-Dec-1001-Jan-112
513-Dec-1001-Jan-113
516-Feb-1128-Feb-114
819-Dec-1025-Dec-101
819-Dec-1025-Dec-102
819-Dec-1025-Dec-103
817-Jan-1325-Jan-131

 

I tried a number of ways myself but being new to SAS I did not get the correct results. Thanks for the help.

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Slight tweak all that you need is 

 and max(hospital));

Full version

 



data have;
input ID	(Epi_start	Epi_end) (:Date9.)	Epi_Number	Hospital;
cards;
1	21-Aug-10	25-Aug-10	1	.
2	27-Aug-10	10-Sep-10	1	1
3	13-Sep-10	18-Sep-10	1	.
3	01-Oct-10	11-Oct-10	2	.
3	14-Oct-10	19-Oct-10	3	.
3	15-Dec-12	18-Dec-10	1	1
4	18-Oct-10	22-Oct-10	1	.
4	18-Oct-10	22-Oct-10	2	.
5	19-Oct-10	29-Oct-10	1	.
5	13-Dec-10	01-Jan-11	2	.
5	13-Dec-10	01-Jan-11	3	1
5	16-Feb-11	28-Feb-11	4	.
6	01-Dec-10	09-Dec-10	1	.
7	11-Dec-10	17-Dec-10	1	.
7	15-Oct-11	22-Oct-11	1	.
7	01-Dec-11	10-Dec-11	2	1
7	20-Jun-12	29-Jun-12	1	1
7	29-Jul-12	08-Aug-12	2	.
7	07-Sep-12	16-Sep-12	3	.
8	19-Dec-10	25-Dec-10	1	.
8	19-Dec-10	25-Dec-10	2	1
8	19-Dec-10	25-Dec-10	3	.
8	17-Jan-13	25-Jan-13	1	.
9	01-Mar-11	10-Mar-11	1	1
9	25-Mar-11	30-Mar-11	2	1
;
 
proc sql;
create table want as
select *
from have
where id in (select id from have group by id,Epi_start,Epi_end having count(*)>1 and max(hospital));
quit;

SQL is so easy 🙂

View solution in original post

34 REPLIES 34
novinosrin
Tourmaline | Level 20

Easy in  SQL

 


data have;
input ID	(Epi_start	Epi_end) (:date9.)	Epi_Number;
format Epi_start	Epi_end date9.;
cards;
1	21-Aug-10	25-Aug-10	1
2	27-Aug-10	10-Sep-10	1
3	13-Sep-10	18-Sep-10	1
3	01-Oct-10	11-Oct-10	2
3	14-Oct-10	19-Oct-10	3
3	15-Dec-12	18-Dec-10	1
4	18-Oct-10	22-Oct-10	1
4	18-Oct-10	22-Oct-10	2
5	19-Oct-10	29-Oct-10	1
5	13-Dec-10	01-Jan-11	2
5	13-Dec-10	01-Jan-11	3
5	16-Feb-11	28-Feb-11	4
6	01-Dec-10	09-Dec-10	1
7	11-Dec-10	17-Dec-10	1
7	15-Oct-11	22-Oct-11	1
7	01-Dec-11	10-Dec-11	2
7	20-Jun-12	29-Jun-12	1
7	29-Jul-12	08-Aug-12	2
7	07-Sep-12	16-Sep-12	3
8	19-Dec-10	25-Dec-10	1
8	19-Dec-10	25-Dec-10	2
8	19-Dec-10	25-Dec-10	3
8	17-Jan-13	25-Jan-13	1
9	01-Mar-11	10-Mar-11	1
9	25-Mar-11	30-Mar-11	2
;

proc sql;
create table want as
select *
from have
where id in (select id from have group by id,Epi_start,Epi_end having count(*)>1);
quit;
Primavera
Quartz | Level 8

Sorry, I failed to mention that the data set that I am working with is not identical to the one I posted. My original data set is a SAS data set with about 300000 records and about 20 variables. I am not sure if I can create the have table like your answer. 

Also I am particularly bad at SQL sadly. Can you please explain the last line of your code (particularly count(*)>1 part of it)?

 

Many thanks,

 

novinosrin
Tourmaline | Level 20

The logic is very simple:

 

1. Subset ID's , whose records  contains more than one occurrence of identical Epi_start Epi_end which has to make a count that is greater than one.

 

2. LooK up ID's from the full table to the ID's the resulting subset mentioned in 1

 

The output will have only those matching ID's . Does that explain enough to help you comprehend?

 

It doesn't matter how many records or variables you have as long as the variables of interest for the subsetting logic is ID,   Epi_start Epi_end 

Primavera
Quartz | Level 8

Thank you so much. It worked and I am beginning to understand it. May I ask one more question? Now assume that you have a hospital flag that is either 1 or missing. Like this:

IDEpi_startEpi_endEpi-NumberHospital
121-Aug-1025-Aug-101.
227-Aug-1010-Sep-1011
313-Sep-1018-Sep-101.
301-Oct-1011-Oct-102.
314-Oct-1019-Oct-103.
315-Dec-1218-Dec-1011
418-Oct-1022-Oct-101.
418-Oct-1022-Oct-102.
519-Oct-1029-Oct-101.
513-Dec-1001-Jan-112.
513-Dec-1001-Jan-1131
516-Feb-1128-Feb-114.
601-Dec-1009-Dec-101.
711-Dec-1017-Dec-101.
715-Oct-1122-Oct-111.
701-Dec-1110-Dec-1121
720-Jun-1229-Jun-1211
729-Jul-1208-Aug-122.
707-Sep-1216-Sep-123.
819-Dec-1025-Dec-101.
819-Dec-1025-Dec-1021
819-Dec-1025-Dec-103.
817-Jan-1325-Jan-131.
901-Mar-1110-Mar-1111
925-Mar-1130-Mar-1121

 

Now lets say I want the records where at least one of the identical epi_start and epi_end has a hospital flag of 1. Like table below:

 

IDEpi_startEpi_endEpi-NumberHospital
519-Oct-1029-Oct-101.
513-Dec-1001-Jan-112.
513-Dec-1001-Jan-1131
516-Feb-1128-Feb-114.
819-Dec-1025-Dec-101.
819-Dec-1025-Dec-1021
819-Dec-1025-Dec-103.
817-Jan-1325-Jan-131.

 

How would you add that to your code? Thank you so much again.

novinosrin
Tourmaline | Level 20

Slight tweak all that you need is 

 and max(hospital));

Full version

 



data have;
input ID	(Epi_start	Epi_end) (:Date9.)	Epi_Number	Hospital;
cards;
1	21-Aug-10	25-Aug-10	1	.
2	27-Aug-10	10-Sep-10	1	1
3	13-Sep-10	18-Sep-10	1	.
3	01-Oct-10	11-Oct-10	2	.
3	14-Oct-10	19-Oct-10	3	.
3	15-Dec-12	18-Dec-10	1	1
4	18-Oct-10	22-Oct-10	1	.
4	18-Oct-10	22-Oct-10	2	.
5	19-Oct-10	29-Oct-10	1	.
5	13-Dec-10	01-Jan-11	2	.
5	13-Dec-10	01-Jan-11	3	1
5	16-Feb-11	28-Feb-11	4	.
6	01-Dec-10	09-Dec-10	1	.
7	11-Dec-10	17-Dec-10	1	.
7	15-Oct-11	22-Oct-11	1	.
7	01-Dec-11	10-Dec-11	2	1
7	20-Jun-12	29-Jun-12	1	1
7	29-Jul-12	08-Aug-12	2	.
7	07-Sep-12	16-Sep-12	3	.
8	19-Dec-10	25-Dec-10	1	.
8	19-Dec-10	25-Dec-10	2	1
8	19-Dec-10	25-Dec-10	3	.
8	17-Jan-13	25-Jan-13	1	.
9	01-Mar-11	10-Mar-11	1	1
9	25-Mar-11	30-Mar-11	2	1
;
 
proc sql;
create table want as
select *
from have
where id in (select id from have group by id,Epi_start,Epi_end having count(*)>1 and max(hospital));
quit;

SQL is so easy 🙂

Primavera
Quartz | Level 8

Thank you for replying again. I think there might be a problem with this additional code. If we have identical Epi_start and Epi_end that are not hospitalizations in one individual's records but then we also have one separate occasion of hospitalization (hospital=1) then wouldn't that set of records make it to our final list? I mean something like this:

IDEpi_startEpi_endEpi-NumberHospital
711-Dec-1017-Dec-101.
715-Oct-1122-Oct-111.
701-Dec-1110-Dec-1121
720-Jun-1229-Jun-1211
729-Jul-1208-Aug-122.
729-Jul-1208-Aug-123.

 

I might be wrong though. I have used SQL only once before (and it did not work that time). I am still not sure if I am getting your logic (both the count(*)>1 and max(hospital) sections). Thanks.

 

novinosrin
Tourmaline | Level 20

Hi @Primavera  Yes, the current  logic/logic so far will not pick ID7. The current logic approach is very simple:

 

1. Identify the number of identical occurrences of  Epi_start and Epi_end for a given ID. If it is greater than one, it passes the 1st test

2. Use the result of 1 and determine if any of the records in the subset has hospitalization flag. 

 

If the check passes both the tests, output all the records of those IDs. For ID7 in your latest post, though it passes the 1st test, it fails the 2nd, hence 7 gets dropped .

 

Please review your requirement thoroughly and let us know what you want in/as final. It's cool to go back and forth as long we hit the much needed target. 

 

 


@Primavera wrote:

Thank you for replying again. I think there might be a problem with this additional code. If we have identical Epi_start and Epi_end that are not hospitalizations in one individual's records but then we also have one separate occasion of hospitalization (hospital=1) then wouldn't that set of records make it to our final list? I mean something like this:

ID Epi_start Epi_end Epi-Number Hospital
7 11-Dec-10 17-Dec-10 1 .
7 15-Oct-11 22-Oct-11 1 .
7 01-Dec-11 10-Dec-11 2 1
7 20-Jun-12 29-Jun-12 1 1
7 29-Jul-12 08-Aug-12 2 .
7 29-Jul-12 08-Aug-12 3 .

 

I might be wrong though. I have used SQL only once before (and it did not work that time). I am still not sure if I am getting your logic (both the count(*)>1 and max(hospital) sections). Thanks.

 


 

Primavera
Quartz | Level 8

Hi again,

 

May I take a bit of your time again?

 

I am trying to modify your code for another need. I want to gather all the records of an ID if that individual has had an Epi_start before a certain date (say 01Jan2011). I don't care about any of the conditions that I had previously. This time I only care about Epi_start date.

I have modified the last line to :

where ID in (select ID from have group by ID having count(*)>1 and Epi_start LT '01JAN2011'd );

 

But it does not work and giving me a table with 0 rows and this note in the log:

NOTE: The query requires remerging summary statistics back with the original data.

 

Can you please tell me where I have gone wrong?

 

Thank you,
 

novinosrin
Tourmaline | Level 20

HI @Primavera 

 

Since the requirement is much too simple considering your notes

 

1. I want to gather all the records of an ID if that individual has had an Epi_start before a certain date (say 01Jan2011).

2. I don't care about any of the conditions that I had previously. This time I only care about Epi_start date.

 

The logic becomes merely a filter, and no need for a summary statistics. Therefore, essentially how you would build is 

 

1. where Epi_start LT '01JAN2011'd   -  This will filter with ID's satisfying the condition

2. Now, get other corresponding records for the filtered ID's.

 

So in essence, the mind should now think 1 as a subquery( or in simple terms 1st filter query) and 2 An outer query or a full query to get all the corresponding records for those ID's. That's as simple or as terse the build logic would become.

 

Now, let's turn English words to SQL words

 

proc sql;
create table want as
select *
from have
where id in (select id from have where Epi_start LT '01JAN2011'd);
quit;

Hope the explanation gives that spark to help readers build the logic. Cheers!  Have fun!

 

Primavera
Quartz | Level 8

Hi @novinosrin ,

 

Thank you very much for your help and I am very sorry for my belated message of gratitude. I was occupied with something else.

 

I tried something very very similar to your code when I was trying to modify your previous code and it did not work. Now I plugged in your simple code and voila! Worked like a charm. So frustrating when my own code does not work and a master comes in and solves it so easily.

 

Thank you again for coming to my help again and thank you for your perfect explanations. 

 

Regards, 

Primavera
Quartz | Level 8

Hi again,

I hope you are safe and well.

I am very sorry for taking your time once more. I have a problem that at first I thought I can code in less than 30 minutes but now 2 days later I have given up. It kind of follows the same trail as my previous questions. Let's say I have a data set with about 100 K participants. The records of one of these people could look like this:

IDdispensationEpi_startEpi_endDispen-NumberHospitalepi_duration
521-May-1021-May-1029-May-101.9
527-Aug-1027-Aug-1028-Aug-10112
513-Sep-1013-Sep-1030-Sep-101.18
511-Oct-1013-Sep-1025-Oct-102.43
505-Nov-1013-Sep-1020-Dec-103.99
515-Dec-1215-Dec-1217-Dec-12113
518-Oct-1318-Oct-1322-Oct-131.5
501-Nov-1318-Oct-1309-Nov-132.23
503-Mar-1403-Mar-1415-Mar-141.13

  

various people can have various records of course but it mainly looks like above. I want to gather all the records of drug dispensations in an episode if that episode has lasted more than 2 months.  For example in the case of the subject with ID=5 that would be:

IDdispensationEpi_startEpi_endDispen-NumberHospitalepi_duration
513-Sep-1013-Sep-1030-Sep-101.18
511-Oct-1013-Sep-1025-Oct-102.43
505-Nov-1013-Sep-1020-Dec-103.99

     

Thank you once more,

novinosrin
Tourmaline | Level 20

Hi @Primavera  We are okay. Thank you. I hope the same is the case where you are at. Okay, I vaguely recall you wanting Proc SQL based solutions? Is that correct or you have no preference?

 

Since our discussions have been proc sql thus far, I'd go with the same-


/*I want to gather all the records of drug dispensations in an episode if that episode has lasted more than 2 months.*/
data have;
input ID	(dispensation	Epi_start	Epi_end) (:date9.)	Dispen_Number	Hospital	epi_duration;
format dispensation Epi_start	Epi_end date9.;
cards;
5	21-May-10	21-May-10	29-May-10	1	.	9
5	27-Aug-10	27-Aug-10	28-Aug-10	1	1	2
5	13-Sep-10	13-Sep-10	30-Sep-10	1	.	18
5	11-Oct-10	13-Sep-10	25-Oct-10	2	.	43
5	05-Nov-10	13-Sep-10	20-Dec-10	3	.	99
5	15-Dec-12	15-Dec-12	17-Dec-12	1	1	3
5	18-Oct-13	18-Oct-13	22-Oct-13	1	.	5
5	01-Nov-13	18-Oct-13	09-Nov-13	2	.	23
5	03-Mar-14	03-Mar-14	15-Mar-14	1	.	13
;

proc sql;
create table want as
select *
from have
group by id,Epi_start
having intck('mon',Epi_start,max(Epi_end),'c')>2
order by id,dispensation,Epi_start,Epi_end;
quit;
Primavera
Quartz | Level 8

Hi,

Thank you very much once again. I tried your code and it works perfectly. I figured out most of the logic for the code. Just a few question:

1. Why did you group by both ID and Epi_start and not just by ID?

2.  'c' stands for 'continuous', right?

3. How would you modify this code to just grab the longest episode or the first qualifying episode of each participant (if some participants have more than 1 qualifying episodes)? 

 

I actually don't prefer SQL as I am very bad at it. I am slowly reading and learning about it. Do you know how to code for this without using SQL? I tried several methods including reversing the data set and trying to flag the desired dispensations and none of them worked. I am very sorry for taking your time; it's just that your codes are very professional and educational for me.

 

Thanks again,

 

novinosrin
Tourmaline | Level 20

Need not be so formal. It's fun to be around here. I am about to go home. I might login from home and will offer an alternative tonight or tomorrow morning. It's been a long day.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 34 replies
  • 5267 views
  • 19 likes
  • 4 in conversation