Re: Restriction of data based on dates.

Solvej · Posted 04-17-2018 08:46 AM

Dear SAS Experts

I have what I think is a simple problem that I hope you can help me with.

I have the following data and would like to exclude line number 3 and keep number 2. Please notice that it is the date of diagnosis that are of interest and in both cases these are after admission but in line 3 it is after admission and within the time frame of the 2 admission.

Record_id	admission_number	Admission date	Discharge date	date of diagnosis	diagnosis
1	1	01-01-2010	01-02-2010	02-01-2010	df200
1	1	01-01-2010	01-02-2010	02-03-2010	df100	I want to keep this line
1	1	01-01-2010	01-02-2010	04-03-2010	df147	I want to exclude this line
1	2	03-03-2010	31-03-2010	04-03-2010	df147
1	2	03-03-2010	31-03-2010	04-03-2010	df200
2
2
2
2

Kind regards

Solvej

Kurt_Bremser · Posted 04-17-2018 08:57 AM

Please supply example data in a usable form, see my footnotes.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Solvej · Posted 04-18-2018 03:40 AM

Dear KurtBremser

I apologize for the data. I have very sensitive data so I have to make everything up for my questions.

I hope this datastep is useful. The date formats are not correct, but Im unable to find the correct ones. My you know what will work.

data have;

input Record_id admission_number Admission_date DATE10. Discharge_date date10. date_of_diagnosis date10. diagnosis $;

format Admission_date date10. Discharge_date date10. date_of_diagnosis date10.;

datalines;

1 1 01-01-2010 01-02-2010 02-01-2010 df200

1 1 01-01-2010 01-02-2010 02-03-2010 df100

1 1 01-01-2010 01-02-2010 04-03-2010 df147

1 2 03-03-2010 31-03-2010 04-03-2010 df147

1 2 03-03-2010 31-03-2010 04-03-2010 df200

; run;

Kind regards

Solvej

Kurt_Bremser · Posted 04-18-2018 04:44 AM

It takes some kind of SQL trickery:

data have;
input
  Record_id
  admission_number
  (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
  diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;

proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis
 not in (
   select date_of_diagnosis from have b
   where
   b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
   and
   b.Admission_date ne a.Admission_date
 )
;
quit;

proc print data=want noobs;
run;

Result:

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    02-01-2010      df200  
   1            1        01-01-2010    01-02-2010    02-03-2010      df100  
   1            2        03-03-2010    31-03-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df200

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Solvej · Posted 04-18-2018 07:51 AM

Dear KurtBremser

Thank you for your answer.

When I run the same code in my real data i loose half of my data. I cannot see what is wrong. Do I somehow have to build in this has to be within record_id ?

Kind regards

Solvej

Kurt_Bremser · Posted 04-18-2018 07:54 AM

You can see that, with your original example data, my code eliminated exactly the row you wanted eliminated, and kept the 4 others.

For code tested with certain data, you need to supply that.

You may need to add

   and
   b.Record_id = a.Record_id

in the where condition of the subquery.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Solvej · Posted 04-19-2018 03:29 AM

Dear KurtBremser

Thank you. I think It works now. However I would like to see the observations that I delete. How can I build this into the sql statement?

Kind regards

Solvej

Kurt_Bremser · Posted 04-19-2018 03:35 AM

You can restructure the whole statement so that a flag is set, or you can create a second table with the inverse condition:

proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Solvej · Posted 04-19-2018 04:25 AM

Hi Again

When I run the code I delete 14 observations with part one but end up with 0 observations in part 2. Have you changed more than what I have but in bold and with underscore?

proc sql;
create table want as select distinct a.* from
have a
where a.date_of_diagnosis not in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;

quit;

Kind regards Solvej

Kurt_Bremser · Posted 04-19-2018 04:32 AM

I can only work with the data you gave us:

data have;
input
  Record_id
  admission_number
  (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
  diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;

proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
quit;

proc print data=have noobs;
run;

proc print data=want noobs;
run;

proc print data=deletes noobs;
run;

result:

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    02-01-2010      df200  
   1            1        01-01-2010    01-02-2010    02-03-2010      df100  
   1            1        01-01-2010    01-02-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df200  

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    02-01-2010      df200  
   1            1        01-01-2010    01-02-2010    02-03-2010      df100  
   1            2        03-03-2010    31-03-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df200  

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    04-03-2010      df147

You can see it works with your example data.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Solvej · Posted 05-03-2018 07:28 AM

Hi Again

I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.

Kind Regards

Solvej

Kurt_Bremser · Posted 05-03-2018 07:51 AM

Coding is driven by the data. Without data, I'd just be wasting time in an information vacuum.

@Solvej wrote:

Hi Again

I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.

Kind Regards

Solvej

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Solvej · Posted 05-03-2018 08:04 AM

Hi Again

Okay the data is the same as above. However now I want to flag the data instead of making af new dataset that only contains relevant data. The reason for wanting this is that I don't know if or where there may be a problem and without this knowledge it is impossible for me to make a dataset with the relevant structure.

I hope this makes sence.

Thank you in advance for helping me out.

Kind regards

Solvej

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away