DATA Step, Macro, Functions and more

Restriction of data based on dates.

Reply
Contributor
Posts: 29

Restriction of data based on dates.

Dear SAS Experts

 

I have what I think is a simple problem that I hope you can help me with.

 

I have the following data and would like to exclude line number 3 and keep number 2. Please notice that it is the date of diagnosis that are of interest and in both cases these are after admission but in line 3 it is after admission and within the time frame of the 2 admission.

 

 

Record_idadmission_numberAdmission dateDischarge datedate of diagnosisdiagnosis   
1101-01-201001-02-201002-01-2010df200   
1101-01-201001-02-201002-03-2010df100I want to keep this line
1101-01-201001-02-201004-03-2010df147I want to exclude this line
1203-03-201031-03-201004-03-2010df147   
1203-03-201031-03-201004-03-2010df200   
2        
2        
2        
2        

 

Kind regards

 

Solvej

Super User
Posts: 10,570

Re: Restriction of data based on dates.

Please supply example data in a usable form, see my footnotes.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 29

Re: Restriction of data based on dates.

Posted in reply to KurtBremser

Dear KurtBremser

 

I apologize for the data. I have very sensitive data so I have to make everything up for my questions.

 

I hope this datastep is useful. The date formats are not correct, but Im unable to find the correct ones. My you know what will work.

 

data have;

input Record_id admission_number Admission_date DATE10. Discharge_date date10. date_of_diagnosis date10. diagnosis $;

format Admission_date date10. Discharge_date date10. date_of_diagnosis date10.;

datalines;

 

1 1 01-01-2010 01-02-2010 02-01-2010 df200

1 1 01-01-2010 01-02-2010 02-03-2010 df100

1 1 01-01-2010 01-02-2010 04-03-2010 df147

1 2 03-03-2010 31-03-2010 04-03-2010 df147

1 2 03-03-2010 31-03-2010 04-03-2010 df200

; run;

 

Kind regards

 

Solvej

Super User
Posts: 10,570

Re: Restriction of data based on dates.

It takes some kind of SQL trickery:

data have;
input
  Record_id
  admission_number
  (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
  diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;

proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis
 not in (
   select date_of_diagnosis from have b
   where
   b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
   and
   b.Admission_date ne a.Admission_date
 )
;
quit;

proc print data=want noobs;
run;

Result:

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    02-01-2010      df200  
   1            1        01-01-2010    01-02-2010    02-03-2010      df100  
   1            2        03-03-2010    31-03-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df200  
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 29

Re: Restriction of data based on dates.

Posted in reply to KurtBremser

Dear KurtBremser

 

Thank you for your answer.

 

When I run the same code in my real data i loose half of my data. I cannot see what is wrong. Do I somehow have to build in this has to be within record_id ?

 

Kind regards

 

Solvej

 

Super User
Posts: 10,570

Re: Restriction of data based on dates.

[ Edited ]

You can see that, with your original example data, my code eliminated exactly the row you wanted eliminated, and kept the 4 others.

For code tested with certain data, you need to supply that.

 

You may need to add

   and
   b.Record_id = a.Record_id

in the where condition of the subquery.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 29

Re: Restriction of data based on dates.

Posted in reply to KurtBremser

Dear KurtBremser

 

Thank you. I think It works now. However I would like to see the observations that I delete. How can I build this into the sql statement?

 

Kind regards

 

Solvej

Super User
Posts: 10,570

Re: Restriction of data based on dates.

You can restructure the whole statement so that a flag is set, or you can create a second table with the inverse condition:

proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 29

Re: Restriction of data based on dates.

Posted in reply to KurtBremser

Hi Again

 

When I run the code I delete 14 observations with part one but end up with 0 observations in part 2. Have you changed more than what I have but in bold and with underscore?

 

proc sql;
create table want as select distinct a.* from
have a
where a.date_of_diagnosis not in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;

quit;

 

Kind regards Solvej

Super User
Posts: 10,570

Re: Restriction of data based on dates.

I can only work with the data you gave us:

data have;
input
  Record_id
  admission_number
  (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
  diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;

proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
quit;

proc print data=have noobs;
run;

proc print data=want noobs;
run;

proc print data=deletes noobs;
run;

result:

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    02-01-2010      df200  
   1            1        01-01-2010    01-02-2010    02-03-2010      df100  
   1            1        01-01-2010    01-02-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df200  

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    02-01-2010      df200  
   1            1        01-01-2010    01-02-2010    02-03-2010      df100  
   1            2        03-03-2010    31-03-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df200  

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    04-03-2010      df147  

You can see it works with your example data.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 29

Re: Restriction of data based on dates.

Posted in reply to KurtBremser

Hi Again

 

I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.

 

Kind Regards

 

Solvej

Super User
Posts: 10,570

Re: Restriction of data based on dates.

Coding is driven by the data. Without data, I'd just be wasting time in an information vacuum.


@Solvej wrote:

Hi Again

 

I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.

 

Kind Regards

 

Solvej


 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 29

Re: Restriction of data based on dates.

Posted in reply to KurtBremser

Hi Again

 

Okay the data is the same as above. However now I want to flag the data instead of making af new dataset that only contains relevant data. The reason for wanting this is that I don't know if or where there may be a problem and without this knowledge it is impossible for me to make a dataset with the relevant structure.

 

I hope this makes sence.

 

Thank you in advance for helping me out.

 

Kind regards

 

Solvej

 

Ask a Question
Discussion stats
  • 12 replies
  • 158 views
  • 2 likes
  • 2 in conversation