BookmarkSubscribeRSS Feed
Solvej
Obsidian | Level 7

Dear SAS Experts

 

I have what I think is a simple problem that I hope you can help me with.

 

I have the following data and would like to exclude line number 3 and keep number 2. Please notice that it is the date of diagnosis that are of interest and in both cases these are after admission but in line 3 it is after admission and within the time frame of the 2 admission.

 

 

Record_idadmission_numberAdmission dateDischarge datedate of diagnosisdiagnosis   
1101-01-201001-02-201002-01-2010df200   
1101-01-201001-02-201002-03-2010df100I want to keep this line
1101-01-201001-02-201004-03-2010df147I want to exclude this line
1203-03-201031-03-201004-03-2010df147   
1203-03-201031-03-201004-03-2010df200   
2        
2        
2        
2        

 

Kind regards

 

Solvej

12 REPLIES 12
Solvej
Obsidian | Level 7

Dear KurtBremser

 

I apologize for the data. I have very sensitive data so I have to make everything up for my questions.

 

I hope this datastep is useful. The date formats are not correct, but Im unable to find the correct ones. My you know what will work.

 

data have;

input Record_id admission_number Admission_date DATE10. Discharge_date date10. date_of_diagnosis date10. diagnosis $;

format Admission_date date10. Discharge_date date10. date_of_diagnosis date10.;

datalines;

 

1 1 01-01-2010 01-02-2010 02-01-2010 df200

1 1 01-01-2010 01-02-2010 02-03-2010 df100

1 1 01-01-2010 01-02-2010 04-03-2010 df147

1 2 03-03-2010 31-03-2010 04-03-2010 df147

1 2 03-03-2010 31-03-2010 04-03-2010 df200

; run;

 

Kind regards

 

Solvej

Kurt_Bremser
Super User

It takes some kind of SQL trickery:

data have;
input
  Record_id
  admission_number
  (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
  diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;

proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis
 not in (
   select date_of_diagnosis from have b
   where
   b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
   and
   b.Admission_date ne a.Admission_date
 )
;
quit;

proc print data=want noobs;
run;

Result:

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    02-01-2010      df200  
   1            1        01-01-2010    01-02-2010    02-03-2010      df100  
   1            2        03-03-2010    31-03-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df200  
Solvej
Obsidian | Level 7

Dear KurtBremser

 

Thank you for your answer.

 

When I run the same code in my real data i loose half of my data. I cannot see what is wrong. Do I somehow have to build in this has to be within record_id ?

 

Kind regards

 

Solvej

 

Kurt_Bremser
Super User

You can see that, with your original example data, my code eliminated exactly the row you wanted eliminated, and kept the 4 others.

For code tested with certain data, you need to supply that.

 

You may need to add

   and
   b.Record_id = a.Record_id

in the where condition of the subquery.

Solvej
Obsidian | Level 7

Dear KurtBremser

 

Thank you. I think It works now. However I would like to see the observations that I delete. How can I build this into the sql statement?

 

Kind regards

 

Solvej

Kurt_Bremser
Super User

You can restructure the whole statement so that a flag is set, or you can create a second table with the inverse condition:

proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
quit;
Solvej
Obsidian | Level 7

Hi Again

 

When I run the code I delete 14 observations with part one but end up with 0 observations in part 2. Have you changed more than what I have but in bold and with underscore?

 

proc sql;
create table want as select distinct a.* from
have a
where a.date_of_diagnosis not in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;

quit;

 

Kind regards Solvej

Kurt_Bremser
Super User

I can only work with the data you gave us:

data have;
input
  Record_id
  admission_number
  (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
  diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;

proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
  select date_of_diagnosis from have b
  where
  b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
  and
  b.Admission_date ne a.Admission_date
  and
  b.Record_id = a.Record_id
)
;
quit;

proc print data=have noobs;
run;

proc print data=want noobs;
run;

proc print data=deletes noobs;
run;

result:

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    02-01-2010      df200  
   1            1        01-01-2010    01-02-2010    02-03-2010      df100  
   1            1        01-01-2010    01-02-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df200  

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    02-01-2010      df200  
   1            1        01-01-2010    01-02-2010    02-03-2010      df100  
   1            2        03-03-2010    31-03-2010    04-03-2010      df147  
   1            2        03-03-2010    31-03-2010    04-03-2010      df200  

Record_    admission_    Admission_    Discharge_      date_of_
   id        number         date          date        diagnosis    diagnosis

   1            1        01-01-2010    01-02-2010    04-03-2010      df147  

You can see it works with your example data.

Solvej
Obsidian | Level 7

Hi Again

 

I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.

 

Kind Regards

 

Solvej

Kurt_Bremser
Super User

Coding is driven by the data. Without data, I'd just be wasting time in an information vacuum.


@Solvej wrote:

Hi Again

 

I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.

 

Kind Regards

 

Solvej


 

Solvej
Obsidian | Level 7

Hi Again

 

Okay the data is the same as above. However now I want to flag the data instead of making af new dataset that only contains relevant data. The reason for wanting this is that I don't know if or where there may be a problem and without this knowledge it is impossible for me to make a dataset with the relevant structure.

 

I hope this makes sence.

 

Thank you in advance for helping me out.

 

Kind regards

 

Solvej

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2173 views
  • 2 likes
  • 2 in conversation