Dear SAS Experts
I have what I think is a simple problem that I hope you can help me with.
I have the following data and would like to exclude line number 3 and keep number 2. Please notice that it is the date of diagnosis that are of interest and in both cases these are after admission but in line 3 it is after admission and within the time frame of the 2 admission.
Record_id | admission_number | Admission date | Discharge date | date of diagnosis | diagnosis | |||
1 | 1 | 01-01-2010 | 01-02-2010 | 02-01-2010 | df200 | |||
1 | 1 | 01-01-2010 | 01-02-2010 | 02-03-2010 | df100 | I want to keep this line | ||
1 | 1 | 01-01-2010 | 01-02-2010 | 04-03-2010 | df147 | I want to exclude this line | ||
1 | 2 | 03-03-2010 | 31-03-2010 | 04-03-2010 | df147 | |||
1 | 2 | 03-03-2010 | 31-03-2010 | 04-03-2010 | df200 | |||
2 | ||||||||
2 | ||||||||
2 | ||||||||
2 |
Kind regards
Solvej
Please supply example data in a usable form, see my footnotes.
Dear KurtBremser
I apologize for the data. I have very sensitive data so I have to make everything up for my questions.
I hope this datastep is useful. The date formats are not correct, but Im unable to find the correct ones. My you know what will work.
data have;
input Record_id admission_number Admission_date DATE10. Discharge_date date10. date_of_diagnosis date10. diagnosis $;
format Admission_date date10. Discharge_date date10. date_of_diagnosis date10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
; run;
Kind regards
Solvej
It takes some kind of SQL trickery:
data have;
input
Record_id
admission_number
(Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;
proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis
not in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
)
;
quit;
proc print data=want noobs;
run;
Result:
Record_ admission_ Admission_ Discharge_ date_of_ id number date date diagnosis diagnosis 1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 1 01-01-2010 01-02-2010 02-03-2010 df100 1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 2 03-03-2010 31-03-2010 04-03-2010 df200
Dear KurtBremser
Thank you for your answer.
When I run the same code in my real data i loose half of my data. I cannot see what is wrong. Do I somehow have to build in this has to be within record_id ?
Kind regards
Solvej
You can see that, with your original example data, my code eliminated exactly the row you wanted eliminated, and kept the 4 others.
For code tested with certain data, you need to supply that.
You may need to add
and
b.Record_id = a.Record_id
in the where condition of the subquery.
Dear KurtBremser
Thank you. I think It works now. However I would like to see the observations that I delete. How can I build this into the sql statement?
Kind regards
Solvej
You can restructure the whole statement so that a flag is set, or you can create a second table with the inverse condition:
proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
quit;
Hi Again
When I run the code I delete 14 observations with part one but end up with 0 observations in part 2. Have you changed more than what I have but in bold and with underscore?
proc sql;
create table want as select distinct a.* from
have a
where a.date_of_diagnosis not in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
quit;
Kind regards Solvej
I can only work with the data you gave us:
data have;
input
Record_id
admission_number
(Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;
proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
quit;
proc print data=have noobs;
run;
proc print data=want noobs;
run;
proc print data=deletes noobs;
run;
result:
Record_ admission_ Admission_ Discharge_ date_of_ id number date date diagnosis diagnosis 1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 1 01-01-2010 01-02-2010 02-03-2010 df100 1 1 01-01-2010 01-02-2010 04-03-2010 df147 1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 2 03-03-2010 31-03-2010 04-03-2010 df200 Record_ admission_ Admission_ Discharge_ date_of_ id number date date diagnosis diagnosis 1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 1 01-01-2010 01-02-2010 02-03-2010 df100 1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 2 03-03-2010 31-03-2010 04-03-2010 df200 Record_ admission_ Admission_ Discharge_ date_of_ id number date date diagnosis diagnosis 1 1 01-01-2010 01-02-2010 04-03-2010 df147
You can see it works with your example data.
Hi Again
I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.
Kind Regards
Solvej
Coding is driven by the data. Without data, I'd just be wasting time in an information vacuum.
@Solvej wrote:
Hi Again
I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.
Kind Regards
Solvej
Hi Again
Okay the data is the same as above. However now I want to flag the data instead of making af new dataset that only contains relevant data. The reason for wanting this is that I don't know if or where there may be a problem and without this knowledge it is impossible for me to make a dataset with the relevant structure.
I hope this makes sence.
Thank you in advance for helping me out.
Kind regards
Solvej
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.