Dear SAS Experts
I have what I think is a simple problem that I hope you can help me with.
I have the following data and would like to exclude line number 3 and keep number 2. Please notice that it is the date of diagnosis that are of interest and in both cases these are after admission but in line 3 it is after admission and within the time frame of the 2 admission.
Record_id | admission_number | Admission date | Discharge date | date of diagnosis | diagnosis | |||
1 | 1 | 01-01-2010 | 01-02-2010 | 02-01-2010 | df200 | |||
1 | 1 | 01-01-2010 | 01-02-2010 | 02-03-2010 | df100 | I want to keep this line | ||
1 | 1 | 01-01-2010 | 01-02-2010 | 04-03-2010 | df147 | I want to exclude this line | ||
1 | 2 | 03-03-2010 | 31-03-2010 | 04-03-2010 | df147 | |||
1 | 2 | 03-03-2010 | 31-03-2010 | 04-03-2010 | df200 | |||
2 | ||||||||
2 | ||||||||
2 | ||||||||
2 |
Kind regards
Solvej
Please supply example data in a usable form, see my footnotes.
Dear KurtBremser
I apologize for the data. I have very sensitive data so I have to make everything up for my questions.
I hope this datastep is useful. The date formats are not correct, but Im unable to find the correct ones. My you know what will work.
data have;
input Record_id admission_number Admission_date DATE10. Discharge_date date10. date_of_diagnosis date10. diagnosis $;
format Admission_date date10. Discharge_date date10. date_of_diagnosis date10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
; run;
Kind regards
Solvej
It takes some kind of SQL trickery:
data have;
input
Record_id
admission_number
(Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;
proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis
not in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
)
;
quit;
proc print data=want noobs;
run;
Result:
Record_ admission_ Admission_ Discharge_ date_of_ id number date date diagnosis diagnosis 1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 1 01-01-2010 01-02-2010 02-03-2010 df100 1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 2 03-03-2010 31-03-2010 04-03-2010 df200
Dear KurtBremser
Thank you for your answer.
When I run the same code in my real data i loose half of my data. I cannot see what is wrong. Do I somehow have to build in this has to be within record_id ?
Kind regards
Solvej
You can see that, with your original example data, my code eliminated exactly the row you wanted eliminated, and kept the 4 others.
For code tested with certain data, you need to supply that.
You may need to add
and
b.Record_id = a.Record_id
in the where condition of the subquery.
Dear KurtBremser
Thank you. I think It works now. However I would like to see the observations that I delete. How can I build this into the sql statement?
Kind regards
Solvej
You can restructure the whole statement so that a flag is set, or you can create a second table with the inverse condition:
proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
quit;
Hi Again
When I run the code I delete 14 observations with part one but end up with 0 observations in part 2. Have you changed more than what I have but in bold and with underscore?
proc sql;
create table want as select distinct a.* from
have a
where a.date_of_diagnosis not in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
quit;
Kind regards Solvej
I can only work with the data you gave us:
data have;
input
Record_id
admission_number
(Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)
diagnosis $
;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10.;
datalines;
1 1 01-01-2010 01-02-2010 02-01-2010 df200
1 1 01-01-2010 01-02-2010 02-03-2010 df100
1 1 01-01-2010 01-02-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df147
1 2 03-03-2010 31-03-2010 04-03-2010 df200
;
run;
proc sql;
create table want as
select distinct a.* from
have a
where a.date_of_diagnosis not in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
create table deletes as
select distinct a.* from
have a
where a.date_of_diagnosis in (
select date_of_diagnosis from have b
where
b.Admission_date <= a.date_of_diagnosis <= b.Discharge_date
and
b.Admission_date ne a.Admission_date
and
b.Record_id = a.Record_id
)
;
quit;
proc print data=have noobs;
run;
proc print data=want noobs;
run;
proc print data=deletes noobs;
run;
result:
Record_ admission_ Admission_ Discharge_ date_of_ id number date date diagnosis diagnosis 1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 1 01-01-2010 01-02-2010 02-03-2010 df100 1 1 01-01-2010 01-02-2010 04-03-2010 df147 1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 2 03-03-2010 31-03-2010 04-03-2010 df200 Record_ admission_ Admission_ Discharge_ date_of_ id number date date diagnosis diagnosis 1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 1 01-01-2010 01-02-2010 02-03-2010 df100 1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 2 03-03-2010 31-03-2010 04-03-2010 df200 Record_ admission_ Admission_ Discharge_ date_of_ id number date date diagnosis diagnosis 1 1 01-01-2010 01-02-2010 04-03-2010 df147
You can see it works with your example data.
Hi Again
I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.
Kind Regards
Solvej
Coding is driven by the data. Without data, I'd just be wasting time in an information vacuum.
@Solvej wrote:
Hi Again
I can still not get the programme to work and the data structure is to complicated to show here. Could you maybe show me how to flag the information I need instead.
Kind Regards
Solvej
Hi Again
Okay the data is the same as above. However now I want to flag the data instead of making af new dataset that only contains relevant data. The reason for wanting this is that I don't know if or where there may be a problem and without this knowledge it is impossible for me to make a dataset with the relevant structure.
I hope this makes sence.
Thank you in advance for helping me out.
Kind regards
Solvej
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.