Hello,
I have a huge dataset and trying to get summary like below in Proc SQL.
Example:
There are 10 patients having treatment no for each month. Each patient repeats every month (or may not) with different treatment no, but some get repeated with same treatment no.
I am trying to filter out only the patients that have the same treatment no repeated in all given months.
Data Given:
Month patient TreatmentNo
Jan pat1 20
Jan pat2 30
Jan pat2 40
Jan pat3 30
Jan pat6 90
Jan pat10 28
Feb pat1 30
Feb Pat2 40
Feb pat3 50
Mar pat1 10
Mar pat6 90
Mar pat10 28
Data Want:
Prior Month TreatmentNo
Pat2 Jan 40
Pat2 Feb 40
Pat 6 Jan 90
Pat 6 Mar 90
Pat 10 Jan 28
Pat 10 Mar 28
Thanks for checking
Assuming I understood what you are talking about.
data have;
input Month $ patient $ TreatmentNo;
datalines;
Jan pat1 20
Jan pat2 30
Jan pat2 40
Jan pat3 30
Jan pat6 90
Jan pat10 28
Feb pat1 30
Feb pat2 40
Feb pat3 50
Mar pat1 10
Mar pat6 90
Mar pat10 28
;
proc sql;
create table want as
select *
from have as a
group by patient,TreatmentNo
having count(distinct month)=
(select count(distinct month) from have where patient=a.patient)
;
quit;
Here is a hash object approach
data have;
input Month $ patient $ TreatmentNo;
datalines;
Jan pat1 20
Jan pat2 30
Jan pat2 40
Jan pat3 30
Jan pat6 90
Jan pat10 28
Feb pat1 30
Feb pat2 40
Feb pat3 50
Mar pat1 10
Mar pat6 90
Mar pat10 28
;
data want;
if _N_=1 then do;
declare hash h(dataset:'have', multidata:'Y');
h.definekey('Patient', 'TreatmentNo');
h.definedone();
end;
set have;
do _N_=1 by 1 while (h.do_over()=0);
if _N_ > 1 then output;
end;
run;
Hi @Kalai2008
You can try the following code:
proc sql;
select patient, TreatmentNo, Month
from have
group by patient, TreatmentNo
having count(TreatmentNo) >= 2
order by patient, Month;
quit;
- if you want to display months in a chronological order, you need to define and then use a format
- beware of the way patients are identified: eg. patient 2 is identified as Pat2 but also pat2 -> you need to harmonize the value
- I am wondering about whether Pat 10 and Pat 6 need to be retrieved or not in the report -> they have the same Treatment (-> repetition) in all given months. Only patient 2 has twice 40 and once 30, so corresponds to your description.
Assuming I understood what you are talking about.
data have;
input Month $ patient $ TreatmentNo;
datalines;
Jan pat1 20
Jan pat2 30
Jan pat2 40
Jan pat3 30
Jan pat6 90
Jan pat10 28
Feb pat1 30
Feb pat2 40
Feb pat3 50
Mar pat1 10
Mar pat6 90
Mar pat10 28
;
proc sql;
create table want as
select *
from have as a
group by patient,TreatmentNo
having count(distinct month)=
(select count(distinct month) from have where patient=a.patient)
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.