Data Treatment ;
input subid $ Date : date9. Dose $;
Format Date : date9. ;
cards;
P101 13Jan2005 Asp-05mg
P102 14Jan2005 Asp-10mg
P101 19Jan2005 Asp-10mg
P103 19Jan2005 Asp-15mg
P102 20Jan2005 Asp-05mg
P101 25Jan2005 Asp-15mg
P103 25Jan2005 Asp-10mg
P102 26Jan2005 Asp-15mg
P104 26Jan2005 Asp-10mg
P101 31Jan2005 Asp-05mg
P103 30Jan2005 Asp-10mg
P102 01Feb2005 Asp-15mg
P104 02Feb2005 Asp-10mg
P105 02Feb2005 Asp-20mg
Run;
how to extract records except first and last records in each group by using sql?
Why do you want to limit yourself to PROC SQL when doing this in a data step is much simpler. SQL deals with set operations. The data step deals with individual rows. If you absolutely want to do this in SQL, create a variable n to keep track of first/last occurrences within by groups and use SQL like this
Data Treatment ;
input subid $ Date : date9. Dose $;
Format Date : date9. ;
cards;
P101 13Jan2005 Asp-05mg
P102 14Jan2005 Asp-10mg
P101 19Jan2005 Asp-10mg
P103 19Jan2005 Asp-15mg
P102 20Jan2005 Asp-05mg
P101 25Jan2005 Asp-15mg
P103 25Jan2005 Asp-10mg
P102 26Jan2005 Asp-15mg
P104 26Jan2005 Asp-10mg
P101 31Jan2005 Asp-05mg
P103 30Jan2005 Asp-10mg
P102 01Feb2005 Asp-15mg
P104 02Feb2005 Asp-10mg
P105 02Feb2005 Asp-20mg
Run;
data temp;
set Treatment;
n=_N_;
run;
proc sql;
create table want as
select * from temp
group by subid
having n ne max(n) & n ne min(n);
quit;
Is your data sorted by Date? Ie does the earliest/latest date imply the first/last record within each By Group? In that case, do
proc sql;
create table want as
select * from Treatment
group by subid
having Date ne max(Date) and Date ne min(Date);
quit;
1. You didn't state what "group" means. Is it SUBID?
2. If it is, you didn't present your desired output. Do you want to (a) keep the groups with fewer than 3 rows or (b) drop them? Because if a group contains 1 or 2 rows, your criterion of keeping only the rows in except the first and the last would mean that you'd keep no rows from that group at all.
3. Are your dates within the group (c) unique as in your data sample or (d) they can be duplicate?
If the answers are (a) and (c), then you can code:
data treatment ;
input subid $ date:date. dose $ ;
format date date9. ;
cards ;
P101 13Jan2005 Asp-05mg
P102 14Jan2005 Asp-10mg
P101 19Jan2005 Asp-10mg
P103 19Jan2005 Asp-15mg
P102 20Jan2005 Asp-05mg
P101 25Jan2005 Asp-15mg
P103 25Jan2005 Asp-10mg
P102 26Jan2005 Asp-15mg
P104 26Jan2005 Asp-10mg
P101 31Jan2005 Asp-05mg
P103 30Jan2005 Asp-10mg
P102 01Feb2005 Asp-15mg
P104 02Feb2005 Asp-10mg
P105 02Feb2005 Asp-20mg
run ;
proc sql ;
create table want as
select *
from treatment
group subid
having (date < max(date) and date > min(date)) or count(*) < 3
;
quit ;
If the answers are (a) and (d), you can't rely on the dates and have to rely on a surrogate unique row number (RID below) instead:
proc sql ;
create table want as
select *
from (select *, monotonic() as rid from treatment)
group subid
having (rid < max (rid) and rid > min (rid)) or count(*) < 3
;
quit ;
This approach is generally safer since you while may think that your DATE values per SUBID are unique, the actual data may know otherwise. Finally, if your answer is (b), i.e. you don't want to keep the groups with fewer than 3 rows, then eliminate the text:
or count(*) < 3
from both queries shown above.
Paul D.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.