Hi All,
I have a data set where every observation is repeated twice on consecutive days and I would like to delete the first observation. For example:
id group treatmentdate day month year
A1 0 30Sep2017 30 9 2017
A1 0 1Oct2017 1 10 2017
A2 1 06Nov2017 6 11 2017
A2 1 07Nov2017 7 11 2017
A1 0 23Oct2017 23 10 2017
A1 0 24Oct2017 24 10 2017
What I would like:
id group treatmentdate day month year
A1 0 01Oct2017 1 10 2017
A2 1 07Nov2017 7 11 2017
A1 0 24Oct2017 24 10 2017
I was attempting to use
proc sql;
create table want as
select *
from have
group by id group
having day=max(day)
run;
but this will not work as I have nothing more to group by to retain the same id from the same group with two different treatments (of two consecutive days).
Any suggestions please?
I have looked at using a DoW loop but haven't been successful.
If I understand what you describe and want then code like below should do the job.
data have;
input (id group) ($) treatmentdate:date9. day month year;
format treatmentdate date9.;
datalines;
A1 0 30Sep2017 30 9 2017
A1 0 1Oct2017 1 10 2017
A2 1 06Nov2017 6 11 2017
A2 1 07Nov2017 7 11 2017
A1 0 23Oct2017 23 10 2017
A1 0 24Oct2017 24 10 2017
;
proc sort data=have out=inter;
by id group DESCENDING treatmentdate;
run;
data want;
set inter;
by id group DESCENDING treatmentdate;
if dif1(treatmentdate)=-1 and not first.group then delete;
run;
NB: When using the DIF or LAG function make sure you use it in a place where it gets executed for every single row of your input data.
Given the condition that
every observation is repeated twice on consecutive days and I would like to delete the first observation
here's the solution:
data want;
set have;
if mod(_n_,2)=0;
run;
Take a look at Grouping Data and FIRST. and LAST. DATA Step Variables
That should do what you want
Every record will always have two entries? It also looks like you may have multiple entries for a single ID as A1 ID is repeated twice?
data want;
set have;
by ID group NOTSORTED;
if first.group;
run;
@EM_G wrote:
Hi All,
I have a data set where every observation is repeated twice on consecutive days and I would like to delete the first observation. For example:
id group treatmentdate day month year
A1 0 30Sep2017 30 9 2017
A1 0 1Oct2017 1 10 2017
A2 1 06Nov2017 6 11 2017
A2 1 07Nov2017 7 11 2017
A1 0 23Oct2017 23 10 2017
A1 0 24Oct2017 24 10 2017
What I would like:
id group treatmentdate day month year
A1 0 01Oct2017 1 10 2017
A2 1 07Nov2017 7 11 2017
A1 0 24Oct2017 24 10 2017
I was attempting to use
proc sql;
create table want as
select *
from have
group by id group
having day=max(day)
run;
but this will not work as I have nothing more to group by to retain the same id from the same group with two different treatments (of two consecutive days).
Any suggestions please?
I have looked at using a DoW loop but haven't been successful.
What if you have more than two entries for a given id/group combination? What if the dates are not consecutive? What if there's multiple obs for a single date?
Before we can come up with a reliable solution, this (and probably other) facts about your data need to be cleared up.
If I understand what you describe and want then code like below should do the job.
data have;
input (id group) ($) treatmentdate:date9. day month year;
format treatmentdate date9.;
datalines;
A1 0 30Sep2017 30 9 2017
A1 0 1Oct2017 1 10 2017
A2 1 06Nov2017 6 11 2017
A2 1 07Nov2017 7 11 2017
A1 0 23Oct2017 23 10 2017
A1 0 24Oct2017 24 10 2017
;
proc sort data=have out=inter;
by id group DESCENDING treatmentdate;
run;
data want;
set inter;
by id group DESCENDING treatmentdate;
if dif1(treatmentdate)=-1 and not first.group then delete;
run;
NB: When using the DIF or LAG function make sure you use it in a place where it gets executed for every single row of your input data.
@EM_G wrote:
Patrick,
Thank you so much, this has worked. Would you mind explaining the 'not first.group'? I have read about this in the sas documents but have not got my head around it. What did it prevent from happening?
Thanks again for your time.
The first and last syntax in a data step allows you to identify the first or last row in a tuple.
You only want to "de-dupe" within the same group for an id.
The syntax used compares the current treatment date to the treatment date of the previous obs also when this previous obs belonged to a different group. Nothing prevents this previous treatment date value to be by chance exactly one day earlier than the current one in the first obs of a new group . Using and not first.group ensures that the first obs of a new group always gets kept even if the treatment date comparison would return a -1.
Btw: Expression dif1(treatmentdate)=-1 must be the first element in the if condition to ensure it gets executed for every single row. If a logical expression gets evaluated to False then SAS won't necessarily execute later expression under the same if statement (i.e. if with two expressions connected via AND the first expression resolved to False then there is no need to even execute the second test).
Thank you for the explanation and your time.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.