here is the the structure of my data below:
i only need ID 2 since all patients are alive, therefore trying to delete id1:
Id sex status
1 2 A
1 2 A
1 2 A
1 2 D
2 1 A
2 1 A
2 1 A
It will sum the values from the expression "status='A'" by ID and compare that to the number of rows, so it will keep the ones where all the rows with that ID have status='A'. This is nifty usage of the group by and having clause. To see what is going on, like when debugging, you could "name" the flag and view an "intermediate" table:
data tbl;
input id sex status $;
datalines;
1 2 A
1 2 R
1 2 A
1 2 D
2 1 A
2 1 A
2 1 A
;
/* Create intermediate table to see the flag */
proc sql;
create table have as
select *, status='A' as alive_flag 'alive flag'
from tbl
;
select * from have
;
create table want as
select * from have
group by id
having sum(alive_flag)=count(*)
;
select * from want;
/*
alive
id sex status flag
-------------------
1 2 A 1
1 2 R 0
1 2 A 1
1 2 D 0
2 1 A 1
2 1 A 1
2 1 A 1
alive
id sex status flag
-------------------
2 1 A 1
2 1 A 1
2 1 A 1
*/
The simplest program might be this. Assuming your data set is already sorted by ID:
data want;
merge have (where=(status='D') in=delete_me) have;
by ID;
if delete_me then delete;
run;
Note that this deletes IDs that contain a "D" only. If you want to delete anything but "A" you would replace the WHERE condition:
where=(status ne 'A')
While simplest, this approach will unfortunately give you some undesirable messages in the log. The messages won't hurt in this particular case, but could under other circumstances. So a more complex, cleaner method would be:
data want;
wanted='Y';;
do until (last.id);
set have;
by ID;
if status='D' then wanted='N';
end;
do until (last.id);
set have;
by ID;
if wanted='Y' then output;
end;
run;
As before, you could conceivably change the conditions:
if status ne 'A' then wanted='N';
I appreciate your prompt response.
I was using the first method and in fact if was only omiting obs where id='d'; when my goal is to remove all obs with that id; ie delete id 1 all together and ending with id 2 only.
I usually create a "discard" table before deleting from the data. I'm assuming that 'A' is for Alive and 'D' is for Dead, but what is 'R'? Anyway, the discard table below is populated by any rows where the subject is not 'A', so we can delete any ID from the original table that appears in the discard table.
proc sql;
create table discard as
select *
from tbl
where status ^= 'A'
;
delete from tbl
where id in (select id from discard);
R is for retransplant 🙂 thank you for your suggestion.
data have;
input Id sex status $;
cards;
1 2 A
1 2 A
1 2 A
1 2 D
2 1 A
2 1 A
2 1 A
;
run;
proc sql;
create table want as
select * from have
group by id
having sum(status='A')=count(*);
quit;
Could you please tell me how the following 'having' clause will work in below proc sql? what variable it will sum?
proc sql;
create table want as
select * from have
group by id
having sum(status='A')=count(*);
quit;
Hi @Babloo,
This SUM function adds Boolean values, more precisely: their numeric equivalents 1 (for TRUE) and 0 (for FALSE). These arise from the equation status='A'. So, an observation with STATUS='A' contributes a 1 to the sum and other observations a 0. Thus, sum(status='A') is the number of observations with STATUS='A' in the BY group (due to the GROUP BY clause). If this number is equal to count(*), the number of all observations in the BY group, this means that all observations in the BY group must have STATUS='A'.
It will sum the values from the expression "status='A'" by ID and compare that to the number of rows, so it will keep the ones where all the rows with that ID have status='A'. This is nifty usage of the group by and having clause. To see what is going on, like when debugging, you could "name" the flag and view an "intermediate" table:
data tbl;
input id sex status $;
datalines;
1 2 A
1 2 R
1 2 A
1 2 D
2 1 A
2 1 A
2 1 A
;
/* Create intermediate table to see the flag */
proc sql;
create table have as
select *, status='A' as alive_flag 'alive flag'
from tbl
;
select * from have
;
create table want as
select * from have
group by id
having sum(alive_flag)=count(*)
;
select * from want;
/*
alive
id sex status flag
-------------------
1 2 A 1
1 2 R 0
1 2 A 1
1 2 D 0
2 1 A 1
2 1 A 1
2 1 A 1
alive
id sex status flag
-------------------
2 1 A 1
2 1 A 1
2 1 A 1
*/
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.