DATA Step, Macro, Functions and more

delete sets of observations

Reply
N/A
Posts: 0

delete sets of observations

I would like to delete certain small subsets of my data.

For example say I have a set like this:
A B C D
1 0 2 3
1 10 2 2
1 4 3 2
2 2 3 3
2 6 7 3
2 3 3 2
2 7 3 2
3 1 0 3
3 5 3 2
3 7 1 3

I would like to delete the sets of observations with matching A values for which the there is an observation with the B value equal to zero. So it would look like this:

A B C D
2 2 3 3
2 6 7 3
2 3 3 2
2 7 3 2
3 1 0 3
3 5 3 2
3 7 1 3


something like

data data_set;
set data_set;
if B='0' then ??????;
run;

Thanks for the help.
N/A
Posts: 0

Re: delete sets of observations

Something like:

*flag the observations with a B=0;
data flags;
set input;
if b=0 then flag=1;
else flag=0;
run;

*keep all the observations without a flagged observation in their group;
proc sql;
create table no_zeros as
select *, max(flag) as zeros
from flags
group by A
having calculated zeros=0;
quit;

I think that will do it. Correct me if I'm wrong.
Super Contributor
Posts: 359

Re: delete sets of observations

proc sql;
delete from data_set where a in (select distinct a from data_set where b = 0);
quit;
N/A
Posts: 0

Re: delete sets of observations

I tried this and received the error below. I will try the first example now. Thanks.

15 proc sql;
116 delete from aptest where date and racenum in (select distinct date and racenum from aptest where
116! lifestarts = 0);
WARNING: This DELETE/INSERT statement recursively references the target table. A consequence of this
is a possible data integrity problem.
NOTE: 3071 rows were deleted from WORK.APTEST.


but there still remain observations for which lifestarts=0.
N/A
Posts: 0

Re: delete sets of observations

also received this error for the first attempt:

140 proc sql;
141 create table no_zeros as
142 select *, max(flag) as zeros
143 from flags
144 group by date and racenum
145 having calculated zeros=0;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.NO_ZEROS created, with 0 rows and 1479 columns.
N/A
Posts: 0

Re: delete sets of observations

140 proc sql;
141 create table no_zeros as
142 select *, max(flag) as zeros
143 from flags
144 group by date and racenum
145 having calculated zeros=0;

Firstly it's not an ERROR, it's a note. However you need to do

group by date, racenum

not

group by date and racenum

Flip's version is a much better one. You just need to alter it slightly. Instead of deleting records, just keep them.

proc sql;
create table no_zeros as
select * from your_dataset
where a not in (select distinct a from your_dataset where b=0);
quit;
N/A
Posts: 0

Re: delete sets of observations

Sorry for the error comment, just ment it wasnt doing what we wanted. Pznew, your code will do the job now, but if the other is more efficient I would like to give it a shot.


proc sql;
create table no_zeros as
select * from aptest
where date and racenum not in (select distinct date and racenum from aptest where lifestarts = 0);
quit;

This still does not seem to work. It just deleted the first racenum of each date. I think it is selecting a distinct date and then racenum=1 is the first racenum it comes to so it deltes it.


I have not used proc sql so I am kind of lost. The date, racenum fix from above doesnt work in this setting.
Super Contributor
Posts: 359

Re: delete sets of observations

Try where date || racenum not in (select distinct date || racenum from aptest where lifestarts = 0);

SQL will often give warnings, but once you understand what is happening, you can usually tell if they are a problem. Of course if you are validating the system you may need to code so that you do not get them. ie. creating a new data set or using a work dataset to do the selection.
N/A
Posts: 0

Re: delete sets of observations

283 proc sql;
284 create table no_zeros as
285 select * from aptest
286 where date || racenum not in (select distinct date || racenum from aptest where lifestarts = 0);
ERROR: Concatenation (||) requires character operands.
ERROR: Concatenation (||) requires character operands.
Super Contributor
Posts: 359

Re: delete sets of observations

Sorry;

how about:
create table no_zeros as select a.* from aptest as a, (select a, c from aptest where b=0) as b
where not(a.date = b.date and b.racenum = a.racenum);
N/A
Posts: 0

Re: delete sets of observations

Got it, Thanks!
Ask a Question
Discussion stats
  • 10 replies
  • 210 views
  • 0 likes
  • 2 in conversation