DATA Step, Macro, Functions and more

delete sets of observations

Reply
N/A
Posts: 0

delete sets of observations

I would like to delete certain small subsets of my data.

For example say I have a set like this:
A B C D
1 0 2 3
1 10 2 2
1 4 3 2
2 2 3 3
2 6 7 3
2 3 3 2
2 7 3 2
3 1 0 3
3 5 3 2
3 7 1 3

I would like to delete the sets of observations with matching A values for which the there is an observation with the B value equal to zero. So it would look like this:

A B C D
2 2 3 3
2 6 7 3
2 3 3 2
2 7 3 2
3 1 0 3
3 5 3 2
3 7 1 3


something like

data data_set;
set data_set;
if B='0' then ??????;
run;

Thanks for the help.
N/A
Posts: 0

Re: delete sets of observations

Posted in reply to deleted_user
Something like:

*flag the observations with a B=0;
data flags;
set input;
if b=0 then flag=1;
else flag=0;
run;

*keep all the observations without a flagged observation in their group;
proc sql;
create table no_zeros as
select *, max(flag) as zeros
from flags
group by A
having calculated zeros=0;
quit;

I think that will do it. Correct me if I'm wrong.
Super Contributor
Posts: 359

Re: delete sets of observations

Posted in reply to deleted_user
proc sql;
delete from data_set where a in (select distinct a from data_set where b = 0);
quit;
N/A
Posts: 0

Re: delete sets of observations

I tried this and received the error below. I will try the first example now. Thanks.

15 proc sql;
116 delete from aptest where date and racenum in (select distinct date and racenum from aptest where
116! lifestarts = 0);
WARNING: This DELETE/INSERT statement recursively references the target table. A consequence of this
is a possible data integrity problem.
NOTE: 3071 rows were deleted from WORK.APTEST.


but there still remain observations for which lifestarts=0.
N/A
Posts: 0

Re: delete sets of observations

Posted in reply to deleted_user
also received this error for the first attempt:

140 proc sql;
141 create table no_zeros as
142 select *, max(flag) as zeros
143 from flags
144 group by date and racenum
145 having calculated zeros=0;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.NO_ZEROS created, with 0 rows and 1479 columns.
N/A
Posts: 0

Re: delete sets of observations

Posted in reply to deleted_user
140 proc sql;
141 create table no_zeros as
142 select *, max(flag) as zeros
143 from flags
144 group by date and racenum
145 having calculated zeros=0;

Firstly it's not an ERROR, it's a note. However you need to do

group by date, racenum

not

group by date and racenum

Flip's version is a much better one. You just need to alter it slightly. Instead of deleting records, just keep them.

proc sql;
create table no_zeros as
select * from your_dataset
where a not in (select distinct a from your_dataset where b=0);
quit;
N/A
Posts: 0

Re: delete sets of observations

Posted in reply to deleted_user
Sorry for the error comment, just ment it wasnt doing what we wanted. Pznew, your code will do the job now, but if the other is more efficient I would like to give it a shot.


proc sql;
create table no_zeros as
select * from aptest
where date and racenum not in (select distinct date and racenum from aptest where lifestarts = 0);
quit;

This still does not seem to work. It just deleted the first racenum of each date. I think it is selecting a distinct date and then racenum=1 is the first racenum it comes to so it deltes it.


I have not used proc sql so I am kind of lost. The date, racenum fix from above doesnt work in this setting.
Super Contributor
Posts: 359

Re: delete sets of observations

Posted in reply to deleted_user
Try where date || racenum not in (select distinct date || racenum from aptest where lifestarts = 0);

SQL will often give warnings, but once you understand what is happening, you can usually tell if they are a problem. Of course if you are validating the system you may need to code so that you do not get them. ie. creating a new data set or using a work dataset to do the selection.
N/A
Posts: 0

Re: delete sets of observations

283 proc sql;
284 create table no_zeros as
285 select * from aptest
286 where date || racenum not in (select distinct date || racenum from aptest where lifestarts = 0);
ERROR: Concatenation (||) requires character operands.
ERROR: Concatenation (||) requires character operands.
Super Contributor
Posts: 359

Re: delete sets of observations

Posted in reply to deleted_user
Sorry;

how about:
create table no_zeros as select a.* from aptest as a, (select a, c from aptest where b=0) as b
where not(a.date = b.date and b.racenum = a.racenum);
N/A
Posts: 0

Re: delete sets of observations

Got it, Thanks!
Ask a Question
Discussion stats
  • 10 replies
  • 217 views
  • 0 likes
  • 2 in conversation