Solved: How to delete a group of observations based on below conditions

Spintu · Posted 03-31-2022 10:21 AM

How to delete a group of observations based on below conditions
/*if birthnum 2 and 3 or 2 and 6*/

/*How to delete a group of observations based on below conditions*/
/*if birthnum 2 and 3 or 2 and 6*/
 
data have;
  input id case_id birthnum $;
  datalines;
1 695 1
1 698 2
1 699 3
2 695 B
2 698 2
2 699 5
3 91  B
3 698 2
3 695 B
3 697 6
4 695 2
4 698 5
4 699 B
;

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where= (birthnum in ("2" "3" "6")))');
      h1.defineKey('id');
      h1.defineDone();
    end;
  set have;
  if h1.check()=0 then delete;
run;

proc print data=want;
run;


The output I want: However I am not getting this. It's zero Records am getting. 
2 695 B
2 698 2
2 699 5
4 695 2
4 698 5
4 699 B

mkeintz · Posted 03-31-2022 10:32 AM

Given the data are already sorted by ID, this is a single data step task. Merge the subset with birthnum='2' with the subset having birtnum='3' or '6' with the entire set. Keep those observations that don't have at least one member in both of the subsets:

data have;
  input id case_id birthnum $;
  datalines;
1 695 1
1 698 2
1 699 3
2 695 B
2 698 2
2 699 5
3 91  B
3 698 2
3 695 B
3 697 6
4 695 2
4 698 5
4 699 B
;
data want;
  merge have (where=(birthnum='2') in=in2)
        have (where=(birthnum='3' or birthnum='6') in=in3or6)
        have ;
  by id;
  if not (in2=1 and in3or6=1);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

tarheel13 · Posted 03-31-2022 10:27 AM

have you tested your code? I reran it making check=h1.check(). check=0 for all of them. that's why you are not returning any observations.

Spintu · Posted 03-31-2022 10:49 AM

If I am writing check=h1.check() . It's creating a new variable check having missing value. I am not getting my expected output.

tarheel13 · Posted 03-31-2022 10:51 AM

I suggested doing that so you could see the values are all 0 and that's why they got deleted. you should always open up datasets and see if you got desired results. it's a good way to check your work.

AMSAS · Posted 03-31-2022 10:27 AM

I think you are going to need to explain this more clearly

/*if birthnum 2 and 3 or 2 and 6*/

How can a single observation (row) have a variable (birthnum) with values 2 and 3

tarheel13 · Posted 03-31-2022 10:32 AM

if you look at the data ID 1 and 3 both have birthnum=2 and 3 or birthnum=2 and 6. I think the OP wishes to delete ones that ever have birthnum=2 and birthnum=3 or birthnum=2 and birthnum=6. that's the way I interpreted it from OP's post and desired output.

mkeintz · Posted 03-31-2022 10:32 AM

Given the data are already sorted by ID, this is a single data step task. Merge the subset with birthnum='2' with the subset having birtnum='3' or '6' with the entire set. Keep those observations that don't have at least one member in both of the subsets:

data have;
  input id case_id birthnum $;
  datalines;
1 695 1
1 698 2
1 699 3
2 695 B
2 698 2
2 699 5
3 91  B
3 698 2
3 695 B
3 697 6
4 695 2
4 698 5
4 699 B
;
data want;
  merge have (where=(birthnum='2') in=in2)
        have (where=(birthnum='3' or birthnum='6') in=in3or6)
        have ;
  by id;
  if not (in2=1 and in3or6=1);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

tarheel13 · Posted 03-31-2022 10:47 AM

This is an elegant solution with not too many lines of code!

Ksharp · Posted 04-01-2022 09:28 AM

 
data have;
  input id case_id birthnum $;
  datalines;
1 695 1
1 698 2
1 699 3
2 695 B
2 698 2
2 699 5
3 91  B
3 698 2
3 695 B
3 697 6
4 695 2
4 698 5
4 699 B
;


proc sql;
create table want as
select * from have
 group by id
  having not (sum(birthnum='2') and sum(birthnum in ('3' '6')));
quit;

Spintu · Posted 04-01-2022 01:15 PM

Thank you! Good to know that.

How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Re: How to delete a group of observations based on below conditions

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away