Can anyone help on this? For example, I want to delete all persons' records who have score of 2. Many thanks!
data new;
input ID Score;
datalines;
1 2
1 7
1 .
2 9
2 6
2 5
3 2
3 9
3 6
;
run;
The the new table will be:
ID | Score |
2 | 9 |
2 | 6 |
2 | 5 |
If you want to have a table including all the people with the score of 2, then:
data score2;
set new;
where (score = 2);
run;
If you want to have the table as above where you want all the scores for the person with the ID of 2, then:
data ID2;
set new;
where (ID = 2);
run;
There would be many way to do it, below just a few among them:
data new;
input ID Score;
datalines;
1 2
1 7
1 .
2 9
2 6
2 5
3 2
3 9
3 6
;
run;
/*sql*/
proc sql;
create table want as
select * from new group by id having sum(score=2)=0;
quit;
/*dow*/
data want;
do until (last.id);
set new;
by id notsorted;
if score=2 then flag=1;
end;
do until (last.id);
set new;
by id notsorted;
if flag ne 1 then output;
end;
drop flag;
run;
/*stack*/
data want;
retain flag;
set new (in=up) new;
by id;
if first.id then call missing(flag);
if score=2 then flag=1;
if not up and flag ne 1 then output;
drop flag;
run;
Haikuo
Hi,
You could try the following:
proc sql noprint;
create table want as
select *
from new
where id not in (select id
from new
where score=2)
;
quit;
Regards,
Amir
Thanks for your help, everyone!
Regards.
if score=2 then delete;
would also work in a datastep.
I might not actually delete them but when I go to the next step in analysis I would remove them using a dataset option.
proc means data=new (where=(score ne 2));
for example.
You could also implement a hash table solution.
DATA NEW2(DROP=RC ) ;
DECLARE HASH AA(DATASET:' work.new(where=(score=2) ) ' ) ; /* only the offering IDs are ready into hash object */
RC=AA.DEFINEKEY( 'ID' ) ; RC=AA.DEFINEDONE() ;
DO UNTIL(EOF1) ;
SET WORK.NEW END=EOF1 ;
RC=AA.FIND() ;
IF RC NE 0 THEN OUTPUT ; /* RC=0 indicates a match, which you don't want in this instance */
END ;
STOP ;
RUN ;
All the hashing posts are forcing me to learn it. Here's a slight improvement, applicable if the data are already sorted:
do until (eof1);
set work.new end=eof1;
by id;
if first.id then rc = aa.find();
if rc ne 0 then output;
end;
Search the hash table just once per ID.
The beauty of the hash is it doesn't require the data to be pre-sorted. Had the original post entered the datalines in some non-sorted order, your suggestion would then have to incur a sort procedure in order to make the BY processing approach work and your overall run time increases, based on the additional I/O required by the sort.
Or just this way:
DATA NEW2(DROP=RC ) ;
if _n_=1 then
do;
DECLARE HASH AA(DATASET:' work.new(where=(score=2) ) ' ) ; /* only the offering IDs are ready into hash object */
RC=AA.DEFINEKEY( 'ID' ) ;
RC=AA.DEFINEDONE() ;
end;
set new;
if aa.check();
RUN ;
I guess the more, the merrier. Here is an array() approach, with the assumption that data is presorted by state and each state only has less than 1000 obs.
/*array*/
data want;
array t(1:1000,1:2) _temporary_;
do _n_=1 by 1 until (last.state);
set new;
by state;
t(_n_, 1)=state;
t(_n_,2)=score;
if score=2 then flag=1;
end;
if flag ne 1 then do i=1 by 1 until (i=_n_);
state=t(i,1);
score=t(i,2);
output;
end;
call missing (of t(*));
drop flag i
run;
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.