- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can anyone help on this? For example, I want to delete all persons' records who have score of 2. Many thanks!
data new;
input ID Score;
datalines;
1 2
1 7
1 .
2 9
2 6
2 5
3 2
3 9
3 6
;
run;
The the new table will be:
ID | Score |
2 | 9 |
2 | 6 |
2 | 5 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want to have a table including all the people with the score of 2, then:
data score2;
set new;
where (score = 2);
run;
If you want to have the table as above where you want all the scores for the person with the ID of 2, then:
data ID2;
set new;
where (ID = 2);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There would be many way to do it, below just a few among them:
data new;
input ID Score;
datalines;
1 2
1 7
1 .
2 9
2 6
2 5
3 2
3 9
3 6
;
run;
/*sql*/
proc sql;
create table want as
select * from new group by id having sum(score=2)=0;
quit;
/*dow*/
data want;
do until (last.id);
set new;
by id notsorted;
if score=2 then flag=1;
end;
do until (last.id);
set new;
by id notsorted;
if flag ne 1 then output;
end;
drop flag;
run;
/*stack*/
data want;
retain flag;
set new (in=up) new;
by id;
if first.id then call missing(flag);
if score=2 then flag=1;
if not up and flag ne 1 then output;
drop flag;
run;
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
You could try the following:
proc sql noprint;
create table want as
select *
from new
where id not in (select id
from new
where score=2)
;
quit;
Regards,
Amir
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your help, everyone!
Regards.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
if score=2 then delete;
would also work in a datastep.
I might not actually delete them but when I go to the next step in analysis I would remove them using a dataset option.
proc means data=new (where=(score ne 2));
for example.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You could also implement a hash table solution.
DATA NEW2(DROP=RC ) ;
DECLARE HASH AA(DATASET:' work.new(where=(score=2) ) ' ) ; /* only the offering IDs are ready into hash object */
RC=AA.DEFINEKEY( 'ID' ) ; RC=AA.DEFINEDONE() ;
DO UNTIL(EOF1) ;
SET WORK.NEW END=EOF1 ;
RC=AA.FIND() ;
IF RC NE 0 THEN OUTPUT ; /* RC=0 indicates a match, which you don't want in this instance */
END ;
STOP ;
RUN ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
All the hashing posts are forcing me to learn it. Here's a slight improvement, applicable if the data are already sorted:
do until (eof1);
set work.new end=eof1;
by id;
if first.id then rc = aa.find();
if rc ne 0 then output;
end;
Search the hash table just once per ID.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The beauty of the hash is it doesn't require the data to be pre-sorted. Had the original post entered the datalines in some non-sorted order, your suggestion would then have to incur a sort procedure in order to make the BY processing approach work and your overall run time increases, based on the additional I/O required by the sort.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Or just this way:
DATA NEW2(DROP=RC ) ;
if _n_=1 then
do;
DECLARE HASH AA(DATASET:' work.new(where=(score=2) ) ' ) ; /* only the offering IDs are ready into hash object */
RC=AA.DEFINEKEY( 'ID' ) ;
RC=AA.DEFINEDONE() ;
end;
set new;
if aa.check();
RUN ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I guess the more, the merrier. Here is an array() approach, with the assumption that data is presorted by state and each state only has less than 1000 obs.
/*array*/
data want;
array t(1:1000,1:2) _temporary_;
do _n_=1 by 1 until (last.state);
set new;
by state;
t(_n_, 1)=state;
t(_n_,2)=score;
if score=2 then flag=1;
end;
if flag ne 1 then do i=1 by 1 until (i=_n_);
state=t(i,1);
score=t(i,2);
output;
end;
call missing (of t(*));
drop flag i
run;
Haikuo