ID Log_Code
11 A
12 A
12 B
13 A
14 A
14 B
I want to eliminate both rows that meet the criteria of having 'B' as a log code. So in this case I want to eliminate ID 12 and 14. How can I do this in a SAS datastep or proc sql
There are a few ways, but this is probably the most intuitive. It assumes your data are sorted by ID:
data want;
merge have
have (keep=id log_code where=(log_code='B') in=drop_these);
by id;
if drop_these then delete;
run;
It is conceivable you will need to ignore messages indicating a many-to-many merge.
The SQL coders here can show you a way that may be just as intuitive to you ... depends on whether SQL or DATA steps are a stronger area for you.
Good luck.
Maybe Im missing something but you should be able to just use a where statement in either case:
Data new;
set old;
where log_code ne "B";
run;
proc sql;
select *
from old
where log_code ne "B"
;
quit;
If the varying padded log_codes are how the data appears then you might have to use strip(log_code) to remove leading and trailing spaces.
Hope this helps!
EJ
There are a few ways, but this is probably the most intuitive. It assumes your data are sorted by ID:
data want;
merge have
have (keep=id log_code where=(log_code='B') in=drop_these);
by id;
if drop_these then delete;
run;
It is conceivable you will need to ignore messages indicating a many-to-many merge.
The SQL coders here can show you a way that may be just as intuitive to you ... depends on whether SQL or DATA steps are a stronger area for you.
Good luck.
SQL solution -
data have;
input id $3. log_code $3.;
datalines;
11 A
12 A
12 B
13 A
14 A
14 B
run;
proc sql;
create table want as
select *
from have
where id not in (select distinct id
from have
where log_code in ('B'));
quit;
You can also simply do:
proc sql;
create table want as
select * from have group by id having sum(log_code='B') <1; quit;
Haikuo
Like EJ, I'm having trouble understanding what needs done here. I know it's been answered multiple times now, but out of curiosity, I'd like to understand what's being asked here exactly.
As far as I can tell, we simply want to exclude anything with log_code = 'B'? Correct? Is that the case?
No, it's more complex. If log_code='B', delete all observations for that ID, including those that have log_code='A'.
Ok. I see. Now it makes sense. Thanks.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.