ID Log_Code
11 A
12 A
12 B
13 A
14 A
14 B
I want to eliminate both rows that meet the criteria of having 'B' as a log code. So in this case I want to eliminate ID 12 and 14. How can I do this in a SAS datastep or proc sql
There are a few ways, but this is probably the most intuitive. It assumes your data are sorted by ID:
data want;
merge have
have (keep=id log_code where=(log_code='B') in=drop_these);
by id;
if drop_these then delete;
run;
It is conceivable you will need to ignore messages indicating a many-to-many merge.
The SQL coders here can show you a way that may be just as intuitive to you ... depends on whether SQL or DATA steps are a stronger area for you.
Good luck.
Maybe Im missing something but you should be able to just use a where statement in either case:
Data new;
set old;
where log_code ne "B";
run;
proc sql;
select *
from old
where log_code ne "B"
;
quit;
If the varying padded log_codes are how the data appears then you might have to use strip(log_code) to remove leading and trailing spaces.
Hope this helps!
EJ
There are a few ways, but this is probably the most intuitive. It assumes your data are sorted by ID:
data want;
merge have
have (keep=id log_code where=(log_code='B') in=drop_these);
by id;
if drop_these then delete;
run;
It is conceivable you will need to ignore messages indicating a many-to-many merge.
The SQL coders here can show you a way that may be just as intuitive to you ... depends on whether SQL or DATA steps are a stronger area for you.
Good luck.
SQL solution -
data have;
input id $3. log_code $3.;
datalines;
11 A
12 A
12 B
13 A
14 A
14 B
run;
proc sql;
create table want as
select *
from have
where id not in (select distinct id
from have
where log_code in ('B'));
quit;
You can also simply do:
proc sql;
create table want as
select * from have group by id having sum(log_code='B') <1; quit;
Haikuo
Like EJ, I'm having trouble understanding what needs done here. I know it's been answered multiple times now, but out of curiosity, I'd like to understand what's being asked here exactly.
As far as I can tell, we simply want to exclude anything with log_code = 'B'? Correct? Is that the case?
No, it's more complex. If log_code='B', delete all observations for that ID, including those that have log_code='A'.
Ok. I see. Now it makes sense. Thanks.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.