This is a typical problem that can be encountered in panel data when I want to exclude certain subjects who don't meet my inclusion criteria. For example, from the table below
year | id | v1 | v2 | v3 |
1 | 1 | 1 | 23 | 0 |
2 | 1 | 2 | 23 | 0 |
3 | 1 | 3 | 23 | 1 |
1 | 2 | 1 | 14 | 0 |
2 | 2 | 3 | 23 | 0 |
3 | 2 | 2 | 23 | 0 |
1 | 3 | 2 | 25 | 0 |
2 | 3 | 2 | 2 | 0 |
3 | 3 | 1 | 3 | 0 |
One of my criteria is that a subject that v3=1 shouldn't be included. Thus, I want to eliminate all observations from id 1 (year1-3). Likewise, I have another condition that a subject who shows decrease in v2 should not be included. Then, all observations from id 3 should be also eliminated. As a result, the table I want to see is.
year | id | v1 | v2 | v3 |
1 | 2 | 1 | 14 | 0 |
2 | 2 | 3 | 23 | 0 |
3 | 2 | 2 | 23 | 0 |
What codes can I use for handling this problem?
data have;
input year id v1 v2 v3;
cards;
1 1 1 23 0
2 1 2 23 0
3 1 3 23 1
1 2 1 14 0
2 2 3 23 0
3 2 2 23 0
1 3 2 25 0
2 3 2 2 0
3 3 1 3 0
;
data _null_;
if _n_=1 then do;
dcl hash H (dataset:'have', multidata:'y') ;
h.definekey ("id") ;
h.definedata (all:'y') ;
h.definedone () ;
end;
set have end=lr;
by id ;
if first.id then call missing(v3_check);
if v3=1 then v3_check=1;
if not first.id and lag(v2)>v2 or last.id and v3_check=1 then rc=h.remove();
if lr then h.output(dataset:'want');
run;
SQL is pretty good at this one.
proc sql;
create table filtered as
select *
from have where id not in (select distinct ID from have where v3=1);
quit;
@asinusdk wrote:
This is a typical problem that can be encountered in panel data when I want to exclude certain subjects who don't meet my inclusion criteria. For example, from the table below
year id v1 v2 v3 1 1 1 23 0 2 1 2 23 0 3 1 3 23 1 1 2 1 14 0 2 2 3 23 0 3 2 2 23 0 1 3 2 25 0 2 3 2 2 0 3 3 1 3 0 One of my criteria is that a subject that v3=1 shouldn't be included. Thus, I want to eliminate all observations from id 1 (year1-3). Likewise, I have another condition that a subject who shows decrease in v2 should not be included. Then, all observations from id 3 should be also eliminated. As a result, the table I want to see is.
year id v1 v2 v3
1 2 1 14 0 2 2 3 23 0 3 2 2 23 0
What codes can I use for handling this problem?
I tried this code, but ERROR: Expression using equals (=) has components that are of different data types was shown.
Maybe is the code different in case v3 is a character variable?
Oh! I found that I typed quotation mark for the numeric variable. That's why the error msg came up.
But still I have a question.
What if the condition is more than one?
For example, can I get rid of all the subjects if year=1 and v3=1?
Sorry for a lot of questions..
" who shows decrease in v2 "
it then increased to 3?
I meant that even though it increased later, if the number decreased at least once, then the subject should be eliminated.
Maybe it's not the case one rule can be applied.... I guess.
data have;
input year id v1 v2 v3;
cards;
1 1 1 23 0
2 1 2 23 0
3 1 3 23 1
1 2 1 14 0
2 2 3 23 0
3 2 2 23 0
1 3 2 25 0
2 3 2 2 0
3 3 1 3 0
;
data _null_;
if _n_=1 then do;
dcl hash H (dataset:'have', multidata:'y') ;
h.definekey ("id") ;
h.definedata (all:'y') ;
h.definedone () ;
end;
set have end=lr;
by id ;
if first.id then call missing(v3_check);
if v3=1 then v3_check=1;
if not first.id and lag(v2)>v2 or last.id and v3_check=1 then rc=h.remove();
if lr then h.output(dataset:'want');
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.