This is a typical problem that can be encountered in panel data when I want to exclude certain subjects who don't meet my inclusion criteria. For example, from the table below
| year | id | v1 | v2 | v3 |
| 1 | 1 | 1 | 23 | 0 |
| 2 | 1 | 2 | 23 | 0 |
| 3 | 1 | 3 | 23 | 1 |
| 1 | 2 | 1 | 14 | 0 |
| 2 | 2 | 3 | 23 | 0 |
| 3 | 2 | 2 | 23 | 0 |
| 1 | 3 | 2 | 25 | 0 |
| 2 | 3 | 2 | 2 | 0 |
| 3 | 3 | 1 | 3 | 0 |
One of my criteria is that a subject that v3=1 shouldn't be included. Thus, I want to eliminate all observations from id 1 (year1-3). Likewise, I have another condition that a subject who shows decrease in v2 should not be included. Then, all observations from id 3 should be also eliminated. As a result, the table I want to see is.
| year | id | v1 | v2 | v3 |
| 1 | 2 | 1 | 14 | 0 |
| 2 | 2 | 3 | 23 | 0 |
| 3 | 2 | 2 | 23 | 0 |
What codes can I use for handling this problem?
data have;
input year id v1 v2 v3;
cards;
1 1 1 23 0
2 1 2 23 0
3 1 3 23 1
1 2 1 14 0
2 2 3 23 0
3 2 2 23 0
1 3 2 25 0
2 3 2 2 0
3 3 1 3 0
;
data _null_;
if _n_=1 then do;
dcl hash H (dataset:'have', multidata:'y') ;
h.definekey ("id") ;
h.definedata (all:'y') ;
h.definedone () ;
end;
set have end=lr;
by id ;
if first.id then call missing(v3_check);
if v3=1 then v3_check=1;
if not first.id and lag(v2)>v2 or last.id and v3_check=1 then rc=h.remove();
if lr then h.output(dataset:'want');
run;
SQL is pretty good at this one.
proc sql;
create table filtered as
select *
from have where id not in (select distinct ID from have where v3=1);
quit;
@asinusdk wrote:
This is a typical problem that can be encountered in panel data when I want to exclude certain subjects who don't meet my inclusion criteria. For example, from the table below
year id v1 v2 v3 1 1 1 23 0 2 1 2 23 0 3 1 3 23 1 1 2 1 14 0 2 2 3 23 0 3 2 2 23 0 1 3 2 25 0 2 3 2 2 0 3 3 1 3 0 One of my criteria is that a subject that v3=1 shouldn't be included. Thus, I want to eliminate all observations from id 1 (year1-3). Likewise, I have another condition that a subject who shows decrease in v2 should not be included. Then, all observations from id 3 should be also eliminated. As a result, the table I want to see is.
year id v1 v2 v3
1 2 1 14 0 2 2 3 23 0 3 2 2 23 0
What codes can I use for handling this problem?
I tried this code, but ERROR: Expression using equals (=) has components that are of different data types was shown.
Maybe is the code different in case v3 is a character variable?
Oh! I found that I typed quotation mark for the numeric variable. That's why the error msg came up.
But still I have a question.
What if the condition is more than one?
For example, can I get rid of all the subjects if year=1 and v3=1?
Sorry for a lot of questions..
" who shows decrease in v2 "
it then increased to 3?
I meant that even though it increased later, if the number decreased at least once, then the subject should be eliminated.
Maybe it's not the case one rule can be applied.... I guess.
data have;
input year id v1 v2 v3;
cards;
1 1 1 23 0
2 1 2 23 0
3 1 3 23 1
1 2 1 14 0
2 2 3 23 0
3 2 2 23 0
1 3 2 25 0
2 3 2 2 0
3 3 1 3 0
;
data _null_;
if _n_=1 then do;
dcl hash H (dataset:'have', multidata:'y') ;
h.definekey ("id") ;
h.definedata (all:'y') ;
h.definedone () ;
end;
set have end=lr;
by id ;
if first.id then call missing(v3_check);
if v3=1 then v3_check=1;
if not first.id and lag(v2)>v2 or last.id and v3_check=1 then rc=h.remove();
if lr then h.output(dataset:'want');
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.