- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:
ID |
Visit |
Data |
AAA |
1 |
131 |
AAA |
2 |
5 |
AAA |
3 |
. |
BBB |
1 |
12 |
BBB |
2 |
61 |
CCC |
1 |
17 |
DDD |
1 |
87 |
DDD |
2 |
. |
DDD |
3 |
13 |
EEE |
1 |
. |
EEE |
2 |
. |
FFF |
1 |
18 |
GGG |
1 |
31 |
HHH |
1 |
. |
HHH |
2 |
. |
HHH |
3 |
25 |
At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
if (data ne .) then _count=sum(_count,1);
else _count=0;
if _count>=2 then _flag=1;
end;
do _n_=1 to _n_;
set have;
if _flag then output;
end;
drop _:;
run;
In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below:
ID |
Visit |
Data |
AAA |
1 |
131 |
AAA |
2 |
5 |
AAA |
3 |
. |
BBB |
1 |
12 |
BBB |
2 |
61 |
DDD |
1 |
87 |
DDD |
2 |
. |
DDD |
3 |
13 |
Any help would be much appreciated!
Thanks
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.
proc sql;
create table want as
select *
from have
group by ID
having count(data)>2;
quit;
@asgee wrote:
Hi all,
I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:
ID
Visit
Data
AAA
1
131
AAA
2
5
AAA
3
.
BBB
1
12
BBB
2
61
CCC
1
17
DDD
1
87
DDD
2
.
DDD
3
13
EEE
1
.
EEE
2
.
FFF
1
18
GGG
1
31
HHH
1
.
HHH
2
.
HHH
3
XY
At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:
data want; do _n_=1 by 1 until(last.id); set have; by id; if (data ne .) then _count=sum(_count,1); else _count=0; if _count>=2 then _flag=1; end; do _n_=1 to _n_; set have; if _flag then output; end; drop _:; run;
In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below:
ID
Visit
Data
AAA
1
131
AAA
2
5
AAA
3
.
BBB
1
12
BBB
2
61
DDD
1
87
DDD
2
.
DDD
3
13
Any help would be much appreciated!
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.
proc sql;
create table want as
select *
from have
group by ID
having count(data)>2;
quit;
@asgee wrote:
Hi all,
I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:
ID
Visit
Data
AAA
1
131
AAA
2
5
AAA
3
.
BBB
1
12
BBB
2
61
CCC
1
17
DDD
1
87
DDD
2
.
DDD
3
13
EEE
1
.
EEE
2
.
FFF
1
18
GGG
1
31
HHH
1
.
HHH
2
.
HHH
3
XY
At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:
data want; do _n_=1 by 1 until(last.id); set have; by id; if (data ne .) then _count=sum(_count,1); else _count=0; if _count>=2 then _flag=1; end; do _n_=1 to _n_; set have; if _flag then output; end; drop _:; run;
In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below:
ID
Visit
Data
AAA
1
131
AAA
2
5
AAA
3
.
BBB
1
12
BBB
2
61
DDD
1
87
DDD
2
.
DDD
3
13
Any help would be much appreciated!
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input ID $ Visit Data $;
cards;
AAA 1 131
AAA 2 5
AAA 3 .
BBB 1 12
BBB 2 61
CCC 1 17
DDD 1 87
DDD 2 .
DDD 3 13
EEE 1 .
EEE 2 .
FFF 1 18
GGG 1 31
HHH 1 .
HHH 2 .
HHH 3 XY
;
proc sql;
create table want as
select *
from have
group by id
having n(data)>=2
order by id,visit;
quit;
/*or*/
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
_n=sum(^missing(data),_n);
end;
do _n_=1 to _n_;
set have;
if _n>=2 then output;
end;
drop _:;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content