Hi all,
I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:
ID |
Visit |
Data |
AAA |
1 |
131 |
AAA |
2 |
5 |
AAA |
3 |
. |
BBB |
1 |
12 |
BBB |
2 |
61 |
CCC |
1 |
17 |
DDD |
1 |
87 |
DDD |
2 |
. |
DDD |
3 |
13 |
EEE |
1 |
. |
EEE |
2 |
. |
FFF |
1 |
18 |
GGG |
1 |
31 |
HHH |
1 |
. |
HHH |
2 |
. |
HHH |
3 |
25 |
At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
if (data ne .) then _count=sum(_count,1);
else _count=0;
if _count>=2 then _flag=1;
end;
do _n_=1 to _n_;
set have;
if _flag then output;
end;
drop _:;
run;
In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below:
ID |
Visit |
Data |
AAA |
1 |
131 |
AAA |
2 |
5 |
AAA |
3 |
. |
BBB |
1 |
12 |
BBB |
2 |
61 |
DDD |
1 |
87 |
DDD |
2 |
. |
DDD |
3 |
13 |
Any help would be much appreciated!
Thanks
I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.
proc sql;
create table want as
select *
from have
group by ID
having count(data)>2;
quit;
@asgee wrote:
Hi all,
I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:
ID
Visit
Data
AAA
1
131
AAA
2
5
AAA
3
.
BBB
1
12
BBB
2
61
CCC
1
17
DDD
1
87
DDD
2
.
DDD
3
13
EEE
1
.
EEE
2
.
FFF
1
18
GGG
1
31
HHH
1
.
HHH
2
.
HHH
3
XY
At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:
data want; do _n_=1 by 1 until(last.id); set have; by id; if (data ne .) then _count=sum(_count,1); else _count=0; if _count>=2 then _flag=1; end; do _n_=1 to _n_; set have; if _flag then output; end; drop _:; run;
In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below:
ID
Visit
Data
AAA
1
131
AAA
2
5
AAA
3
.
BBB
1
12
BBB
2
61
DDD
1
87
DDD
2
.
DDD
3
13
Any help would be much appreciated!
Thanks
I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.
proc sql;
create table want as
select *
from have
group by ID
having count(data)>2;
quit;
@asgee wrote:
Hi all,
I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:
ID
Visit
Data
AAA
1
131
AAA
2
5
AAA
3
.
BBB
1
12
BBB
2
61
CCC
1
17
DDD
1
87
DDD
2
.
DDD
3
13
EEE
1
.
EEE
2
.
FFF
1
18
GGG
1
31
HHH
1
.
HHH
2
.
HHH
3
XY
At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:
data want; do _n_=1 by 1 until(last.id); set have; by id; if (data ne .) then _count=sum(_count,1); else _count=0; if _count>=2 then _flag=1; end; do _n_=1 to _n_; set have; if _flag then output; end; drop _:; run;
In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below:
ID
Visit
Data
AAA
1
131
AAA
2
5
AAA
3
.
BBB
1
12
BBB
2
61
DDD
1
87
DDD
2
.
DDD
3
13
Any help would be much appreciated!
Thanks
data have;
input ID $ Visit Data $;
cards;
AAA 1 131
AAA 2 5
AAA 3 .
BBB 1 12
BBB 2 61
CCC 1 17
DDD 1 87
DDD 2 .
DDD 3 13
EEE 1 .
EEE 2 .
FFF 1 18
GGG 1 31
HHH 1 .
HHH 2 .
HHH 3 XY
;
proc sql;
create table want as
select *
from have
group by id
having n(data)>=2
order by id,visit;
quit;
/*or*/
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
_n=sum(^missing(data),_n);
end;
do _n_=1 to _n_;
set have;
if _n>=2 then output;
end;
drop _:;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.