BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
asgee
Obsidian | Level 7

Hi all,

 

I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

CCC

1

17

DDD

1

87

DDD

2

.

DDD

3

13

EEE

1

.

EEE

2

.

FFF

1

18

GGG

1

31

HHH

1

.

HHH

2

.

HHH

3

25

 

At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:

 

 

data want;
     do _n_=1 by 1 until(last.id);
	   set have;
       by id;
       if (data ne .) then _count=sum(_count,1);
	   else _count=0;
	   if _count>=2 then _flag=1;
	   end;
	 do _n_=1 to _n_;
	   set have;
	   if _flag then output;
	   end;
	drop _:;
run;

 

 

In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below: 

 

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

DDD

1

87

DDD

2

.

DDD

3

13

 

Any help would be much appreciated! 

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.

 

proc sql;

create table want as 

select *

from have

group by ID

having count(data)>2;

quit;

@asgee wrote:

Hi all,

 

I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

CCC

1

17

DDD

1

87

DDD

2

.

DDD

3

13

EEE

1

.

EEE

2

.

FFF

1

18

GGG

1

31

HHH

1

.

HHH

2

.

HHH

3

XY

 

At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:

 

 

data want;
     do _n_=1 by 1 until(last.id);
	   set have;
       by id;
       if (data ne .) then _count=sum(_count,1);
	   else _count=0;
	   if _count>=2 then _flag=1;
	   end;
	 do _n_=1 to _n_;
	   set have;
	   if _flag then output;
	   end;
	drop _:;
run;

 

 

In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below: 

 

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

DDD

1

87

DDD

2

.

DDD

3

13

 

Any help would be much appreciated! 

 

Thanks


 

View solution in original post

5 REPLIES 5
Reeza
Super User

I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.

 

proc sql;

create table want as 

select *

from have

group by ID

having count(data)>2;

quit;

@asgee wrote:

Hi all,

 

I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

CCC

1

17

DDD

1

87

DDD

2

.

DDD

3

13

EEE

1

.

EEE

2

.

FFF

1

18

GGG

1

31

HHH

1

.

HHH

2

.

HHH

3

XY

 

At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:

 

 

data want;
     do _n_=1 by 1 until(last.id);
	   set have;
       by id;
       if (data ne .) then _count=sum(_count,1);
	   else _count=0;
	   if _count>=2 then _flag=1;
	   end;
	 do _n_=1 to _n_;
	   set have;
	   if _flag then output;
	   end;
	drop _:;
run;

 

 

In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below: 

 

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

DDD

1

87

DDD

2

.

DDD

3

13

 

Any help would be much appreciated! 

 

Thanks


 

asgee
Obsidian | Level 7
Hi @Reeza ! Thanks for the reply. Yes it seems like your code works perfectly! I never came across the "HAVING" clause so that's really good to know. Could there also be a "NOT HAVING" clause if I wanted to subset the opposite?
Reeza
Super User

@asgee wrote:
Hi @Reeza ! Thanks for the reply. Yes it seems like your code works perfectly! I never came across the "HAVING" clause so that's really good to know. Could there also be a "NOT HAVING" clause if I wanted to subset the opposite?

Try it and let us know 🙂

novinosrin
Tourmaline | Level 20

data have;
input ID $	Visit	Data $;
cards;
AAA	1	131
AAA	2	5
AAA	3	.
BBB	1	12
BBB	2	61
CCC	1	17
DDD	1	87
DDD	2	.
DDD	3	13
EEE	1	.
EEE	2	.
FFF	1	18
GGG	1	31
HHH	1	.
HHH	2	.
HHH	3	XY
;

proc sql;
 create table want as
 select *
 from have
 group by id
 having n(data)>=2
 order by id,visit;
quit;

/*or*/
data want;
 do _n_=1 by 1 until(last.id);
  set have;
  by id;
  _n=sum(^missing(data),_n);
 end;
 do _n_=1 to _n_;
  set have;
  if _n>=2 then output;
 end;
 drop _:;
run;
asgee
Obsidian | Level 7
Hi @novinosrin ! Thanks for your reply. I tried your code and it works as well! Will keep your solution in mind for other issues I have with repeated measures. Thanks again!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 604 views
  • 3 likes
  • 3 in conversation