SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
asgee
Obsidian | Level 7

Hi all,

 

I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

CCC

1

17

DDD

1

87

DDD

2

.

DDD

3

13

EEE

1

.

EEE

2

.

FFF

1

18

GGG

1

31

HHH

1

.

HHH

2

.

HHH

3

25

 

At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:

 

 

data want;
     do _n_=1 by 1 until(last.id);
	   set have;
       by id;
       if (data ne .) then _count=sum(_count,1);
	   else _count=0;
	   if _count>=2 then _flag=1;
	   end;
	 do _n_=1 to _n_;
	   set have;
	   if _flag then output;
	   end;
	drop _:;
run;

 

 

In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below: 

 

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

DDD

1

87

DDD

2

.

DDD

3

13

 

Any help would be much appreciated! 

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.

 

proc sql;

create table want as 

select *

from have

group by ID

having count(data)>2;

quit;

@asgee wrote:

Hi all,

 

I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

CCC

1

17

DDD

1

87

DDD

2

.

DDD

3

13

EEE

1

.

EEE

2

.

FFF

1

18

GGG

1

31

HHH

1

.

HHH

2

.

HHH

3

XY

 

At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:

 

 

data want;
     do _n_=1 by 1 until(last.id);
	   set have;
       by id;
       if (data ne .) then _count=sum(_count,1);
	   else _count=0;
	   if _count>=2 then _flag=1;
	   end;
	 do _n_=1 to _n_;
	   set have;
	   if _flag then output;
	   end;
	drop _:;
run;

 

 

In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below: 

 

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

DDD

1

87

DDD

2

.

DDD

3

13

 

Any help would be much appreciated! 

 

Thanks


 

View solution in original post

5 REPLIES 5
Reeza
Super User

I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.

 

proc sql;

create table want as 

select *

from have

group by ID

having count(data)>2;

quit;

@asgee wrote:

Hi all,

 

I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

CCC

1

17

DDD

1

87

DDD

2

.

DDD

3

13

EEE

1

.

EEE

2

.

FFF

1

18

GGG

1

31

HHH

1

.

HHH

2

.

HHH

3

XY

 

At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:

 

 

data want;
     do _n_=1 by 1 until(last.id);
	   set have;
       by id;
       if (data ne .) then _count=sum(_count,1);
	   else _count=0;
	   if _count>=2 then _flag=1;
	   end;
	 do _n_=1 to _n_;
	   set have;
	   if _flag then output;
	   end;
	drop _:;
run;

 

 

In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below: 

 

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

DDD

1

87

DDD

2

.

DDD

3

13

 

Any help would be much appreciated! 

 

Thanks


 

asgee
Obsidian | Level 7
Hi @Reeza ! Thanks for the reply. Yes it seems like your code works perfectly! I never came across the "HAVING" clause so that's really good to know. Could there also be a "NOT HAVING" clause if I wanted to subset the opposite?
Reeza
Super User

@asgee wrote:
Hi @Reeza ! Thanks for the reply. Yes it seems like your code works perfectly! I never came across the "HAVING" clause so that's really good to know. Could there also be a "NOT HAVING" clause if I wanted to subset the opposite?

Try it and let us know 🙂

novinosrin
Tourmaline | Level 20

data have;
input ID $	Visit	Data $;
cards;
AAA	1	131
AAA	2	5
AAA	3	.
BBB	1	12
BBB	2	61
CCC	1	17
DDD	1	87
DDD	2	.
DDD	3	13
EEE	1	.
EEE	2	.
FFF	1	18
GGG	1	31
HHH	1	.
HHH	2	.
HHH	3	XY
;

proc sql;
 create table want as
 select *
 from have
 group by id
 having n(data)>=2
 order by id,visit;
quit;

/*or*/
data want;
 do _n_=1 by 1 until(last.id);
  set have;
  by id;
  _n=sum(^missing(data),_n);
 end;
 do _n_=1 to _n_;
  set have;
  if _n>=2 then output;
 end;
 drop _:;
run;
asgee
Obsidian | Level 7
Hi @novinosrin ! Thanks for your reply. I tried your code and it works as well! Will keep your solution in mind for other issues I have with repeated measures. Thanks again!

sas-innovate-white.png

Join us for our biggest event of the year!

Four days of inspiring keynotes, product reveals, hands-on learning opportunities, deep-dive demos, and peer-led breakouts. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1191 views
  • 3 likes
  • 3 in conversation