SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
asgee
Obsidian | Level 7

Hi all,

 

I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

CCC

1

17

DDD

1

87

DDD

2

.

DDD

3

13

EEE

1

.

EEE

2

.

FFF

1

18

GGG

1

31

HHH

1

.

HHH

2

.

HHH

3

25

 

At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:

 

 

data want;
     do _n_=1 by 1 until(last.id);
	   set have;
       by id;
       if (data ne .) then _count=sum(_count,1);
	   else _count=0;
	   if _count>=2 then _flag=1;
	   end;
	 do _n_=1 to _n_;
	   set have;
	   if _flag then output;
	   end;
	drop _:;
run;

 

 

In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below: 

 

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

DDD

1

87

DDD

2

.

DDD

3

13

 

Any help would be much appreciated! 

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.

 

proc sql;

create table want as 

select *

from have

group by ID

having count(data)>2;

quit;

@asgee wrote:

Hi all,

 

I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

CCC

1

17

DDD

1

87

DDD

2

.

DDD

3

13

EEE

1

.

EEE

2

.

FFF

1

18

GGG

1

31

HHH

1

.

HHH

2

.

HHH

3

XY

 

At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:

 

 

data want;
     do _n_=1 by 1 until(last.id);
	   set have;
       by id;
       if (data ne .) then _count=sum(_count,1);
	   else _count=0;
	   if _count>=2 then _flag=1;
	   end;
	 do _n_=1 to _n_;
	   set have;
	   if _flag then output;
	   end;
	drop _:;
run;

 

 

In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below: 

 

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

DDD

1

87

DDD

2

.

DDD

3

13

 

Any help would be much appreciated! 

 

Thanks


 

View solution in original post

5 REPLIES 5
Reeza
Super User

I think you can just use the HAVING clause in PROC SQL. Does the following work for you? If not, please post the log.

 

proc sql;

create table want as 

select *

from have

group by ID

having count(data)>2;

quit;

@asgee wrote:

Hi all,

 

I'm working on this dataset that has repeated measures over time. I want to detect the IDs where they have at least 2 non missing observations in the variable "Data". A sample of the dataset is shown below:

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

CCC

1

17

DDD

1

87

DDD

2

.

DDD

3

13

EEE

1

.

EEE

2

.

FFF

1

18

GGG

1

31

HHH

1

.

HHH

2

.

HHH

3

XY

 

At the moment, I've modified a code that @novinosrin helped me out with from a previous question (thanks again!). However, when I ran this modified code, it seems to only detect (keep) the IDs where there is non-missing data right after each other (back-to-back). I believe it has something to do with the _flag step and have tried modifying that as well but I'm struggling to figure it out:

 

 

data want;
     do _n_=1 by 1 until(last.id);
	   set have;
       by id;
       if (data ne .) then _count=sum(_count,1);
	   else _count=0;
	   if _count>=2 then _flag=1;
	   end;
	 do _n_=1 to _n_;
	   set have;
	   if _flag then output;
	   end;
	drop _:;
run;

 

 

In this case, as long as there is at least 2 non-missing data in that ID (regardless of whether the non-missing data are back-to-back). Based on the sample table above, I'd want to produce something like the table below: 

 

 

ID

Visit

Data

AAA

1

131

AAA

2

5

AAA

3

.

BBB

1

12

BBB

2

61

DDD

1

87

DDD

2

.

DDD

3

13

 

Any help would be much appreciated! 

 

Thanks


 

asgee
Obsidian | Level 7
Hi @Reeza ! Thanks for the reply. Yes it seems like your code works perfectly! I never came across the "HAVING" clause so that's really good to know. Could there also be a "NOT HAVING" clause if I wanted to subset the opposite?
Reeza
Super User

@asgee wrote:
Hi @Reeza ! Thanks for the reply. Yes it seems like your code works perfectly! I never came across the "HAVING" clause so that's really good to know. Could there also be a "NOT HAVING" clause if I wanted to subset the opposite?

Try it and let us know 🙂

novinosrin
Tourmaline | Level 20

data have;
input ID $	Visit	Data $;
cards;
AAA	1	131
AAA	2	5
AAA	3	.
BBB	1	12
BBB	2	61
CCC	1	17
DDD	1	87
DDD	2	.
DDD	3	13
EEE	1	.
EEE	2	.
FFF	1	18
GGG	1	31
HHH	1	.
HHH	2	.
HHH	3	XY
;

proc sql;
 create table want as
 select *
 from have
 group by id
 having n(data)>=2
 order by id,visit;
quit;

/*or*/
data want;
 do _n_=1 by 1 until(last.id);
  set have;
  by id;
  _n=sum(^missing(data),_n);
 end;
 do _n_=1 to _n_;
  set have;
  if _n>=2 then output;
 end;
 drop _:;
run;
asgee
Obsidian | Level 7
Hi @novinosrin ! Thanks for your reply. I tried your code and it works as well! Will keep your solution in mind for other issues I have with repeated measures. Thanks again!

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1242 views
  • 3 likes
  • 3 in conversation