Obsidian | Level 7

## Keep rows of observations based on complex criteria

Hi - I am new to SAS (coming over from STATA) and trying to perform 2 things on the data below:

1. I want to display a sum of the variable value for only claim_id 3

2. I only care about proc_cd = 23, so I would like to subset the data based on that BUT if a claim_id has proc_cd = 23 at some point, I want to keep all the other observations for that claim_id (ie claim_id = 3 has proc_cd =23 in the first row but I also want to keep the second row as well even though proc_cd=J20 since at some point claim_id = 3 had a proc_cd=23)

I copied this from the log using an individuals code that said it would generate code that others can use but I also copied my actual code below as well.

data WORK.SAMPLE1;
infile datalines dsd truncover;
input claim_id:32. proc_cd:\$8. value:32.;
datalines;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;;;;

My code I used to generate the dataset

data sample1;
input claim_id proc_cd \$ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;

Thanks for the help!

Cheers,

Peter

1 ACCEPTED SOLUTION

Accepted Solutions
Quartz | Level 8

## Re: Keep rows of observations based on complex criteria

OK this should work in all cases:

``````data sample1;
input claim_id proc_cd \$ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;

proc sort data=sample1 out=sample1sort; by claim_id proc_cd;
run;

data three;
set sample1sort;
by claim_id;
retain
flag1
flag2
sum_value
0;
if first.claim_id then do;
flag1 = 0; flag2=0;
sum_value = 0;
end;
if proc_cd = 23 then flag1 + 1;
if proc_cd = 35 then flag2 + 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
run;

proc sort data=three out=threesortout; by claim_id descending proc_cd; run;

data four
(drop=flag1 flag2 flag3 flag4)
;
set threesortout;
by claim_id;
retain
flag3
flag4
sum_value
0;
if first.claim_id then do;
flag3 = 0; flag4 = 0;
sum_value = 0;
end;
if proc_cd = 23 then flag3 = 1;
if proc_cd = 35 then flag4 = 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
if flag1=1|flag2=1|flag3=1|flag4=1
then output;
run;

proc print data=four; run;

``````
9 REPLIES 9
Quartz | Level 8

## Re: Keep rows of observations based on complex criteria

I might need to see what your "want" dataset looks like - does that need to also contain all rows for claim_id = 3 along with the sum value duplicated on each row? Below is what I put together using proc sql;

``````proc sql;
create table want as
select claim_id, sum(value) as total_value
from sample1
where claim_id in (select distinct claim_id from sample1 where proc_cd = '23')
and claim_id = 3
group by claim_id;
quit;``````
Obsidian | Level 7

## Re: Keep rows of observations based on complex criteria

Thanks for such a quick reply!

My apologies - the sum value part is confusing so we can just forget about that, I would just want to create a new dataset based on the following explanation and then I'll show my want dataset.

1. I only care about proc_cd = 23 AND proc_cd = 35 (sorry I forgot it's 2 criteria), so I would like to subset the data based on that BUT if a claim_id has proc_cd = 23 or proc_cd=35 at some point, I want to keep all the other observations for that claim_id (ie claim_id = 3 has proc_cd =23 in the first row but I also want to keep the second row as well even though proc_cd=J20 since at some point claim_id = 3 had a proc_cd=23).

STARTING DATA
data sample1;
input claim_id proc_cd \$ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;

WANT DATA
data sample1;
input claim_id proc_cd \$ value;
cards;
3 23 34
3 J20 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;

In the want data - the observations with claim_id = 2 are gone because it never had a proc_cd = 23 or proc_cd = 35.

Thanks again for the help!
Best,
Peter
Quartz | Level 8

## Re: Keep rows of observations based on complex criteria

``````proc sql;
create table want as
select *
from sample1 where claim_id in (select distinct claim_id from sample1 where proc_cd in ('23','35'))
;
quit;``````
Quartz | Level 8

## Re: Keep rows of observations based on complex criteria

your solution is elegant -- well done!
Obsidian | Level 7

## Re: Keep rows of observations based on complex criteria

Thanks so much - this works awesome!
Quartz | Level 8

## Re: Keep rows of observations based on complex criteria

Here's an alternative using by group processing, which, if you are new to SAS, I'd learn it since it comes in handy for stuff like this. We have to sort it by your group first. Then retain your sum value variable and a temporary variable I'm using to indicate whether it found proc_cd = 23 in that group. I use that to control whether it outputs that group to the output dataset.

``````proc sort data=sample1;
by claim_id;
run;
data want (drop=has_proc_cd_23);
set sample1;
by claim_id;
retain sum_value has_proc_cd_23;
sum_value+value;
if proc_cd = '23' then has_proc_cd_23=1;
if last.claim_id then do;
if has_proc_cd_23=1 then output;
sum_value=.;
has_proc_cd_23=.;
end;
run;``````
Quartz | Level 8

## Re: Keep rows of observations based on complex criteria

``````/*1. I want to display a sum of the
variable value for only claim_id 3*/

proc print data=sample1;
where claim_id = 3;
id claim_id;
sum value;
run;

/*2. I only care about proc_cd = 23, */
/*so I would like to subset the data */
/*based on that BUT if a claim_id has */
/*proc_cd = 23 at some point, I want to */
/*keep all the other observations for */
/*that claim_id (ie claim_id = 3 has */
/*proc_cd =23 in the first row but I */
/*also want to keep the second row as */
/*well even though proc_cd=J20 since */
/*at some point claim_id = 3 had a */
/*proc_cd=23)*/
/**/

proc sort data=sample1 out=sample1sort; by claim_id;
run;

data two (drop=flag);
set sample1sort;
by claim_id;
retain flag 0;
if first.claim_id then do;
flag = 0;
end;
if proc_cd = 23 then flag + 1;
if flag=1 then output;
run;

proc print data=two;
run;``````
Quartz | Level 8

## Re: Keep rows of observations based on complex criteria

OK this should work in all cases:

``````data sample1;
input claim_id proc_cd \$ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;

proc sort data=sample1 out=sample1sort; by claim_id proc_cd;
run;

data three;
set sample1sort;
by claim_id;
retain
flag1
flag2
sum_value
0;
if first.claim_id then do;
flag1 = 0; flag2=0;
sum_value = 0;
end;
if proc_cd = 23 then flag1 + 1;
if proc_cd = 35 then flag2 + 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
run;

proc sort data=three out=threesortout; by claim_id descending proc_cd; run;

data four
(drop=flag1 flag2 flag3 flag4)
;
set threesortout;
by claim_id;
retain
flag3
flag4
sum_value
0;
if first.claim_id then do;
flag3 = 0; flag4 = 0;
sum_value = 0;
end;
if proc_cd = 23 then flag3 = 1;
if proc_cd = 35 then flag4 = 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
if flag1=1|flag2=1|flag3=1|flag4=1
then output;
run;

proc print data=four; run;

``````
Obsidian | Level 7

## Re: Keep rows of observations based on complex criteria

Thanks so much - this works awesome too!
Discussion stats
• 9 replies
• 977 views
• 2 likes
• 3 in conversation