BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PeterBr
Obsidian | Level 7

Hi - I am new to SAS (coming over from STATA) and trying to perform 2 things on the data below:

 

1. I want to display a sum of the variable value for only claim_id 3

2. I only care about proc_cd = 23, so I would like to subset the data based on that BUT if a claim_id has proc_cd = 23 at some point, I want to keep all the other observations for that claim_id (ie claim_id = 3 has proc_cd =23 in the first row but I also want to keep the second row as well even though proc_cd=J20 since at some point claim_id = 3 had a proc_cd=23)

 

I copied this from the log using an individuals code that said it would generate code that others can use but I also copied my actual code below as well.

data WORK.SAMPLE1;
infile datalines dsd truncover;
input claim_id:32. proc_cd:$8. value:32.;
datalines;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;;;;

 

My code I used to generate the dataset

data sample1;
input claim_id proc_cd $ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;

 

 

Thanks for the help!

 

Cheers,

Peter

1 ACCEPTED SOLUTION

Accepted Solutions
Varrelle
Quartz | Level 8

OK this should work in all cases:

 

data sample1;
input claim_id proc_cd $ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;


proc sort data=sample1 out=sample1sort; by claim_id proc_cd;
run;

data three;
set sample1sort;
by claim_id;
retain 
flag1
flag2
sum_value
0;
if first.claim_id then do;
flag1 = 0; flag2=0; 
sum_value = 0;
end;
if proc_cd = 23 then flag1 + 1; 
if proc_cd = 35 then flag2 + 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
run;

proc sort data=three out=threesortout; by claim_id descending proc_cd; run;

data four 
(drop=flag1 flag2 flag3 flag4)
;
set threesortout;
by claim_id;
retain 
flag3
flag4 
sum_value
0;
if first.claim_id then do;
flag3 = 0; flag4 = 0;
sum_value = 0;
end;
if proc_cd = 23 then flag3 = 1;
if proc_cd = 35 then flag4 = 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
if flag1=1|flag2=1|flag3=1|flag4=1 
then output;
run;

proc print data=four; run;

View solution in original post

9 REPLIES 9
bobpep212
Quartz | Level 8

I might need to see what your "want" dataset looks like - does that need to also contain all rows for claim_id = 3 along with the sum value duplicated on each row? Below is what I put together using proc sql;  

proc sql;
create table want as 
select claim_id, sum(value) as total_value 
from sample1 
where claim_id in (select distinct claim_id from sample1 where proc_cd = '23')
and claim_id = 3
group by claim_id;
quit;
PeterBr
Obsidian | Level 7
Thanks for such a quick reply!

My apologies - the sum value part is confusing so we can just forget about that, I would just want to create a new dataset based on the following explanation and then I'll show my want dataset.

1. I only care about proc_cd = 23 AND proc_cd = 35 (sorry I forgot it's 2 criteria), so I would like to subset the data based on that BUT if a claim_id has proc_cd = 23 or proc_cd=35 at some point, I want to keep all the other observations for that claim_id (ie claim_id = 3 has proc_cd =23 in the first row but I also want to keep the second row as well even though proc_cd=J20 since at some point claim_id = 3 had a proc_cd=23).

STARTING DATA
data sample1;
input claim_id proc_cd $ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;

WANT DATA
data sample1;
input claim_id proc_cd $ value;
cards;
3 23 34
3 J20 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;

In the want data - the observations with claim_id = 2 are gone because it never had a proc_cd = 23 or proc_cd = 35.

Thanks again for the help!
Best,
Peter
bobpep212
Quartz | Level 8
proc sql;
create table want as 
select *
from sample1 where claim_id in (select distinct claim_id from sample1 where proc_cd in ('23','35'))
;
quit;
Varrelle
Quartz | Level 8
your solution is elegant -- well done!
PeterBr
Obsidian | Level 7
Thanks so much - this works awesome!
bobpep212
Quartz | Level 8

Here's an alternative using by group processing, which, if you are new to SAS, I'd learn it since it comes in handy for stuff like this. We have to sort it by your group first. Then retain your sum value variable and a temporary variable I'm using to indicate whether it found proc_cd = 23 in that group. I use that to control whether it outputs that group to the output dataset. 

proc sort data=sample1;
by claim_id;
run;
data want (drop=has_proc_cd_23);
set sample1;
by claim_id;
retain sum_value has_proc_cd_23;
sum_value+value;
if proc_cd = '23' then has_proc_cd_23=1;
if last.claim_id then do;
	if has_proc_cd_23=1 then output;
	sum_value=.;
	has_proc_cd_23=.;
						end;
run;
Varrelle
Quartz | Level 8
/*1. I want to display a sum of the
variable value for only claim_id 3*/

proc print data=sample1;
where claim_id = 3;
id claim_id;
sum value;
run;


/*2. I only care about proc_cd = 23, */
/*so I would like to subset the data */
/*based on that BUT if a claim_id has */
/*proc_cd = 23 at some point, I want to */
/*keep all the other observations for */
/*that claim_id (ie claim_id = 3 has */
/*proc_cd =23 in the first row but I */
/*also want to keep the second row as */
/*well even though proc_cd=J20 since */
/*at some point claim_id = 3 had a */
/*proc_cd=23)*/
/**/

proc sort data=sample1 out=sample1sort; by claim_id;
run;

data two (drop=flag);
set sample1sort;
by claim_id;
retain flag 0;
if first.claim_id then do;
flag = 0;
end;
if proc_cd = 23 then flag + 1;
if flag=1 then output;
run;

proc print data=two;
run;
Varrelle
Quartz | Level 8

OK this should work in all cases:

 

data sample1;
input claim_id proc_cd $ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;


proc sort data=sample1 out=sample1sort; by claim_id proc_cd;
run;

data three;
set sample1sort;
by claim_id;
retain 
flag1
flag2
sum_value
0;
if first.claim_id then do;
flag1 = 0; flag2=0; 
sum_value = 0;
end;
if proc_cd = 23 then flag1 + 1; 
if proc_cd = 35 then flag2 + 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
run;

proc sort data=three out=threesortout; by claim_id descending proc_cd; run;

data four 
(drop=flag1 flag2 flag3 flag4)
;
set threesortout;
by claim_id;
retain 
flag3
flag4 
sum_value
0;
if first.claim_id then do;
flag3 = 0; flag4 = 0;
sum_value = 0;
end;
if proc_cd = 23 then flag3 = 1;
if proc_cd = 35 then flag4 = 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
if flag1=1|flag2=1|flag3=1|flag4=1 
then output;
run;

proc print data=four; run;

PeterBr
Obsidian | Level 7
Thanks so much - this works awesome too!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1875 views
  • 2 likes
  • 3 in conversation