Hi - I am new to SAS (coming over from STATA) and trying to perform 2 things on the data below:
1. I want to display a sum of the variable value for only claim_id 3
2. I only care about proc_cd = 23, so I would like to subset the data based on that BUT if a claim_id has proc_cd = 23 at some point, I want to keep all the other observations for that claim_id (ie claim_id = 3 has proc_cd =23 in the first row but I also want to keep the second row as well even though proc_cd=J20 since at some point claim_id = 3 had a proc_cd=23)
I copied this from the log using an individuals code that said it would generate code that others can use but I also copied my actual code below as well.
data WORK.SAMPLE1;
infile datalines dsd truncover;
input claim_id:32. proc_cd:$8. value:32.;
datalines;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;;;;
My code I used to generate the dataset
data sample1;
input claim_id proc_cd $ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;
Thanks for the help!
Cheers,
Peter
OK this should work in all cases:
data sample1;
input claim_id proc_cd $ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;
proc sort data=sample1 out=sample1sort; by claim_id proc_cd;
run;
data three;
set sample1sort;
by claim_id;
retain
flag1
flag2
sum_value
0;
if first.claim_id then do;
flag1 = 0; flag2=0;
sum_value = 0;
end;
if proc_cd = 23 then flag1 + 1;
if proc_cd = 35 then flag2 + 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
run;
proc sort data=three out=threesortout; by claim_id descending proc_cd; run;
data four
(drop=flag1 flag2 flag3 flag4)
;
set threesortout;
by claim_id;
retain
flag3
flag4
sum_value
0;
if first.claim_id then do;
flag3 = 0; flag4 = 0;
sum_value = 0;
end;
if proc_cd = 23 then flag3 = 1;
if proc_cd = 35 then flag4 = 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
if flag1=1|flag2=1|flag3=1|flag4=1
then output;
run;
proc print data=four; run;
I might need to see what your "want" dataset looks like - does that need to also contain all rows for claim_id = 3 along with the sum value duplicated on each row? Below is what I put together using proc sql;
proc sql;
create table want as
select claim_id, sum(value) as total_value
from sample1
where claim_id in (select distinct claim_id from sample1 where proc_cd = '23')
and claim_id = 3
group by claim_id;
quit;
proc sql;
create table want as
select *
from sample1 where claim_id in (select distinct claim_id from sample1 where proc_cd in ('23','35'))
;
quit;
Here's an alternative using by group processing, which, if you are new to SAS, I'd learn it since it comes in handy for stuff like this. We have to sort it by your group first. Then retain your sum value variable and a temporary variable I'm using to indicate whether it found proc_cd = 23 in that group. I use that to control whether it outputs that group to the output dataset.
proc sort data=sample1;
by claim_id;
run;
data want (drop=has_proc_cd_23);
set sample1;
by claim_id;
retain sum_value has_proc_cd_23;
sum_value+value;
if proc_cd = '23' then has_proc_cd_23=1;
if last.claim_id then do;
if has_proc_cd_23=1 then output;
sum_value=.;
has_proc_cd_23=.;
end;
run;
/*1. I want to display a sum of the
variable value for only claim_id 3*/
proc print data=sample1;
where claim_id = 3;
id claim_id;
sum value;
run;
/*2. I only care about proc_cd = 23, */
/*so I would like to subset the data */
/*based on that BUT if a claim_id has */
/*proc_cd = 23 at some point, I want to */
/*keep all the other observations for */
/*that claim_id (ie claim_id = 3 has */
/*proc_cd =23 in the first row but I */
/*also want to keep the second row as */
/*well even though proc_cd=J20 since */
/*at some point claim_id = 3 had a */
/*proc_cd=23)*/
/**/
proc sort data=sample1 out=sample1sort; by claim_id;
run;
data two (drop=flag);
set sample1sort;
by claim_id;
retain flag 0;
if first.claim_id then do;
flag = 0;
end;
if proc_cd = 23 then flag + 1;
if flag=1 then output;
run;
proc print data=two;
run;
OK this should work in all cases:
data sample1;
input claim_id proc_cd $ value;
cards;
3 23 34
3 J20 34
2 123 34
2 12314 34
1 23 64
12 23 87
4 34 98
4 35 34
;
run;
proc sort data=sample1 out=sample1sort; by claim_id proc_cd;
run;
data three;
set sample1sort;
by claim_id;
retain
flag1
flag2
sum_value
0;
if first.claim_id then do;
flag1 = 0; flag2=0;
sum_value = 0;
end;
if proc_cd = 23 then flag1 + 1;
if proc_cd = 35 then flag2 + 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
run;
proc sort data=three out=threesortout; by claim_id descending proc_cd; run;
data four
(drop=flag1 flag2 flag3 flag4)
;
set threesortout;
by claim_id;
retain
flag3
flag4
sum_value
0;
if first.claim_id then do;
flag3 = 0; flag4 = 0;
sum_value = 0;
end;
if proc_cd = 23 then flag3 = 1;
if proc_cd = 35 then flag4 = 1;
if claim_id = 3 then sum_value + value; else
sum_value = .;
if flag1=1|flag2=1|flag3=1|flag4=1
then output;
run;
proc print data=four; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.