Hi. I have a data set that contains the fields claim_number claim_line_number, and dialysis. I need to create a new varible named remove_clm and this variable should be populated with a "Y" for each claim line for the claim number, if ANY line has dialysis = "Y".
Data Example:
Claim_Number | Claim_Line_Number | Dialysis |
1234567 | 1 | Y |
1234567 | 2 | Y |
1234567 | 3 | N |
1234567 | 4 | N |
7654321 | 1 | N |
7654321 | 2 | N |
7654321 | 3 | N |
7654321 | 4 | N |
7654321 | 5 | N |
7654321 | 6 | Y |
2468100 | 1 | Y |
2468100 | 2 | Y |
2468100 | 3 | Y |
3456789 | 1 | Y |
3456789 | 2 | N |
3456789 | 3 | Y |
3456789 | 4 | Y |
3456789 | 5 | N |
I have tried this code:
Claim_Number | Claim_Line_Number | Dialysis | Remove_Claim |
1234567 | 1 | Y | Y |
1234567 | 2 | Y | Y |
1234567 | 3 | N | N |
1234567 | 4 | N | N |
7654321 | 1 | N | N |
7654321 | 2 | N | N |
7654321 | 3 | N | N |
7654321 | 4 | N | N |
7654321 | 5 | N | N |
7654321 | 6 | Y | Y |
2468100 | 1 | Y | Y |
2468100 | 2 | Y | Y |
2468100 | 3 | Y | Y |
3456789 | 1 | Y | Y |
3456789 | 2 | N | N |
3456789 | 3 | Y | Y |
3456789 | 4 | Y | Y |
3456789 | 5 | N | N |
The output I am trying to get is this:
Claim_Number | Claim_Line_Number | Dialysis | Remove_Claim |
1234567 | 1 | Y | Y |
1234567 | 2 | Y | Y |
1234567 | 3 | N | Y |
1234567 | 4 | N | Y |
7654321 | 1 | N | Y |
7654321 | 2 | N | Y |
7654321 | 3 | N | Y |
7654321 | 4 | N | Y |
7654321 | 5 | N | Y |
7654321 | 6 | Y | Y |
2468100 | 1 | Y | Y |
2468100 | 2 | Y | Y |
2468100 | 3 | Y | Y |
3456789 | 1 | Y | Y |
3456789 | 2 | N | Y |
3456789 | 3 | Y | Y |
3456789 | 4 | Y | Y |
3456789 | 5 | N | Y |
I appreciate any help. Thanks.
data have;
infile cards expandtabs;
input Claim_Number Claim_Line_Number Dialysis $;
cards;
1234567 1 Y
1234567 2 Y
1234567 3 N
1234567 4 N
7654321 1 N
7654321 2 N
7654321 3 N
7654321 4 N
7654321 5 N
7654321 6 Y
2468100 1 Y
2468100 2 Y
2468100 3 Y
3456789 1 Y
3456789 2 N
3456789 3 Y
3456789 4 Y
3456789 5 N
;
proc sql;
create table want as
select *,max(Dialysis) as Remove_Claim from have group by Claim_Number;
quit;
If the data is already sorted by claim_number, there is no need to sort it further.
You can, for each claim number do a self-merge, of all the lines with dialysis='Y', merge with all lines for the claim number, as below (untested, in the absence of a data sample in the form of a working data step:
data want;
merge save_dataset (where=(dialysis='Y') in=found_a_yes)
save_dataset ;
by claim_number;
remove_claim=ifc(found_a_yes=1,'Y','N');
run;
If there is even a single observations with dialysis='Y' input by the merge statement, then the dummy variable found_a_yes will equal one for all merged observations for that claim_number. And since the subset of dialysis=1 is listed first in the merge statement, its values will be overwritten by the values taken from the second target of the merge statement, i.e. the complete list of observations, in original order.
data want;
merge
have
have (
keep=claim_number dialysis
rename=(dialysis=remove_claim)
where=(remove_claim = "Y")
)
;
by claim_number;
run;
Since dialysis already has the same content as your wanted variable, you can use it directly.
data have;
infile cards expandtabs;
input Claim_Number Claim_Line_Number Dialysis $;
cards;
1234567 1 Y
1234567 2 Y
1234567 3 N
1234567 4 N
7654321 1 N
7654321 2 N
7654321 3 N
7654321 4 N
7654321 5 N
7654321 6 Y
2468100 1 Y
2468100 2 Y
2468100 3 Y
3456789 1 Y
3456789 2 N
3456789 3 Y
3456789 4 Y
3456789 5 N
;
proc sql;
create table want as
select *,max(Dialysis) as Remove_Claim from have group by Claim_Number;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.