Solved: Retain flag for claim id based on dialysis variable

JH74 · Posted 09-23-2023 06:40 PM

Hi. I have a data set that contains the fields claim_number claim_line_number, and dialysis. I need to create a new varible named remove_clm and this variable should be populated with a "Y" for each claim line for the claim number, if ANY line has dialysis = "Y".

Data Example:

Claim_Number	Claim_Line_Number	Dialysis
1234567	1	Y
1234567	2	Y
1234567	3	N
1234567	4	N
7654321	1	N
7654321	2	N
7654321	3	N
7654321	4	N
7654321	5	N
7654321	6	Y
2468100	1	Y
2468100	2	Y
2468100	3	Y
3456789	1	Y
3456789	2	N
3456789	3	Y
3456789	4	Y
3456789	5	N

I have tried this code:

proc sort data=save.dataset out=dataset_srt;

by claim_number dialysis;

run;

data test;

set dataset_srt;

by claim_number dialysis;

retain remove_claim;

if first.claim_number then remove_claim='N';

if dialysis = "Y" then remove_claim = 'Y';

run;

But, the above code is not giving me a "Y" for each line of the claim, only the line where Dialysis = "Y". Like this:

Claim_Number	Claim_Line_Number	Dialysis	Remove_Claim
1234567	1	Y	Y
1234567	2	Y	Y
1234567	3	N	N
1234567	4	N	N
7654321	1	N	N
7654321	2	N	N
7654321	3	N	N
7654321	4	N	N
7654321	5	N	N
7654321	6	Y	Y
2468100	1	Y	Y
2468100	2	Y	Y
2468100	3	Y	Y
3456789	1	Y	Y
3456789	2	N	N
3456789	3	Y	Y
3456789	4	Y	Y
3456789	5	N	N

The output I am trying to get is this:

Claim_Number	Claim_Line_Number	Dialysis	Remove_Claim
1234567	1	Y	Y
1234567	2	Y	Y
1234567	3	N	Y
1234567	4	N	Y
7654321	1	N	Y
7654321	2	N	Y
7654321	3	N	Y
7654321	4	N	Y
7654321	5	N	Y
7654321	6	Y	Y
2468100	1	Y	Y
2468100	2	Y	Y
2468100	3	Y	Y
3456789	1	Y	Y
3456789	2	N	Y
3456789	3	Y	Y
3456789	4	Y	Y
3456789	5	N	Y

I appreciate any help. Thanks.

Ksharp · Posted 09-24-2023 05:22 AM

data have;
infile cards expandtabs;
input Claim_Number	Claim_Line_Number	Dialysis $;
cards;
1234567	1	Y
1234567	2	Y
1234567	3	N
1234567	4	N
7654321	1	N
7654321	2	N
7654321	3	N
7654321	4	N
7654321	5	N
7654321	6	Y
2468100	1	Y
2468100	2	Y
2468100	3	Y
3456789	1	Y
3456789	2	N
3456789	3	Y
3456789	4	Y
3456789	5	N
;
proc sql;
create table want as
select *,max(Dialysis) as Remove_Claim from have group by Claim_Number;
quit;

View solution in original post

mkeintz · Posted 09-23-2023 11:21 PM

If the data is already sorted by claim_number, there is no need to sort it further.

You can, for each claim number do a self-merge, of all the lines with dialysis='Y', merge with all lines for the claim number, as below (untested, in the absence of a data sample in the form of a working data step:

data want;
  merge save_dataset (where=(dialysis='Y') in=found_a_yes)
        save_dataset ;
  by claim_number;
  remove_claim=ifc(found_a_yes=1,'Y','N');
run;

If there is even a single observations with dialysis='Y' input by the merge statement, then the dummy variable found_a_yes will equal one for all merged observations for that claim_number. And since the subset of dialysis=1 is listed first in the merge statement, its values will be overwritten by the values taken from the second target of the merge statement, i.e. the complete list of observations, in original order.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Kurt_Bremser · Posted 09-24-2023 03:04 AM

data want;
merge
  have
  have (
    keep=claim_number dialysis
    rename=(dialysis=remove_claim)
    where=(remove_claim = "Y")
  )
;
by claim_number;
run;

Since dialysis already has the same content as your wanted variable, you can use it directly.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 09-24-2023 05:22 AM

data have;
infile cards expandtabs;
input Claim_Number	Claim_Line_Number	Dialysis $;
cards;
1234567	1	Y
1234567	2	Y
1234567	3	N
1234567	4	N
7654321	1	N
7654321	2	N
7654321	3	N
7654321	4	N
7654321	5	N
7654321	6	Y
2468100	1	Y
2468100	2	Y
2468100	3	Y
3456789	1	Y
3456789	2	N
3456789	3	Y
3456789	4	Y
3456789	5	N
;
proc sql;
create table want as
select *,max(Dialysis) as Remove_Claim from have group by Claim_Number;
quit;

Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Re: Retain flag for claim id based on dialysis variable

Registration is open

SAS Training: Just a Click Away