Re: Case when statement with more than 2 variables

Sandeep77 · Posted 10-20-2022 11:59 AM

Hi, I have a dataset which has debt_code, icustomerid and datekey. I am using a case when statement using these three columns. I want to find if debt_code, icustomerid and datekey are same then 0 and if debt_code, icustomerid are same and datekey is different then 1 but I am not sure of using the multiple case. Can you please assist?

proc sql;
create table Multiple_entry as
select *,
case when debt_code = icustomerid = Datekey then 0 
else 1 end as Flag
from Previous_Status;
quit;
Error log:
29         proc sql;
30         create table Multiple_entry as
31         select *,
32         case when debt_code = icustomerid = Datekey then 0
33         else 1 end as Flag
34         from Previous_Status;
ERROR: Expression using equals (=) has components that are of different data types.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
35         quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              5589.87k
      OS Memory           32348.00k
      Timestamp           10/20/2022 04:54:56 PM
      Step Count                        10  Switch Count  0

Sample dataset in Previous_status table:
rep_code	debt_code	icustomerid	accountstatusprevious	datekey
133	286338454	16418	133	20220406
133	295191266	16418	133	20220406
133	299889733	16418	133	20220406
133	266208776	21436	133	20190207
133	266208776	21436	133	20190911
133	266208776	21436	133	20200109
133	266208776	21436	133	20201207
133	291458610	21436	133	20190911
133	291458610	21436	133	20190917
133	291458610	21436	133	20200109
133	291458610	21436	133	20201207

I want to flag the 3 rows as 0 because the date is same and 4-7 rows as 1 as the debt code and icustomerid is same but date is different.

Reeza · Posted 10-20-2022 12:01 PM

= tests within the same row.
You are testing between rows.
I think you're more looking for a GROUP BY statement as well as CASE in this situation.

proc sql;
create table want as 
select *, 
    case when max(datekey)=min(datekey) then 0 
              else 1 end as flag
from have 
group by icustomerid, debt_code;
quit;

Sandeep77 · Posted 10-20-2022 12:26 PM

Thank you Reeza, It did gave me the 0 and 1 flag but what if I want 1 row per datekey for same icustomerid? By doing this I will be able to remove the duplicate records. For e.g., the first 3 rows should show 1 record as the icustomerid and datekey is same. Row 4 to 7 seems to be good as the icustomerid is same but it shows the datekey different. Can you suggest?

Below is the sample date after adding the flag.

rep_code	debt_code	icustomerid	accountstatusprevious	datekey	flag
133	286338454	16418	133	20220406	0
133	295191266	16418	133	20220406	0
133	299889733	16418	133	20220406	0
133	266208776	21436	133	20190207	1
133	266208776	21436	133	20190911	1
133	266208776	21436	133	20200109	1
133	266208776	21436	133	20201207	1

Reeza · Posted 10-20-2022 12:33 PM

Easy enough but not the question asked initially. Proc sort will keep only unique values based on a set of key variables when you specify the UNIQUEKEY option. It will keep the rest of the variables for that record as well, and this is easier to manage in PROC SORT than SQL.

proc sort data=have out=want uniquekey;
by icustomerID debt_code datekey;
run;

@Sandeep77 wrote:

Thank you Reeza, It did gave me the 0 and 1 flag but what if I want 1 row per datekey for same icustomerid? By doing this I will be able to remove the duplicate records. For e.g., the first 3 rows should show 1 record as the icustomerid and datekey is same. Row 4 to 7 seems to be good as the icustomerid is same but it shows the datekey different. Can you suggest?
Below is the sample date after adding the flag.

rep_code	debt_code	icustomerid	accountstatusprevious	datekey	flag
133	286338454	16418	133	20220406	0
133	295191266	16418	133	20220406	0
133	299889733	16418	133	20220406	0
133	266208776	21436	133	20190207	1
133	266208776	21436	133	20190911	1
133	266208776	21436	133	20200109	1
133	266208776	21436	133	20201207	1

Kurt_Bremser · Posted 10-21-2022 06:40 AM

Have you verified that all three variables used in the comparison are of type character?

Edit:

from your example data, this condition can never be true. What you want is a comparison across multiple observations, not a comparison within a single observation.

proc sort data=have;
by debt_code icustomerid datekey;
run;

data want;
set have;
by debt_code icustomerid datekey;
flag = (first.datekey = last.datekey);
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Case when statement with more than 2 variables