Hi Forum,
In the attached dataset, I need to see if the “Balance” of duplicate accounts are identical or not.
E.g. Account_number 111 has two records. “Balance” of each record are 25. So, they are identical.
My approach.
I used diff function (from Cody’s book).
data want;
SET tt;
BY account_number;
*REMOVE accounts WITH ONE record ;
IF FIRST.account_number AND LAST.account_number
THEN DELETE;
RETAIN R_HR ;
IF FIRST.account_number THEN DO;
R_HR = balance;
END;
IF LAST.account_number THEN DO;
DIFF_HR = balance - R_HR;
OUTPUT;
END;
DROP R_: ;
RUN;
proc freq data = want;
tables DIFF_HR;
run;
Question:
I want to determine if the values of “arrears_band” variable are also identical or not in duplicate accounts.
I do not know how to use diff function for “character” variable.
Could any one of you help me?
Thanks
Mirisage
If you don't mind an SQL answer :
proc sql;
select account_number from sasforum.Post_this
group by account_number
having count(distinct Arrears_band) > 1;
quit;
PG
If you don't mind an SQL answer :
proc sql;
select account_number from sasforum.Post_this
group by account_number
having count(distinct Arrears_band) > 1;
quit;
PG
You said you used the dif function, but that isn't shown in your code. Anyhow, to answer your question, the dif function is limited to numeric variables. You could accomplish the same thing with the lag function. e.g.:
data want (keep=account_number identical);
SET tt;
BY account_number;
retain identical;
last_arrears_band=lag(arrears_band);
if not(FIRST.account_number and LAST.account_number);
IF FIRST.account_number then identical=1;
else if arrears_band ne last_arrears_band then identical=0;
IF last.account_number THEN output;
RUN;
If you only care about first or last of each group like your original code shows, then a small tweak will be sufficient do what you want for character variable:
data want;
SET tt;
BY account_number;
*REMOVE accounts WITH ONE record ;
IF FIRST.account_number AND LAST.account_number
THEN DELETE;
RETAIN r_ab ;
IF FIRST.account_number then r_ab = Arrears_Band;
IF LAST.account_number THEN DO;
DIFF_AB = (Arrears_Band = r_ab);
OUTPUT;
END;
DROP R_:;
RUN;
Basically, beside changing variable names, the only change need to be made is to replace the minus sign '-' with equal sign '='.
Haikuo
Hi PGStats, Art and Hai.Kuo,
Many thanks for every one of you. All three codes are working well.
(I cannot click "correct answer" for all three responses for some reason)
Hi Art,
Now only I realized that what I have used is not diff function.
Thanks again!
Mirisage
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.