Hi Forum,
In the attached dataset, I need to see if the “Balance” of duplicate accounts are identical or not.
E.g. Account_number 111 has two records. “Balance” of each record are 25. So, they are identical.
My approach.
I used diff function (from Cody’s book).
data want;
SET tt;
BY account_number;
*REMOVE accounts WITH ONE record ;
IF FIRST.account_number AND LAST.account_number
THEN DELETE;
RETAIN R_HR ;
IF FIRST.account_number THEN DO;
R_HR = balance;
END;
IF LAST.account_number THEN DO;
DIFF_HR = balance - R_HR;
OUTPUT;
END;
DROP R_: ;
RUN;
proc freq data = want;
tables DIFF_HR;
run;
Question:
I want to determine if the values of “arrears_band” variable are also identical or not in duplicate accounts.
I do not know how to use diff function for “character” variable.
Could any one of you help me?
Thanks
Mirisage
If you don't mind an SQL answer :
proc sql;
select account_number from sasforum.Post_this
group by account_number
having count(distinct Arrears_band) > 1;
quit;
PG
If you don't mind an SQL answer :
proc sql;
select account_number from sasforum.Post_this
group by account_number
having count(distinct Arrears_band) > 1;
quit;
PG
You said you used the dif function, but that isn't shown in your code. Anyhow, to answer your question, the dif function is limited to numeric variables. You could accomplish the same thing with the lag function. e.g.:
data want (keep=account_number identical);
SET tt;
BY account_number;
retain identical;
last_arrears_band=lag(arrears_band);
if not(FIRST.account_number and LAST.account_number);
IF FIRST.account_number then identical=1;
else if arrears_band ne last_arrears_band then identical=0;
IF last.account_number THEN output;
RUN;
If you only care about first or last of each group like your original code shows, then a small tweak will be sufficient do what you want for character variable:
data want;
SET tt;
BY account_number;
*REMOVE accounts WITH ONE record ;
IF FIRST.account_number AND LAST.account_number
THEN DELETE;
RETAIN r_ab ;
IF FIRST.account_number then r_ab = Arrears_Band;
IF LAST.account_number THEN DO;
DIFF_AB = (Arrears_Band = r_ab);
OUTPUT;
END;
DROP R_:;
RUN;
Basically, beside changing variable names, the only change need to be made is to replace the minus sign '-' with equal sign '='.
Haikuo
Hi PGStats, Art and Hai.Kuo,
Many thanks for every one of you. All three codes are working well.
(I cannot click "correct answer" for all three responses for some reason)
Hi Art,
Now only I realized that what I have used is not diff function.
Thanks again!
Mirisage
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.