HI,
Need some help, i have a list of code and need to do some comparison between first and second and so on. My code as below:-
data have; infile cards ; input code $16.; cards; 10017290064CONE 10019220069CFCP 10019220069PONE 10022640063CLADY 10031300063CONE 10031420069CFCP 10031420069PGEN 10031420069PGEN 10031420069PONE 10031770067CONE 10034410067CONE 10034480060CONE 10044510062PGEN 10044510062PGEN 10046570064CONE 10046570064CONE 10049240061CONE 10051110061LADY 10051230067PGEN 10051230067PONE ; data want; SET have; IF FIRST.CODE THEN SGCHK = "DIFF"; ELSE IF lag(CODE) = CODE THEN SGCHK = "DUP"; ELSE SGCHK = "DIFF"; RUN;
and output as below:-
code | SGCHK |
10017290064CONE | DIFF |
10019220069CFCP | DIFF |
10019220069PONE | DIFF |
10022640063CLADY | DIFF |
10031300063CONE | DIFF |
10031420069CFCP | DIFF |
10031420069PGEN | DIFF |
10031420069PGEN | DUP |
10031420069PONE | DIFF |
10031770067CONE | DIFF |
10034410067CONE | DIFF |
10034480060CONE | DIFF |
10044510062PGEN | DIFF |
10044510062PGEN | DUP |
10046570064CONE | DIFF |
10046570064CONE | DUP |
10049240061CONE | DIFF |
10051110061LADY | DIFF |
10051230067PGEN | DIFF |
10051230067PONE | DIFF |
Is there a way to get output as below?
code | SGCHK |
10017290064CONE | DIFF |
10019220069CFCP | DIFF |
10019220069PONE | DIFF |
10022640063CLADY | DIFF |
10031300063CONE | DIFF |
10031420069CFCP | DIFF |
10031420069PGEN | DUP |
10031420069PGEN | DIFF |
10031420069PONE | DIFF |
10031770067CONE | DIFF |
10034410067CONE | DIFF |
10034480060CONE | DIFF |
10044510062PGEN | DUP |
10044510062PGEN | DIFF |
10046570064CONE | DUP |
10046570064CONE | DIFF |
10049240061CONE | DIFF |
10051110061LADY | DIFF |
10051230067PGEN | DIFF |
10051230067PONE | DIFF |
Thank you
Then you "simply" read dataset "have" two times side-by-side, but have one read shifted by one observation:
data want;
merge
have
have (firstobs=2 rename=(code=code2))
;
if code = code2
then sgchk = 'DUP';
else sgchk = 'DIF';
drop code2;
run;
Use the BY groups first/last values.
data want;
Set have;
By code;
If first.code and not last.code then SGCHK='DUP';
Else sgchk='DIFF';
Run;
Thank Reeza,
it seem not working if code appear more than 2 as per below:-
10034410067CONE - appear 3 times
10051230067PONE - appear 4 times
data have; infile cards ; input code $16.; cards; 10017290064CONE 10019220069CFCP 10019220069PONE 10022640063CLADY 10031300063CONE 10031420069CFCP 10031420069PGEN 10031420069PGEN 10031420069PONE 10031770067CONE 10034410067CONE 10034410067CONE
10034410067CONE 10034480060CONE 10044510062PGEN 10044510062PGEN 10046570064CONE 10046570064CONE 10049240061CONE 10051110061LADY 10051230067PGEN 10051230067PONE
10051230067PONE
10051230067PONE
10051230067PONE ; data want; Set have; By code; If first.code and not last.code then SGCHK='DUP'; Else sgchk='DIFF'; Run;
and output is
code | SGCHK |
10017290064CONE | DIF |
10019220069CFCP | DIF |
10019220069PONE | DIF |
10022640063CLADY | DIF |
10031300063CONE | DIF |
10031420069CFCP | DIF |
10031420069PGEN | DUP |
10031420069PGEN | DIF |
10031420069PONE | DIF |
10031770067CONE | DIF |
10034410067CONE | DUP |
10034410067CONE | DIF |
10034410067CONE | DIF |
10034480060CONE | DIF |
10044510062PGEN | DUP |
10044510062PGEN | DIF |
10046570064CONE | DUP |
10046570064CONE | DIF |
10049240061CONE | DIF |
10051110061LADY | DIF |
10051230067PGEN | DIF |
10051230067PONE | DUP |
10051230067PONE | DIF |
10051230067PONE | DIF |
10051230067PONE | DIF |
The output should be
code | SGCHK |
10017290064CONE | DIF |
10019220069CFCP | DIF |
10019220069PONE | DIF |
10022640063CLADY | DIF |
10031300063CONE | DIF |
10031420069CFCP | DIF |
10031420069PGEN | DUP |
10031420069PGEN | DIF |
10031420069PONE | DIF |
10031770067CONE | DIF |
10034410067CONE | DUP |
10034410067CONE | DUP |
10034410067CONE | DIF |
10034480060CONE | DIF |
10044510062PGEN | DUP |
10044510062PGEN | DIF |
10046570064CONE | DUP |
10046570064CONE | DIF |
10049240061CONE | DIF |
10051110061LADY | DIF |
10051230067PGEN | DIF |
10051230067PONE | DUP |
10051230067PONE | DUP |
10051230067PONE | DUP |
10051230067PONE | DIF |
I miss the logic here:
10031420069CFCP | DIFF |
10031420069PGEN | DUP |
10031420069PGEN | DIFF |
10031420069PONE | DIFF |
Why should the first appearance of 10031420069PGEN be marked DUP, and the second(!) DIFF?
Hi KurtBremser,
I'm trying to move my task from Ms Excel to Sas, that DUP and DIFF is for checking purposes(when i was using Ms Excel, formula is =IF(A2=A3,"DUP","DIF") ), I wanted to keep that in SAS for checking purposes before fully use SAS to do my task.
@sagulolo wrote:
Hi KurtBremser,
I'm trying to move my task from Ms Excel to Sas, that DUP and DIFF is for checking purposes(when i was using Ms Excel, formula is =IF(A2=A3,"DUP","DIF") ), I wanted to keep that in SAS for checking purposes before fully use SAS to do my task.
That does not explain the flaw (as I perceive it) in your logic. Calling a record DIF when it's a duplicate of the preceding record makes my toenails curl.
You might want to review the logic in your Excel.
Hi KurtBremser,
when i have this listing
10017290064CONE | DIF |
10019220069CFCP | DIF |
10019220069PONE | DIF |
10022640063CLADY | DIF |
10031300063CONE | DIF |
10031420069CFCP | DIF |
10031420069PGEN | DUP |
10031420069PGEN | DIF |
10031420069PONE | DIF |
10031770067CONE | DIF |
10034410067CONE | DUP |
10034410067CONE | DUP |
10034410067CONE | DIF |
10034480060CONE | DIF |
10044510062PGEN | DUP |
10044510062PGEN | DIF |
10046570064CONE | DUP |
10046570064CONE | DIF |
10049240061CONE | DIF |
10051110061LADY | DIF |
10051230067PGEN | DIF |
10051230067PONE | DUP |
10051230067PONE | DUP |
10051230067PONE | DUP |
10051230067PONE | DIF |
and im going to produce a new list
10017290064CONE | DIF |
10019220069CFCP | DIF |
10019220069PONE | DIF |
10022640063CLADY | DIF |
10031300063CONE | DIF |
10031420069PGEN | DIF |
10031420069PONE | DIF |
10034410067CONE | DIF |
10046570064CONE | DIF |
10049240061CONE | DIF |
10051110061LADY | DIF |
10051230067PONE | DIF |
of course, it will always have more variable and im always wanted the record with DIF(last row instead of first row)
So you want to set "DUP" when an observation is a duplicate of the following record?
Dear KurtBremser,
Yes, sorry for my bad english.
Then you "simply" read dataset "have" two times side-by-side, but have one read shifted by one observation:
data want;
merge
have
have (firstobs=2 rename=(code=code2))
;
if code = code2
then sgchk = 'DUP';
else sgchk = 'DIF';
drop code2;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.