BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sagulolo
Quartz | Level 8

HI, 

 

Need some help, i have a list of code and need to do some comparison between first and second and so on. My code as below:-

 

 

data have;
infile cards ;
input code $16.;
cards;
10017290064CONE
10019220069CFCP
10019220069PONE
10022640063CLADY
10031300063CONE
10031420069CFCP
10031420069PGEN
10031420069PGEN
10031420069PONE
10031770067CONE
10034410067CONE
10034480060CONE
10044510062PGEN
10044510062PGEN
10046570064CONE
10046570064CONE
10049240061CONE
10051110061LADY
10051230067PGEN
10051230067PONE
;

data want;
	SET have;
	IF FIRST.CODE THEN 
		SGCHK = "DIFF";
	ELSE
		IF lag(CODE) = CODE THEN 
			SGCHK = "DUP";
		ELSE 
			SGCHK = "DIFF";
RUN;

and output as below:- 

 

codeSGCHK
10017290064CONEDIFF
10019220069CFCPDIFF
10019220069PONEDIFF
10022640063CLADYDIFF
10031300063CONEDIFF
10031420069CFCPDIFF
10031420069PGENDIFF
10031420069PGENDUP
10031420069PONEDIFF
10031770067CONEDIFF
10034410067CONEDIFF
10034480060CONEDIFF
10044510062PGENDIFF
10044510062PGENDUP
10046570064CONEDIFF
10046570064CONEDUP
10049240061CONEDIFF
10051110061LADYDIFF
10051230067PGENDIFF
10051230067PONEDIFF

 

 

Is there a way to get output as below?

 

codeSGCHK
10017290064CONEDIFF
10019220069CFCPDIFF
10019220069PONEDIFF
10022640063CLADYDIFF
10031300063CONEDIFF
10031420069CFCPDIFF
10031420069PGENDUP
10031420069PGENDIFF
10031420069PONEDIFF
10031770067CONEDIFF
10034410067CONEDIFF
10034480060CONEDIFF
10044510062PGENDUP
10044510062PGENDIFF
10046570064CONEDUP
10046570064CONEDIFF
10049240061CONEDIFF
10051110061LADYDIFF
10051230067PGENDIFF
10051230067PONEDIFF

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Then you "simply" read dataset "have" two times side-by-side, but have one read shifted by one observation:

data want;
merge
  have
  have (firstobs=2 rename=(code=code2))
;
if code = code2
then sgchk = 'DUP';
else sgchk = 'DIF';
drop code2;
run;

View solution in original post

10 REPLIES 10
Reeza
Super User

Use the BY groups first/last values. 

 

data want;
Set have;
 
By code;

If first.code and not last.code then SGCHK='DUP';
Else sgchk='DIFF';

Run;
sagulolo
Quartz | Level 8

Thank Reeza,

 

it seem not working if code appear more than 2  as per below:-

 

10034410067CONE - appear 3 times

10051230067PONE - appear 4 times

 

data have;
infile cards ;
input code $16.;
cards;
10017290064CONE
10019220069CFCP
10019220069PONE
10022640063CLADY
10031300063CONE
10031420069CFCP
10031420069PGEN
10031420069PGEN
10031420069PONE
10031770067CONE
10034410067CONE
10034410067CONE
10034410067CONE 10034480060CONE 10044510062PGEN 10044510062PGEN 10046570064CONE 10046570064CONE 10049240061CONE 10051110061LADY 10051230067PGEN 10051230067PONE
10051230067PONE
10051230067PONE
10051230067PONE ; data want; Set have; By code; If first.code and not last.code then SGCHK='DUP'; Else sgchk='DIFF'; Run;

and output is 

 

codeSGCHK
10017290064CONEDIF
10019220069CFCPDIF
10019220069PONEDIF
10022640063CLADYDIF
10031300063CONEDIF
10031420069CFCPDIF
10031420069PGENDUP
10031420069PGENDIF
10031420069PONEDIF
10031770067CONEDIF
10034410067CONEDUP
10034410067CONEDIF
10034410067CONEDIF
10034480060CONEDIF
10044510062PGENDUP
10044510062PGENDIF
10046570064CONEDUP
10046570064CONEDIF
10049240061CONEDIF
10051110061LADYDIF
10051230067PGENDIF
10051230067PONEDUP
10051230067PONEDIF
10051230067PONEDIF
10051230067PONEDIF


The output should be 

 

codeSGCHK
10017290064CONEDIF
10019220069CFCPDIF
10019220069PONEDIF
10022640063CLADYDIF
10031300063CONEDIF
10031420069CFCPDIF
10031420069PGENDUP
10031420069PGENDIF
10031420069PONEDIF
10031770067CONEDIF
10034410067CONEDUP
10034410067CONEDUP
10034410067CONEDIF
10034480060CONEDIF
10044510062PGENDUP
10044510062PGENDIF
10046570064CONEDUP
10046570064CONEDIF
10049240061CONEDIF
10051110061LADYDIF
10051230067PGENDIF
10051230067PONEDUP
10051230067PONEDUP
10051230067PONEDUP
10051230067PONEDIF
Kurt_Bremser
Super User

I miss the logic here:

10031420069CFCP DIFF
10031420069PGEN DUP
10031420069PGEN DIFF
10031420069PONE DIFF

Why should the first appearance of 10031420069PGEN be marked DUP, and the second(!) DIFF?

sagulolo
Quartz | Level 8

Hi KurtBremser,

 

I'm trying to move my task from Ms Excel to Sas, that DUP and DIFF is for checking purposes(when i was using Ms Excel, formula is =IF(A2=A3,"DUP","DIF") ), I wanted to keep that in SAS for checking purposes before fully use SAS to do my task.

Kurt_Bremser
Super User

@sagulolo wrote:

Hi KurtBremser,

 

I'm trying to move my task from Ms Excel to Sas, that DUP and DIFF is for checking purposes(when i was using Ms Excel, formula is =IF(A2=A3,"DUP","DIF") ), I wanted to keep that in SAS for checking purposes before fully use SAS to do my task.


That does not explain the flaw (as I perceive it) in your logic. Calling a record DIF when it's a duplicate of the preceding record makes my toenails curl.

You might want to review the logic in your Excel.

sagulolo
Quartz | Level 8

Hi KurtBremser,

 

when i have this listing

10017290064CONEDIF
10019220069CFCPDIF
10019220069PONEDIF
10022640063CLADYDIF
10031300063CONEDIF
10031420069CFCPDIF
10031420069PGENDUP
10031420069PGENDIF
10031420069PONEDIF
10031770067CONEDIF
10034410067CONEDUP
10034410067CONEDUP
10034410067CONEDIF
10034480060CONEDIF
10044510062PGENDUP
10044510062PGENDIF
10046570064CONEDUP
10046570064CONEDIF
10049240061CONEDIF
10051110061LADYDIF
10051230067PGENDIF
10051230067PONEDUP
10051230067PONEDUP
10051230067PONEDUP
10051230067PONEDIF

 

and im going to produce a new list 

 

10017290064CONEDIF
10019220069CFCPDIF
10019220069PONEDIF
10022640063CLADYDIF
10031300063CONEDIF
10031420069PGENDIF
10031420069PONEDIF
10034410067CONEDIF
10046570064CONEDIF
10049240061CONEDIF
10051110061LADYDIF
10051230067PONEDIF

 

of course, it will always have more variable and im always wanted the record with DIF(last row instead of first row)

sagulolo
Quartz | Level 8

Dear KurtBremser,

 

Yes, sorry for my bad english.

 

 

Kurt_Bremser
Super User

Then you "simply" read dataset "have" two times side-by-side, but have one read shifted by one observation:

data want;
merge
  have
  have (firstobs=2 rename=(code=code2))
;
if code = code2
then sgchk = 'DUP';
else sgchk = 'DIF';
drop code2;
run;
sagulolo
Quartz | Level 8
Dear KurtBremser,

Thank you so much, this is really help me a lot.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2368 views
  • 0 likes
  • 3 in conversation