BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
y_fu
Fluorite | Level 6

I have a large data set, and one of the sorted column is like the following:

 

Sample_ID

A-001

A-002

A-003

A-004

A-010

A-011

A-012

B-001

B-002

and so on...

 

I want to add a new column to identify the samples that out of sequential order (discontinue  ID numbers), the new column I want is:

 

Sample_ID      Out_of_order

A-001

A-002

A-003

A-004   

A-010               Yes

A-011

A-012

B-001

B-002

B-004               Yes

and so on...

 

I want to know if there are any way to approach it or similar result, thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
input Sample_ID $;
c=compress(Sample_ID,,'ka');
n=input(compress(Sample_ID,,'kd'),8.);
cards;
A-001
A-002
A-003
A-004
A-010
A-011
A-012
B-001
B-002
B-004
;
data want;
set have;
by c;
k=dif(n);
if not first.c and k ne 1 then Out_of_order='YES';
keep sample_id Out_of_order;
run;

View solution in original post

7 REPLIES 7
DBailey
Lapis Lazuli | Level 10

in a data step, you can use the lag function to look at previous rows:

 

options pagesize= linesize= pageno=1 nodate;

data one;
   input x @@;
   y=lag1(x);
   z=lag2(x);
   datalines;
1 2 3 4 5 6
;

proc print data=one;
   title 'LAG Output';
run;

 

Output from Generating Two Lagged Values

                                  LAG Output                                 1

                              Obs    x    y    z

                               1     1    .    .
                               2     2    1    .
                               3     3    2    1
                               4     4    3    2
                               5     5    4    3
                               6     6    5    4
y_fu
Fluorite | Level 6

I have a large data set that has more than 1000 obs. I do not think lag function will be efficient. 

PeterClemmensen
Tourmaline | Level 20

Why is Out_Of_Order not equal to 'Yes' for B-001?

ballardw
Super User

@y_fu wrote:

I have a large data set that has more than 1000 obs. I do not think lag function will be efficient. 


If the values are sorted then you only need to look at the previous record.

 

data have;
   input sample_id $;
datalines;
A-001
A-002
A-003
A-004
A-010 
A-011
A-012
B-001
B-002
B-004
;
run;

data want;
   set have;
   prefix=scan(sample_id,1,'-');
   lprefix=scan(lag(sample_id),1,'-');
   order = input(scan(sample_id,2,'-'),4.);
   lorder = input(scan(lag(sample_id),2,'-'),4.);
   Outoforder = (prefix=lprefix) and (order ne (lorder+1));
   drop prefix lprefix order lorder;
run;

You do not specify if the A, B etc value has to be checked in any way.

 

You also do not specify if the first A, B etc value has to be 001 or not.

y_fu
Fluorite | Level 6

Sorry, I should be more specific.

The A,B,C etc. check will not be necessary, and all of the group of samples will start with -001

novinosrin
Tourmaline | Level 20
data have;
input Sample_ID $;
c=compress(Sample_ID,,'ka');
n=input(compress(Sample_ID,,'kd'),8.);
cards;
A-001
A-002
A-003
A-004
A-010
A-011
A-012
B-001
B-002
B-004
;
data want;
set have;
by c;
k=dif(n);
if not first.c and k ne 1 then Out_of_order='YES';
keep sample_id Out_of_order;
run;
y_fu
Fluorite | Level 6

Thank you for the help! It is the result I want.

It is nice to learn something new about how to use compress function.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2785 views
  • 1 like
  • 5 in conversation