Solved: Re: How to find the value in three consecutive ID

gbayya · Posted 07-02-2020 01:54 PM

Hi All,

In the following table I'm trying to find the value in three consecutive ID's,

sample data

ID Name

1 AGB

1 CONN

1 BGCA

2 AGB

2 CONN

2 CAZ

3 CONN

3 CAZ

3 AGB

3 JRSC

4 ABM

4 BACO

4 AGB

4 HFC

5 BACO

5 HFC

6 BACO

6 JKB

7 HFC

7 BACO

7 JKB

Thanks for the help.

mkeintz · Posted 07-02-2020 04:49 PM

I presume you want to find any instances of a name appearing in 3 or more consecutive id's. Let's say you want a dataset of the qualifying names with 3 variables: name,start_id, end_id.

data have;
  input ID  Name :$4.  @@;
datalines;
1 AGB     1 AGB    1 CONN
1 BGCA    2 AGB    2 CONN
2 CAZ     3 CONN    3 CAZ
3 AGB     3 JRSC    4 ABM
4 ABM     4 BACO    4 AGB
4 HFC     5 BACO    5 HFC
6 BACO    6 JKB     7 HFC
7 BACO    7 JKB
1 xxx   2 xxx 3 xxx    5 xxx 6 xxx 7 xxx 8 xxx
run;

proc sort data=have nodupkey  out=need;
  by name id;
run;

data want (rename=(id=end_id) drop=nxt_:);
  set need (keep=name);
  by name;
  merge need need (firstobs=2 keep=id rename=(id=nxt_id)) ;
  retain start_id ;
  if first.name=1 or lag(id)^=id-1 then start_id=id;
  if (last.name=1 or nxt_id>id+1) and start_id<=id-2;
run;

Notes:

I added some records for name=xxx, to show what happens if a given name has 2 (or more) unconnected series satisfying your conditions.
The "set need ... by name" statemetn pair is there to tell sas only to create 2 dummy variables (first.name and last.name) to tell when the record-in-hand is the start (or end) of a given id. Note it only reads one variable. That's becuase of the self-merge that follows.
The self-merge retrieve all the data in the current record and one variabl (ID renamed to nxt_id) in the upcoming record. This way you can tell if the next record will have the next id, or present an id "gap".
Every time the data step starts a new id, or starts after a gap in the current it, then START_ID is reset.
The second if is a "subsetting if" keeping only instances when you are at the end of an id or just before a gap within an id, and you have processed 3 (or more) consecutive id's with the same name.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ed_sas_member · Posted 07-02-2020 02:27 PM

Hi @gbayya

Could you please explain the logic and show what the output would be? (a report / a dataset ?)

Best,

mklangley · Posted 07-02-2020 02:38 PM

Is this what you're looking for?

proc sort data=have out=have_sorted; by name id; run;

data want (keep=name);
    set have_sorted;
    by name;
    retain count;
    if first.name then count = 1;
    else if count > 2 then; /* do nothing */
    else do;
        if lag(id) = id-1 then count + 1;
        else count = 1;
    end;
    if last.name and count >= 2 then output;
    ;
run;

gbayya · Posted 07-02-2020 04:29 PM

Thank you for the solution, but i am expecting something like the below output

The output should look something like this

AGB 1 2 3

CONN 1 2 3

CAZ 1 2 3

because these names appear in three consecutive ID's

gbayya · Posted 07-02-2020 04:26 PM

The output should look something like this

AGB 1 2 3

CONN 1 2 3

CAZ 1 2 3

because these names appear in three consecutive ID's

mklangley · Posted 07-02-2020 04:34 PM

Why is CAZ included? You only had that NAME twice in your original data. Perhaps you meant BACO, instead?

And are those numbers supposed to represent the IDs for the NAMEs that appear 3+ times? E.g.:

Name      ID
AGB       1 2 3 4
BACO      4 5 6 7
CONN      1 2 3

mkeintz · Posted 07-02-2020 04:49 PM

I presume you want to find any instances of a name appearing in 3 or more consecutive id's. Let's say you want a dataset of the qualifying names with 3 variables: name,start_id, end_id.

data have;
  input ID  Name :$4.  @@;
datalines;
1 AGB     1 AGB    1 CONN
1 BGCA    2 AGB    2 CONN
2 CAZ     3 CONN    3 CAZ
3 AGB     3 JRSC    4 ABM
4 ABM     4 BACO    4 AGB
4 HFC     5 BACO    5 HFC
6 BACO    6 JKB     7 HFC
7 BACO    7 JKB
1 xxx   2 xxx 3 xxx    5 xxx 6 xxx 7 xxx 8 xxx
run;

proc sort data=have nodupkey  out=need;
  by name id;
run;

data want (rename=(id=end_id) drop=nxt_:);
  set need (keep=name);
  by name;
  merge need need (firstobs=2 keep=id rename=(id=nxt_id)) ;
  retain start_id ;
  if first.name=1 or lag(id)^=id-1 then start_id=id;
  if (last.name=1 or nxt_id>id+1) and start_id<=id-2;
run;

Notes:

I added some records for name=xxx, to show what happens if a given name has 2 (or more) unconnected series satisfying your conditions.
The "set need ... by name" statemetn pair is there to tell sas only to create 2 dummy variables (first.name and last.name) to tell when the record-in-hand is the start (or end) of a given id. Note it only reads one variable. That's becuase of the self-merge that follows.
The self-merge retrieve all the data in the current record and one variabl (ID renamed to nxt_id) in the upcoming record. This way you can tell if the next record will have the next id, or present an id "gap".
Every time the data step starts a new id, or starts after a gap in the current it, then START_ID is reset.
The second if is a "subsetting if" keeping only instances when you are at the end of an id or just before a gap within an id, and you have processed 3 (or more) consecutive id's with the same name.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

gbayya · Posted 07-03-2020 03:46 AM

@mkeintz

Thank you very much for the solution, perfect.

SAS Innovate 2025: Call for Content

Classroom Training Available!