BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gbayya
Fluorite | Level 6

Hi All,

 

In the following table I'm trying to find the value in three consecutive ID's, 

sample data

ID              Name

1                AGB

1                AGB

1                CONN

1                BGCA

2                AGB

2                CONN

2                CAZ

3                 CONN

3                CAZ

3                AGB

3               JRSC

4               ABM

4               ABM

4               BACO

4              AGB

4              HFC

5              BACO

5              HFC

6               BACO

6              JKB

7             HFC

7             BACO

7            JKB

 

Thanks for the help.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I presume you want to find any instances of a name appearing in 3 or more consecutive id's.  Let's say you want a dataset of the qualifying names with 3 variables:  name,start_id, end_id.

 

data have;
  input ID  Name :$4.  @@;
datalines;
1 AGB     1 AGB    1 CONN
1 BGCA    2 AGB    2 CONN
2 CAZ     3 CONN    3 CAZ
3 AGB     3 JRSC    4 ABM
4 ABM     4 BACO    4 AGB
4 HFC     5 BACO    5 HFC
6 BACO    6 JKB     7 HFC
7 BACO    7 JKB
1 xxx   2 xxx 3 xxx    5 xxx 6 xxx 7 xxx 8 xxx
run;

proc sort data=have nodupkey  out=need;
  by name id;
run;

data want (rename=(id=end_id) drop=nxt_:);
  set need (keep=name);
  by name;
  merge need need (firstobs=2 keep=id rename=(id=nxt_id)) ;
  retain start_id ;
  if first.name=1 or lag(id)^=id-1 then start_id=id;
  if (last.name=1 or nxt_id>id+1) and start_id<=id-2;
run;

Notes:

  1. I added some records for name=xxx, to show what happens if a given name has 2 (or more) unconnected series satisfying your conditions.  
  2. The "set need ... by name" statemetn pair is there to tell sas only to create 2 dummy variables (first.name and last.name) to tell when the record-in-hand is the start (or end) of a given id.  Note it only reads one variable.  That's becuase of the self-merge that follows.
  3. The self-merge retrieve all the data in the current record and one variabl (ID renamed to nxt_id) in the upcoming record.  This way you can tell if the next record will have the next id, or present an id "gap".
  4. Every time the data step starts a new id, or starts after a gap in the current it, then START_ID is reset.
  5. The second if is a "subsetting if" keeping only instances when you are at the end of an id or just before a gap within an id,  and you have processed 3 (or more) consecutive id's with the same name.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
ed_sas_member
Meteorite | Level 14

Hi @gbayya 

Could you please explain the logic and show what the output would be? (a report / a dataset ?)

Best,

mklangley
Lapis Lazuli | Level 10

Is this what you're looking for?

proc sort data=have out=have_sorted; by name id; run;

data want (keep=name);
    set have_sorted;
    by name;
    retain count;
    if first.name then count = 1;
    else if count > 2 then; /* do nothing */
    else do;
        if lag(id) = id-1 then count + 1;
        else count = 1;
    end;
    if last.name and count >= 2 then output;
    ;
run;
gbayya
Fluorite | Level 6

Thank you for the solution, but i am expecting something like the below output

 

The output should look something like this

AGB 1 2 3

CONN 1 2 3

CAZ 1 2 3

 

because these names appear in three consecutive ID's

gbayya
Fluorite | Level 6

The output should look something like this

AGB 1 2 3

CONN 1 2 3

CAZ 1 2 3

 

because these names appear in three consecutive ID's

 

 

 

mklangley
Lapis Lazuli | Level 10

Why is CAZ included? You only had that NAME twice in your original data. Perhaps you meant BACO, instead?

 

And are those numbers supposed to represent the IDs for the NAMEs that appear 3+ times? E.g.:

Name      ID
AGB       1 2 3 4
BACO      4 5 6 7
CONN      1 2 3
mkeintz
PROC Star

I presume you want to find any instances of a name appearing in 3 or more consecutive id's.  Let's say you want a dataset of the qualifying names with 3 variables:  name,start_id, end_id.

 

data have;
  input ID  Name :$4.  @@;
datalines;
1 AGB     1 AGB    1 CONN
1 BGCA    2 AGB    2 CONN
2 CAZ     3 CONN    3 CAZ
3 AGB     3 JRSC    4 ABM
4 ABM     4 BACO    4 AGB
4 HFC     5 BACO    5 HFC
6 BACO    6 JKB     7 HFC
7 BACO    7 JKB
1 xxx   2 xxx 3 xxx    5 xxx 6 xxx 7 xxx 8 xxx
run;

proc sort data=have nodupkey  out=need;
  by name id;
run;

data want (rename=(id=end_id) drop=nxt_:);
  set need (keep=name);
  by name;
  merge need need (firstobs=2 keep=id rename=(id=nxt_id)) ;
  retain start_id ;
  if first.name=1 or lag(id)^=id-1 then start_id=id;
  if (last.name=1 or nxt_id>id+1) and start_id<=id-2;
run;

Notes:

  1. I added some records for name=xxx, to show what happens if a given name has 2 (or more) unconnected series satisfying your conditions.  
  2. The "set need ... by name" statemetn pair is there to tell sas only to create 2 dummy variables (first.name and last.name) to tell when the record-in-hand is the start (or end) of a given id.  Note it only reads one variable.  That's becuase of the self-merge that follows.
  3. The self-merge retrieve all the data in the current record and one variabl (ID renamed to nxt_id) in the upcoming record.  This way you can tell if the next record will have the next id, or present an id "gap".
  4. Every time the data step starts a new id, or starts after a gap in the current it, then START_ID is reset.
  5. The second if is a "subsetting if" keeping only instances when you are at the end of an id or just before a gap within an id,  and you have processed 3 (or more) consecutive id's with the same name.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
gbayya
Fluorite | Level 6

@mkeintz 

Thank you very much for the solution, perfect.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1625 views
  • 2 likes
  • 4 in conversation