Hi All,
In the following table I'm trying to find the value in three consecutive ID's,
sample data
ID Name
1 AGB
1 AGB
1 CONN
1 BGCA
2 AGB
2 CONN
2 CAZ
3 CONN
3 CAZ
3 AGB
3 JRSC
4 ABM
4 ABM
4 BACO
4 AGB
4 HFC
5 BACO
5 HFC
6 BACO
6 JKB
7 HFC
7 BACO
7 JKB
Thanks for the help.
I presume you want to find any instances of a name appearing in 3 or more consecutive id's. Let's say you want a dataset of the qualifying names with 3 variables: name,start_id, end_id.
data have;
input ID Name :$4. @@;
datalines;
1 AGB 1 AGB 1 CONN
1 BGCA 2 AGB 2 CONN
2 CAZ 3 CONN 3 CAZ
3 AGB 3 JRSC 4 ABM
4 ABM 4 BACO 4 AGB
4 HFC 5 BACO 5 HFC
6 BACO 6 JKB 7 HFC
7 BACO 7 JKB
1 xxx 2 xxx 3 xxx 5 xxx 6 xxx 7 xxx 8 xxx
run;
proc sort data=have nodupkey out=need;
by name id;
run;
data want (rename=(id=end_id) drop=nxt_:);
set need (keep=name);
by name;
merge need need (firstobs=2 keep=id rename=(id=nxt_id)) ;
retain start_id ;
if first.name=1 or lag(id)^=id-1 then start_id=id;
if (last.name=1 or nxt_id>id+1) and start_id<=id-2;
run;
Notes:
Hi @gbayya
Could you please explain the logic and show what the output would be? (a report / a dataset ?)
Best,
Is this what you're looking for?
proc sort data=have out=have_sorted; by name id; run;
data want (keep=name);
set have_sorted;
by name;
retain count;
if first.name then count = 1;
else if count > 2 then; /* do nothing */
else do;
if lag(id) = id-1 then count + 1;
else count = 1;
end;
if last.name and count >= 2 then output;
;
run;
Thank you for the solution, but i am expecting something like the below output
The output should look something like this
AGB 1 2 3
CONN 1 2 3
CAZ 1 2 3
because these names appear in three consecutive ID's
The output should look something like this
AGB 1 2 3
CONN 1 2 3
CAZ 1 2 3
because these names appear in three consecutive ID's
Why is CAZ included? You only had that NAME twice in your original data. Perhaps you meant BACO, instead?
And are those numbers supposed to represent the IDs for the NAMEs that appear 3+ times? E.g.:
Name ID AGB 1 2 3 4 BACO 4 5 6 7 CONN 1 2 3
I presume you want to find any instances of a name appearing in 3 or more consecutive id's. Let's say you want a dataset of the qualifying names with 3 variables: name,start_id, end_id.
data have;
input ID Name :$4. @@;
datalines;
1 AGB 1 AGB 1 CONN
1 BGCA 2 AGB 2 CONN
2 CAZ 3 CONN 3 CAZ
3 AGB 3 JRSC 4 ABM
4 ABM 4 BACO 4 AGB
4 HFC 5 BACO 5 HFC
6 BACO 6 JKB 7 HFC
7 BACO 7 JKB
1 xxx 2 xxx 3 xxx 5 xxx 6 xxx 7 xxx 8 xxx
run;
proc sort data=have nodupkey out=need;
by name id;
run;
data want (rename=(id=end_id) drop=nxt_:);
set need (keep=name);
by name;
merge need need (firstobs=2 keep=id rename=(id=nxt_id)) ;
retain start_id ;
if first.name=1 or lag(id)^=id-1 then start_id=id;
if (last.name=1 or nxt_id>id+1) and start_id<=id-2;
run;
Notes:
Thank you very much for the solution, perfect.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.