Identifying multiple possibilities

Accepted Solution Solved
Reply
Contributor
Posts: 29
Accepted Solution

Identifying multiple possibilities

Hi,

 

I need to identify observations having multiple options (Name -> Name2).

Here is my data like...

 

Data test;

input ID NAME $ NAME2 $ Year;

datalines;

1 Highland HighlandBell 2008

1 Highland HighlandBell 2009

1 Highland HighlandBell 2010

1 Highland HighlandCorp 2008

1 Highland HighlandCorp 2009

1 Highland HighlandCorp 2010

1 Highland HighlandMalt 2008

1 Highland HighlandMalt 2009

1 Highland HighlandMalt 2010

2 HillBrosINC HillBrosINC 2011

2 HillBrosINC HillBrosINC 2012

3 HitachiLTD HitachLTD 2008

;

run;

 

So, I want to create one additional column to the dataset. Like..

 

Data test;

input ID NAME $ NAME2 $ Year Want;

datalines;

1 Highland HighlandBell 2008 1

1 Highland HighlandBell 2009 1

1 Highland HighlandBell 2010 1

1 Highland HighlandCorp 2008 1

1 Highland HighlandCorp 2009 1

1 Highland HighlandCorp 2010 1

1 Highland HighlandMalt 2008 1

1 Highland HighlandMalt 2009 1

1 Highland HighlandMalt 2010 1

2 HillBrosINC HillBrosINC 2011 0

2 HillBrosINC HillBrosINC 2012 0

3 HitachiLTD HitachLTD 2008 0

;

run;

 

Thank you!


Accepted Solutions
Solution
‎03-24-2018 11:56 PM
Esteemed Advisor
Posts: 5,541

Re: Identifying multiple possibilities

Then you just need a simple query:

 

Data test;
input ID NAME :$16. NAME2 :$16. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
proc sql;
create table test2 as
select *, count(distinct name2) > 1 as want
from test
group by name;
quit;
PG

View solution in original post


All Replies
Super User
Posts: 23,776

Re: Identifying multiple possibilities

Are there rules for that column? How are you calculating it?

Contributor
Posts: 29

Re: Identifying multiple possibilities

The Want column indicates whether the "Name" has multiple matching in "Name2".

Esteemed Advisor
Posts: 5,541

Re: Identifying multiple possibilities

What is the role of year?

PG
Contributor
Posts: 29

Re: Identifying multiple possibilities

Actually, there is no rule of year. They are just given to show multiple matches across years.
Solution
‎03-24-2018 11:56 PM
Esteemed Advisor
Posts: 5,541

Re: Identifying multiple possibilities

Then you just need a simple query:

 

Data test;
input ID NAME :$16. NAME2 :$16. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
proc sql;
create table test2 as
select *, count(distinct name2) > 1 as want
from test
group by name;
quit;
PG
PROC Star
Posts: 1,837

Re: Identifying multiple possibilities

Data test;

input ID NAME :$30. NAME2  :$30. Year;

datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008

;

run;

data want;
do until(last.id);
f=0;
do until(last.name);
set test;
by id name name2;
if lag(name2) ne name2 then f+1;
end;
end;
do until(last.id);
set test;
by id name name2;;
Want=f>1;
output;
end;
drop f;
run;
Super User
Posts: 13,583

Re: Identifying multiple possibilities

Is the variable assigned a value of 1 because name and name2 are different or because you have more than one value of name2 for each value of name? Those are two different rules that would result in the shown desired result.

 

And for some completeness sake, what would the result be for this data if it were part of your example data?

1 Highland Highland    2008

Super User
Posts: 10,787

Re: Identifying multiple possibilities

Data test;
input ID NAME :$16. NAME2 :$16. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
proc sql;
create table test2 as
select *,case when count(distinct catx(' ',name,name2)) ne 1 then 1
         else 0 end as want
from test
group by name;
quit;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 192 views
  • 2 likes
  • 6 in conversation