Solved: Re: Identifying multiple possibilities

hkim3677 · Posted 03-23-2018 04:18 PM

Hi,

I need to identify observations having multiple options (Name -> Name2).

Here is my data like...

Data test;

input ID NAME $ NAME2 $ Year;

datalines;

1 Highland HighlandBell 2008

1 Highland HighlandBell 2009

1 Highland HighlandBell 2010

1 Highland HighlandCorp 2008

1 Highland HighlandCorp 2009

1 Highland HighlandCorp 2010

1 Highland HighlandMalt 2008

1 Highland HighlandMalt 2009

1 Highland HighlandMalt 2010

2 HillBrosINC HillBrosINC 2011

2 HillBrosINC HillBrosINC 2012

3 HitachiLTD HitachLTD 2008

;

run;

So, I want to create one additional column to the dataset. Like..

Data test;

input ID NAME $ NAME2 $ Year Want;

datalines;

1 Highland HighlandBell 2008 1

1 Highland HighlandBell 2009 1

1 Highland HighlandBell 2010 1

1 Highland HighlandCorp 2008 1

1 Highland HighlandCorp 2009 1

1 Highland HighlandCorp 2010 1

1 Highland HighlandMalt 2008 1

1 Highland HighlandMalt 2009 1

1 Highland HighlandMalt 2010 1

2 HillBrosINC HillBrosINC 2011 0

2 HillBrosINC HillBrosINC 2012 0

3 HitachiLTD HitachLTD 2008 0

;

run;

Thank you!

PGStats · Posted 03-23-2018 06:03 PM

Then you just need a simple query:

Data test;
input ID NAME :$16. NAME2 :$16. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
proc sql;
create table test2 as
select *, count(distinct name2) > 1 as want
from test
group by name;
quit;

PG

View solution in original post

Reeza · Posted 03-23-2018 04:32 PM

Are there rules for that column? How are you calculating it?

hkim3677 · Posted 03-23-2018 05:54 PM

The Want column indicates whether the "Name" has multiple matching in "Name2".

PGStats · Posted 03-23-2018 04:58 PM

What is the role of year?

PG

hkim3677 · Posted 03-23-2018 05:55 PM

Actually, there is no rule of year. They are just given to show multiple matches across years.

PGStats · Posted 03-23-2018 06:03 PM

Then you just need a simple query:

Data test;
input ID NAME :$16. NAME2 :$16. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
proc sql;
create table test2 as
select *, count(distinct name2) > 1 as want
from test
group by name;
quit;

PG

novinosrin · Posted 03-23-2018 05:47 PM

Data test;

input ID NAME :$30. NAME2  :$30. Year;

datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008

;

run;

data want;
do until(last.id);
f=0;
do until(last.name);
set test;
by id name name2;
if lag(name2) ne name2 then f+1;
end;
end;
do until(last.id);
set test;
by id name name2;;
Want=f>1;
output;
end;
drop f;
run;

ballardw · Posted 03-23-2018 05:55 PM

Is the variable assigned a value of 1 because name and name2 are different or because you have more than one value of name2 for each value of name? Those are two different rules that would result in the shown desired result.

And for some completeness sake, what would the result be for this data if it were part of your example data?

1 Highland Highland 2008

Ksharp · Posted 03-24-2018 06:36 AM

Data test;
input ID NAME :$16. NAME2 :$16. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
proc sql;
create table test2 as
select *,case when count(distinct catx(' ',name,name2)) ne 1 then 1
         else 0 end as want
from test
group by name;
quit;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away