- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I need to identify observations having multiple options (Name -> Name2).
Here is my data like...
Data test;
input ID NAME $ NAME2 $ Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
run;
So, I want to create one additional column to the dataset. Like..
Data test;
input ID NAME $ NAME2 $ Year Want;
datalines;
1 Highland HighlandBell 2008 1
1 Highland HighlandBell 2009 1
1 Highland HighlandBell 2010 1
1 Highland HighlandCorp 2008 1
1 Highland HighlandCorp 2009 1
1 Highland HighlandCorp 2010 1
1 Highland HighlandMalt 2008 1
1 Highland HighlandMalt 2009 1
1 Highland HighlandMalt 2010 1
2 HillBrosINC HillBrosINC 2011 0
2 HillBrosINC HillBrosINC 2012 0
3 HitachiLTD HitachLTD 2008 0
;
run;
Thank you!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then you just need a simple query:
Data test;
input ID NAME :$16. NAME2 :$16. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
proc sql;
create table test2 as
select *, count(distinct name2) > 1 as want
from test
group by name;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are there rules for that column? How are you calculating it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The Want column indicates whether the "Name" has multiple matching in "Name2".
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What is the role of year?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then you just need a simple query:
Data test;
input ID NAME :$16. NAME2 :$16. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
proc sql;
create table test2 as
select *, count(distinct name2) > 1 as want
from test
group by name;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Data test;
input ID NAME :$30. NAME2 :$30. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
run;
data want;
do until(last.id);
f=0;
do until(last.name);
set test;
by id name name2;
if lag(name2) ne name2 then f+1;
end;
end;
do until(last.id);
set test;
by id name name2;;
Want=f>1;
output;
end;
drop f;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is the variable assigned a value of 1 because name and name2 are different or because you have more than one value of name2 for each value of name? Those are two different rules that would result in the shown desired result.
And for some completeness sake, what would the result be for this data if it were part of your example data?
1 Highland Highland 2008
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Data test;
input ID NAME :$16. NAME2 :$16. Year;
datalines;
1 Highland HighlandBell 2008
1 Highland HighlandBell 2009
1 Highland HighlandBell 2010
1 Highland HighlandCorp 2008
1 Highland HighlandCorp 2009
1 Highland HighlandCorp 2010
1 Highland HighlandMalt 2008
1 Highland HighlandMalt 2009
1 Highland HighlandMalt 2010
2 HillBrosINC HillBrosINC 2011
2 HillBrosINC HillBrosINC 2012
3 HitachiLTD HitachLTD 2008
;
proc sql;
create table test2 as
select *,case when count(distinct catx(' ',name,name2)) ne 1 then 1
else 0 end as want
from test
group by name;
quit;