- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I am looking for a way to add the most frequent text value by group in my selection code. The selection that I am trying to do is in this order;
1) The most frequent one
2) Common name
3) Data Source
3) Recency-the higher id the more recent data (I want to apply this in order to get rid of 1:2 frequency ratio bias)
As an example; from the table below;
id | Newid | Name | Common name | DataSource |
1 | 1 | XXXX | 1 | 1 |
2 | 1 | XXYX | 1 | 0 |
3 | 1 | XXXX | 1 | 1 |
4 | 2 | AAA | 0 | 1 |
5 | 2 | AA | 0 | 0 |
6 | 2 | AAA | 0 | 0 |
7 | 3 | BBB | 1 | 0 |
8 | 3 | BB | 1 | 1 |
to the table below.
id | Newid | Name | Common name | DataSource |
1 | 1 | XXXX | 1 | 1 |
4 | 2 | AAA | 0 | 1 |
8 | 3 | BB | 1 | 1 |
Thank you very much!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please add an explanation why three rows in your expected result are in the table. And you should add some more lines, so that each case you describe is fulfilled at least once.
Last point: if you want to get answers faster, than post the example data as datastep. This avoids unnecessary guessing data-types etc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input id Newid Name $ Common_name DataSource;
datalines;
1 1 XXXX 1 1
2 1 XXYX 1 0
3 1 XXXX 1 1
4 2 AAA 0 1
5 2 AA 0 0
6 2 AAA 0 0
7 3 BBB 1 0
8 3 BB 1 1
;
run;
data want;
set have;
by Newid;
if last.Newid;
run;
Obs id Newid Name Common_name DataSource
1 3 1 XXXX 1 1
2 6 2 AAA 0 0
3 8 3 BB 1 1
Looks to be simple yet does this meet your need.
regards.
DataSP
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello datasp,
Actually the example does not represent my master dataset. The most frequent name could be in the first, last or in between therefore I can't use the last function. A snapshot from the real dataset from which I should be able to get 'ANJELICA' 🙂
id | Newid | Name | Common name | DataSource |
2173787 | 42282 | ANGELICA | 0 | 0 |
914596 | 42282 | ANGELICA | 0 | 0 |
618775 | 42282 | ANJELCA | 0 | 0 |
1934321 | 42282 | ANJELICA | 0 | 0 |
2254092 | 42282 | ANJELICA | 0 | 0 |
1539485 | 42282 | ANJELICA | 0 | 0 |
1478028 | 42282 | ANJELICA | 0 | 0 |
1219086 | 42282 | ANJELICA | 0 | 0 |
730233 | 42282 | ANJELICA | 0 | 0 |
1275142 | 42282 | ANJELICA | 0 | 0 |
148759 | 42282 | ANJELICA | 0 | 0 |
1578160 | 42282 | ANJELICA | 0 | 0 |
674696 | 42282 | ANJELIKA | 0 | 0 |
194036 | 42282 | ANJELIKA | 0 | 0 |
378654 | 42282 | ANJELIKA | 0 | 0 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile cards truncover;
input id :$10. Newid $ Name :$10. Commonname DataSource;
cards;
2173787 42282 ANGELICA 0 0
914596 42282 ANGELICA 0 0
618775 42282 ANJELCA 0 0
1934321 42282 ANJELICA 0 0
2254092 42282 ANJELICA 0 0
1539485 42282 ANJELICA 0 0
1478028 42282 ANJELICA 0 0
1219086 42282 ANJELICA 0 0
730233 42282 ANJELICA 0 0
1275142 42282 ANJELICA 0 0
148759 42282 ANJELICA 0 0
1578160 42282 ANJELICA 0 0
674696 42282 ANJELIKA 0 0
194036 42282 ANJELIKA 0 0
378654 42282 ANJELIKA 0 0
;
proc sql;
create table want(drop=_c) as
select *
from (select *,count( name) as _c from have group by newid,name)
group by newid
having max(_c)=_c;
quit;