SAS Programming

Asli_A · Posted 07-20-2018 05:17 AM

Hello,

I am looking for a way to add the most frequent text value by group in my selection code. The selection that I am trying to do is in this order;

1) The most frequent one

2) Common name

3) Data Source

3) Recency-the higher id the more recent data (I want to apply this in order to get rid of 1:2 frequency ratio bias)

As an example; from the table below;

id	Newid	Name	Common name	DataSource
1	1	XXXX	1	1
2	1	XXYX	1	0
3	1	XXXX	1	1
4	2	AAA	0	1
5	2	AA	0	0
6	2	AAA	0	0
7	3	BBB	1	0
8	3	BB	1	1

to the table below.

id	Newid	Name	Common name	DataSource
1	1	XXXX	1	1
4	2	AAA	0	1
8	3	BB	1	1

Thank you very much!

andreas_lds · Posted 07-20-2018 07:10 AM

Please add an explanation why three rows in your expected result are in the table. And you should add some more lines, so that each case you describe is fulfilled at least once.

Last point: if you want to get answers faster, than post the example data as datastep. This avoids unnecessary guessing data-types etc.

KachiM · Posted 07-20-2018 08:04 AM

data have;
input id Newid Name $ Common_name DataSource;
datalines;
1  1  XXXX  1  1
2  1  XXYX  1  0
3  1  XXXX  1  1
4  2  AAA   0  1
5  2  AA 0  0
6  2  AAA   0  0
7  3  BBB   1  0
8  3  BB    1  1
;
run;

data want;
   set have;
   by Newid;
   if last.Newid;
run;

Obs 	id 	Newid 	Name 	Common_name 	DataSource
1 	3 	1 	XXXX 	1 	1
2 	6 	2 	AAA 	0 	0
3 	8 	3 	BB 	1 	1

Looks to be simple yet does this meet your need.

regards.

DataSP

Asli_A · Posted 07-20-2018 08:34 AM

Hello datasp,

Actually the example does not represent my master dataset. The most frequent name could be in the first, last or in between therefore I can't use the last function. A snapshot from the real dataset from which I should be able to get 'ANJELICA' 🙂

id	Newid	Name	Common name	DataSource
2173787	42282	ANGELICA	0	0
914596	42282	ANGELICA	0	0
618775	42282	ANJELCA	0	0
1934321	42282	ANJELICA	0	0
2254092	42282	ANJELICA	0	0
1539485	42282	ANJELICA	0	0
1478028	42282	ANJELICA	0	0
1219086	42282	ANJELICA	0	0
730233	42282	ANJELICA	0	0
1275142	42282	ANJELICA	0	0
148759	42282	ANJELICA	0	0
1578160	42282	ANJELICA	0	0
674696	42282	ANJELIKA	0	0
194036	42282	ANJELIKA	0	0
378654	42282	ANJELIKA	0	0

novinosrin · Posted 07-20-2018 10:07 AM

data have;
infile cards truncover;
input id :$10.	Newid $	Name :$10.	Commonname 	DataSource;
cards;
2173787	42282	ANGELICA	0	0
914596	42282	ANGELICA	0	0
618775	42282	ANJELCA	0	0
1934321	42282	ANJELICA	0	0
2254092	42282	ANJELICA	0	0
1539485	42282	ANJELICA	0	0
1478028	42282	ANJELICA	0	0
1219086	42282	ANJELICA	0	0
730233	42282	ANJELICA	0	0
1275142	42282	ANJELICA	0	0
148759	42282	ANJELICA	0	0
1578160	42282	ANJELICA	0	0
674696	42282	ANJELIKA	0	0
194036	42282	ANJELIKA	0	0
378654	42282	ANJELIKA	0	0
;

proc sql;
create table want(drop=_c) as
select *
from (select *,count( name) as _c from have group by newid,name) 
group by newid
having max(_c)=_c;
quit;

SAS Programming

Finding the most frequent text value by group

Re: Finding the most frequent text value by group

Re: Finding the most frequent text value by group

Re: Finding the most frequent text value by group

Re: Finding the most frequent text value by group

find the count of variable by group

ModelOps, MLOps, and Finding Value in Analytics

Using Text Box to enter list of values for filtering

Find Columns with Missing Values in a list of tables

Finding common value for a group

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...