## Most frequent observation by row & number of distinct values by row

Solved
Frequent Contributor
Posts: 82

# Most frequent observation by row & number of distinct values by row

``````data have;
input id date1:date9. date2:date9. date3:date9.;
format date1 date9. date2 date9. date3 date9.;
cards;
1 01JUN2001 01JUN2001 01JUN2001
2 01JUN2001 01JUN2001 05MAY1990
3 05MAY1990 05MAY1990 01JUN2001
4 01JUN2001 05MAY1990 .
;
run;

data want;
input id date1:date9. date2:date9. date3:date9. date_adj:date9. date_count:best8.;
format date1 date9. date2 date9. date3 date9. date_adj date9. date_count best8.;
cards;
1 01JUN2001 01JUN2001 01JUN2001 01JUN2001 1
2 01JUN2001 01JUN2001 05MAY1990 01JUN2001 2
3 05MAY1990 05MAY1990 01JUN2001 05MAY1990 2
4 01JUN2001 05MAY1990 . . 2
;
run;``````

From data have to data want:

• Create a new column named 'date_adj', it would be the most frequent observation in that row; if same frequency, then output blank, or first/last appeared non-blank value (not really fussy about this output).
• Create a new column named 'date_count', count the number of distinct values by that row.

I understand SAS can do this by column, not sure how to achieve it by rows.

Thanks.

Accepted Solutions
Solution
‎06-05-2017 09:20 PM
Frequent Contributor
Posts: 82

## Re: Most frequent observation by row & number of distinct values by row

The following codes proved to be working:

``````%macro checkUnique(fileName=,col=,ID=);
%global colList;

/* get all date columns' names */
proc sql noprint;
select name into: colList separated by ' '
from dictionary.columns
where libname = 'WORK' and memname contains upcase("&fileName") and name contains upcase("&col");
quit;

/* transpose back to flat table */
data temp1(keep= ID_adj frequent position);
set work.&fileName.(rename= (&ID.=ID));
array &col._array{*} &colList.;
do i = 1 to dim(&col._array);
ID_adj = ID;
frequent = &col._array{i};
position = i;
output;
end;
format frequent date9.;
run;

/* get the frequency of each date value */
proc sql noprint;
create table temp2 as select
ID_adj,
frequent,
count(frequent) as count
/* count(distinct(frequent)) as num_unique */
from temp1
group by ID_adj, frequent;
quit;

proc sort data= temp2(where=(frequent ne .));
by ID_adj descending count descending frequent;
run;

/* get the most frequent date */
data output.&fileName.(keep= &ID. &col._adj);
set temp2;
by ID_adj descending count;
if first.ID_adj;
format frequent date9.;
rename ID_adj=&ID.;
rename frequent=&col._adj;
run;

%mend checkUnique;

%checkUnique(fileName=test,col=date,ID=id);``````

All Replies
Super User
Posts: 23,928

## Re: Most frequent observation by row & number of distinct values by row

[ Edited ]

Transpose to a row structure via PROC TRANSPOSE, obtain solution, and merge it back into main dataset.

PROC Star
Posts: 548

## Re: Most frequent observation by row & number of distinct values by row

[ Edited ]

By using case statements but we careful in case statement if below query has missed any scenario. But you get idea by seeing this query.

```proc sql;
create table want as
select date1, date2, date3,
case when date1=date2 and date1 =date3 then date1
when date1 =date2 and date1<>date3 then date1
when date1 =date3 and date1 <>date2 then date1
when date2= date3 and date1<>date3 then date2
when date1<>date2 and date2<>date3 then .
end as date_adj format=date9.,
case when date1=date2 and date1 =date3 then 1
when date1 =date2 and date1<>date3 then 2
when date1 =date3 and date1 <>date2 then 2
when date2= date3 and date1<>date3 then 2
when date1<>date2 and date2<>date3 then 2
end as date_count
from have ;
quit;```

Frequent Contributor
Posts: 82

## Re: Most frequent observation by row & number of distinct values by row

Hi @kiranv_

Thanks for your approach. Because in my real data I have 8 columns, it might be too hard to build the logic by using 'Case'.

PROC Star
Posts: 548

## Re: Most frequent observation by row & number of distinct values by row

Yes. @Ksharp approach is best to follow

Super User
Posts: 10,846

## Re: Most frequent observation by row & number of distinct values by row

[ Edited ]

How about IML code ?

``````data have;
input id date1:date9. date2:date9. date3:date9.;
format date1 date9. date2 date9. date3 date9.;
cards;
1 01JUN2001 01JUN2001 01JUN2001
2 01JUN2001 01JUN2001 05MAY1990
3 05MAY1990 05MAY1990 01JUN2001
4 01JUN2001 05MAY1990 .
;
run;
proc iml;
use have(keep=date:) nobs nobs;
read all var _all_ into x;
close;
date_adj=j(nobs,1,.);
date_count=j(nobs,1,.);
do i=1 to nobs;
temp=x[i,];
call tabulate(level,freq,temp);
if sum(freq=max(freq)) ^= 1 then date_adj[i]=.;
else date_adj[i]=level[ freq[<:>] ];
date_count[i]=ncol(level);
end;
create temp var{date_adj date_count};
append;
close;
run;

data want;
merge have temp;
format date_adj date9.;
run;``````
Frequent Contributor
Posts: 82

## Re: Most frequent observation by row & number of distinct values by row

Thanks @Ksharp, I believe your codes can help a lot of people tackle similar problems. We don't have IML however.

Solution
‎06-05-2017 09:20 PM
Frequent Contributor
Posts: 82

## Re: Most frequent observation by row & number of distinct values by row

The following codes proved to be working:

``````%macro checkUnique(fileName=,col=,ID=);
%global colList;

/* get all date columns' names */
proc sql noprint;
select name into: colList separated by ' '
from dictionary.columns
where libname = 'WORK' and memname contains upcase("&fileName") and name contains upcase("&col");
quit;

/* transpose back to flat table */
data temp1(keep= ID_adj frequent position);
set work.&fileName.(rename= (&ID.=ID));
array &col._array{*} &colList.;
do i = 1 to dim(&col._array);
ID_adj = ID;
frequent = &col._array{i};
position = i;
output;
end;
format frequent date9.;
run;

/* get the frequency of each date value */
proc sql noprint;
create table temp2 as select
ID_adj,
frequent,
count(frequent) as count
/* count(distinct(frequent)) as num_unique */
from temp1
group by ID_adj, frequent;
quit;

proc sort data= temp2(where=(frequent ne .));
by ID_adj descending count descending frequent;
run;

/* get the most frequent date */
data output.&fileName.(keep= &ID. &col._adj);
set temp2;
by ID_adj descending count;
if first.ID_adj;
format frequent date9.;
rename ID_adj=&ID.;
rename frequent=&col._adj;
run;

%mend checkUnique;

%checkUnique(fileName=test,col=date,ID=id);``````

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
• 7 replies
• 351 views
• 1 like
• 4 in conversation