Desktop productivity for business analysts and programmers

Most frequent observation by row & number of distinct values by row

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 80
Accepted Solution

Most frequent observation by row & number of distinct values by row

 

 

data have;
  input id date1:date9. date2:date9. date3:date9.;
  format date1 date9. date2 date9. date3 date9.;
  cards;
1 01JUN2001 01JUN2001 01JUN2001
2 01JUN2001 01JUN2001 05MAY1990
3 05MAY1990 05MAY1990 01JUN2001
4 01JUN2001 05MAY1990 .
;
run;

data want;
  input id date1:date9. date2:date9. date3:date9. date_adj:date9. date_count:best8.;
  format date1 date9. date2 date9. date3 date9. date_adj date9. date_count best8.;
  cards;
1 01JUN2001 01JUN2001 01JUN2001 01JUN2001 1
2 01JUN2001 01JUN2001 05MAY1990 01JUN2001 2
3 05MAY1990 05MAY1990 01JUN2001 05MAY1990 2
4 01JUN2001 05MAY1990 . . 2
;
run;

 

From data have to data want:

  • Create a new column named 'date_adj', it would be the most frequent observation in that row; if same frequency, then output blank, or first/last appeared non-blank value (not really fussy about this output).
  • Create a new column named 'date_count', count the number of distinct values by that row.

I understand SAS can do this by column, not sure how to achieve it by rows.

 

Thanks.


Accepted Solutions
Solution
‎06-05-2017 09:20 PM
Frequent Contributor
Posts: 80

Re: Most frequent observation by row & number of distinct values by row

The following codes proved to be working:

%macro checkUnique(fileName=,col=,ID=);
%global colList;

/* get all date columns' names */
proc sql noprint;
select name into: colList separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname contains upcase("&fileName") and name contains upcase("&col");
quit;

/* transpose back to flat table */
data temp1(keep= ID_adj frequent position);
  set work.&fileName.(rename= (&ID.=ID));
  array &col._array{*} &colList.;
  do i = 1 to dim(&col._array);
    ID_adj = ID;
    frequent = &col._array{i};
    position = i;
    output;
  end;
  format frequent date9.;
run;

/* get the frequency of each date value */
proc sql noprint;
  create table temp2 as select
      ID_adj,
      frequent,
      count(frequent) as count
      /* count(distinct(frequent)) as num_unique */
          from temp1
          group by ID_adj, frequent;
quit;

proc sort data= temp2(where=(frequent ne .));
  by ID_adj descending count descending frequent;
run;

/* get the most frequent date */
data output.&fileName.(keep= &ID. &col._adj);
  set temp2;
  by ID_adj descending count;
  if first.ID_adj;
  format frequent date9.;
  rename ID_adj=&ID.;
  rename frequent=&col._adj;
run;

%mend checkUnique;

%checkUnique(fileName=test,col=date,ID=id);

 

View solution in original post


All Replies
Super User
Posts: 19,052

Re: Most frequent observation by row & number of distinct values by row

[ Edited ]

Transpose to a row structure via PROC TRANSPOSE, obtain solution, and merge it back into main dataset. 

PROC Star
Posts: 279

Re: Most frequent observation by row & number of distinct values by row

[ Edited ]

By using case statements but we careful in case statement if below query has missed any scenario. But you get idea by seeing this query.

 

proc sql;
create table want as
select date1, date2, date3,
 case when date1=date2 and date1 =date3 then date1
      when date1 =date2 and date1<>date3 then date1
      when date1 =date3 and date1 <>date2 then date1
      when date2= date3 and date1<>date3 then date2
      when date1<>date2 and date2<>date3 then .
 end as date_adj format=date9.,
 case when date1=date2 and date1 =date3 then 1
      when date1 =date2 and date1<>date3 then 2
      when date1 =date3 and date1 <>date2 then 2
       when date2= date3 and date1<>date3 then 2
      when date1<>date2 and date2<>date3 then 2
 end as date_count
 from have ;
 quit;

 

 

 

 

Frequent Contributor
Posts: 80

Re: Most frequent observation by row & number of distinct values by row

Hi @kiranv_

Thanks for your approach. Because in my real data I have 8 columns, it might be too hard to build the logic by using 'Case'.

PROC Star
Posts: 279

Re: Most frequent observation by row & number of distinct values by row

Yes. @Ksharp approach is best to follow

Super User
Posts: 9,865

Re: Most frequent observation by row & number of distinct values by row

[ Edited ]

How about IML code ?

 

data have;
  input id date1:date9. date2:date9. date3:date9.;
  format date1 date9. date2 date9. date3 date9.;
  cards;
1 01JUN2001 01JUN2001 01JUN2001
2 01JUN2001 01JUN2001 05MAY1990
3 05MAY1990 05MAY1990 01JUN2001
4 01JUN2001 05MAY1990 .
;
run;
proc iml;
use have(keep=date:) nobs nobs;
read all var _all_ into x;
close;
date_adj=j(nobs,1,.);
date_count=j(nobs,1,.);
do i=1 to nobs;
 temp=x[i,];
 call tabulate(level,freq,temp);
 if sum(freq=max(freq)) ^= 1 then date_adj[i]=.;
  else date_adj[i]=level[ freq[<:>] ];
 date_count[i]=ncol(level);
end;
create temp var{date_adj date_count};
append;
close;
run;

data want;
 merge have temp;
 format date_adj date9.;
run;
Frequent Contributor
Posts: 80

Re: Most frequent observation by row & number of distinct values by row

Thanks @Ksharp, I believe your codes can help a lot of people tackle similar problems. We don't have IML however.

Solution
‎06-05-2017 09:20 PM
Frequent Contributor
Posts: 80

Re: Most frequent observation by row & number of distinct values by row

The following codes proved to be working:

%macro checkUnique(fileName=,col=,ID=);
%global colList;

/* get all date columns' names */
proc sql noprint;
select name into: colList separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname contains upcase("&fileName") and name contains upcase("&col");
quit;

/* transpose back to flat table */
data temp1(keep= ID_adj frequent position);
  set work.&fileName.(rename= (&ID.=ID));
  array &col._array{*} &colList.;
  do i = 1 to dim(&col._array);
    ID_adj = ID;
    frequent = &col._array{i};
    position = i;
    output;
  end;
  format frequent date9.;
run;

/* get the frequency of each date value */
proc sql noprint;
  create table temp2 as select
      ID_adj,
      frequent,
      count(frequent) as count
      /* count(distinct(frequent)) as num_unique */
          from temp1
          group by ID_adj, frequent;
quit;

proc sort data= temp2(where=(frequent ne .));
  by ID_adj descending count descending frequent;
run;

/* get the most frequent date */
data output.&fileName.(keep= &ID. &col._adj);
  set temp2;
  by ID_adj descending count;
  if first.ID_adj;
  format frequent date9.;
  rename ID_adj=&ID.;
  rename frequent=&col._adj;
run;

%mend checkUnique;

%checkUnique(fileName=test,col=date,ID=id);

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 219 views
  • 1 like
  • 4 in conversation