Solved: Most frequent observation by row & number of distinct values by row

ayin · Posted 06-04-2017 08:23 PM

data have;
  input id date1:date9. date2:date9. date3:date9.;
  format date1 date9. date2 date9. date3 date9.;
  cards;
1 01JUN2001 01JUN2001 01JUN2001
2 01JUN2001 01JUN2001 05MAY1990
3 05MAY1990 05MAY1990 01JUN2001
4 01JUN2001 05MAY1990 .
;
run;

data want;
  input id date1:date9. date2:date9. date3:date9. date_adj:date9. date_count:best8.;
  format date1 date9. date2 date9. date3 date9. date_adj date9. date_count best8.;
  cards;
1 01JUN2001 01JUN2001 01JUN2001 01JUN2001 1
2 01JUN2001 01JUN2001 05MAY1990 01JUN2001 2
3 05MAY1990 05MAY1990 01JUN2001 05MAY1990 2
4 01JUN2001 05MAY1990 . . 2
;
run;

From data have to data want:

Create a new column named 'date_adj', it would be the most frequent observation in that row; if same frequency, then output blank, or first/last appeared non-blank value (not really fussy about this output).
Create a new column named 'date_count', count the number of distinct values by that row.

I understand SAS can do this by column, not sure how to achieve it by rows.

Thanks.

ayin · Posted 06-05-2017 09:19 PM

The following codes proved to be working:

%macro checkUnique(fileName=,col=,ID=);
%global colList;

/* get all date columns' names */
proc sql noprint;
select name into: colList separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname contains upcase("&fileName") and name contains upcase("&col");
quit;

/* transpose back to flat table */
data temp1(keep= ID_adj frequent position);
  set work.&fileName.(rename= (&ID.=ID));
  array &col._array{*} &colList.;
  do i = 1 to dim(&col._array);
    ID_adj = ID;
    frequent = &col._array{i};
    position = i;
    output;
  end;
  format frequent date9.;
run;

/* get the frequency of each date value */
proc sql noprint;
  create table temp2 as select
      ID_adj,
      frequent,
      count(frequent) as count
      /* count(distinct(frequent)) as num_unique */
          from temp1
          group by ID_adj, frequent;
quit;

proc sort data= temp2(where=(frequent ne .));
  by ID_adj descending count descending frequent;
run;

/* get the most frequent date */
data output.&fileName.(keep= &ID. &col._adj);
  set temp2;
  by ID_adj descending count;
  if first.ID_adj;
  format frequent date9.;
  rename ID_adj=&ID.;
  rename frequent=&col._adj;
run;

%mend checkUnique;

%checkUnique(fileName=test,col=date,ID=id);

View solution in original post

Reeza · Posted 06-04-2017 08:38 PM

Transpose to a row structure via PROC TRANSPOSE, obtain solution, and merge it back into main dataset.

kiranv_ · Posted 06-04-2017 09:01 PM

By using case statements but we careful in case statement if below query has missed any scenario. But you get idea by seeing this query.

proc sql;
create table want as
select date1, date2, date3,
 case when date1=date2 and date1 =date3 then date1
      when date1 =date2 and date1<>date3 then date1
      when date1 =date3 and date1 <>date2 then date1
      when date2= date3 and date1<>date3 then date2
      when date1<>date2 and date2<>date3 then .
 end as date_adj format=date9.,
 case when date1=date2 and date1 =date3 then 1
      when date1 =date2 and date1<>date3 then 2
      when date1 =date3 and date1 <>date2 then 2
       when date2= date3 and date1<>date3 then 2
      when date1<>date2 and date2<>date3 then 2
 end as date_count
 from have ;
 quit;

ayin · Posted 06-05-2017 08:43 PM

Hi @kiranv_,

Thanks for your approach. Because in my real data I have 8 columns, it might be too hard to build the logic by using 'Case'.

kiranv_ · Posted 06-05-2017 08:45 PM

Yes. @Ksharp approach is best to follow

Ksharp · Posted 06-05-2017 09:56 AM

How about IML code ?

data have;
  input id date1:date9. date2:date9. date3:date9.;
  format date1 date9. date2 date9. date3 date9.;
  cards;
1 01JUN2001 01JUN2001 01JUN2001
2 01JUN2001 01JUN2001 05MAY1990
3 05MAY1990 05MAY1990 01JUN2001
4 01JUN2001 05MAY1990 .
;
run;
proc iml;
use have(keep=date:) nobs nobs;
read all var _all_ into x;
close;
date_adj=j(nobs,1,.);
date_count=j(nobs,1,.);
do i=1 to nobs;
 temp=x[i,];
 call tabulate(level,freq,temp);
 if sum(freq=max(freq)) ^= 1 then date_adj[i]=.;
  else date_adj[i]=level[ freq[<:>] ];
 date_count[i]=ncol(level);
end;
create temp var{date_adj date_count};
append;
close;
run;

data want;
 merge have temp;
 format date_adj date9.;
run;

ayin · Posted 06-05-2017 08:44 PM

Thanks @Ksharp, I believe your codes can help a lot of people tackle similar problems. We don't have IML however.

ayin · Posted 06-05-2017 09:19 PM

The following codes proved to be working:

%macro checkUnique(fileName=,col=,ID=);
%global colList;

/* get all date columns' names */
proc sql noprint;
select name into: colList separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname contains upcase("&fileName") and name contains upcase("&col");
quit;

/* transpose back to flat table */
data temp1(keep= ID_adj frequent position);
  set work.&fileName.(rename= (&ID.=ID));
  array &col._array{*} &colList.;
  do i = 1 to dim(&col._array);
    ID_adj = ID;
    frequent = &col._array{i};
    position = i;
    output;
  end;
  format frequent date9.;
run;

/* get the frequency of each date value */
proc sql noprint;
  create table temp2 as select
      ID_adj,
      frequent,
      count(frequent) as count
      /* count(distinct(frequent)) as num_unique */
          from temp1
          group by ID_adj, frequent;
quit;

proc sort data= temp2(where=(frequent ne .));
  by ID_adj descending count descending frequent;
run;

/* get the most frequent date */
data output.&fileName.(keep= &ID. &col._adj);
  set temp2;
  by ID_adj descending count;
  if first.ID_adj;
  format frequent date9.;
  rename ID_adj=&ID.;
  rename frequent=&col._adj;
run;

%mend checkUnique;

%checkUnique(fileName=test,col=date,ID=id);

Most frequent observation by row & number of distinct values by row

Re: Most frequent observation by row & number of distinct values by row

Re: Most frequent observation by row & number of distinct values by row

Re: Most frequent observation by row & number of distinct values by row

Re: Most frequent observation by row & number of distinct values by row

Re: Most frequent observation by row & number of distinct values by row

Re: Most frequent observation by row & number of distinct values by row

Re: Most frequent observation by row & number of distinct values by row

Re: Most frequent observation by row & number of distinct values by row

Registration is open

SAS Training: Just a Click Away