Solved: how to count distinct values and freq in each row

erdem_ustun · Posted 05-03-2017 11:46 AM

Hello ,

Before that, I asked a few similar questions but I did not develop a solution.

I am able to count different values on each row and display them. For example, my data looks similar like:

ID Col1 Col2 Col3 Col4 .....

1 10 10 14 14

2 32 33 33 33

3 45 67 . 45

4 13 13 13 13

5 . . . 44

I woud like to have the count of unique values(missing included) that each ID has, something like:

ID #Uniquevalues #Unique_count #Uniquevalues1 #Unique1_count #Uniquevalues2 #Unique2_count #.....

1 10;14 2 10 2 14 2

2 32;33 2 32 1 33 3

3 45;67;. 3 45 2 67 1 .

4 13 1 13 4

5 .;44 2 . 3 44 1

can be done with data step or hash?

I appreciate any ideas that may help. Thank you

RW9 · Posted 05-03-2017 12:46 PM

Yes, you have several of this question here, none of which are closed. Also, do please post test data in the form of a dataset to get goo answers. You can do something like this:

data have;
  input ID Col1 $ Col2 $ Col3 $ Col4 $;
datalines;
1   10     10    14     14
2    32    33    33     33
3    45    67    .        45
4    13    13    13     13
5    .        .        .     44
;
run;
data want (drop=curr_val curr_cnt i);
  set have;
  array col{4};
  call sortc(of col{*});
  length uvals ucounts $2000;
  curr_val="--";
  curr_cnt=1;
  ucounts="";
  uvals="";
  do i=1 to dim(col);
    if curr_val ne col{i} then do;
      uvals=catx(';',uvals,col{i});
      if i ne 1 then ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
      curr_val=col{i};
      curr_cnt=1;
    end;
    else curr_cnt=sum(curr_cnt,1);
  end;
  ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
run;

This gives you distinct values and counts. If you want to split them into columns you can loop over countw() and use scan().

View solution in original post

PeterClemmensen · Posted 05-03-2017 11:59 AM

My first question would be: why? Your data looksfine as it is, in my opinion, you would only mess it up.

The DATA to DATA Step Macro
Blog: SASnrd

ballardw · Posted 05-03-2017 12:40 PM

How many COL variables are there in the data? You will likely need to set the length for a character variable long enough to hold the values plus a potential number number of semicolons. What is the maximum number of digits that any of the COL variables will have? Are they all integer? If any values are not integer what is the number of decimals you want displayed?

RW9 · Posted 05-03-2017 12:46 PM

Yes, you have several of this question here, none of which are closed. Also, do please post test data in the form of a dataset to get goo answers. You can do something like this:

data have;
  input ID Col1 $ Col2 $ Col3 $ Col4 $;
datalines;
1   10     10    14     14
2    32    33    33     33
3    45    67    .        45
4    13    13    13     13
5    .        .        .     44
;
run;
data want (drop=curr_val curr_cnt i);
  set have;
  array col{4};
  call sortc(of col{*});
  length uvals ucounts $2000;
  curr_val="--";
  curr_cnt=1;
  ucounts="";
  uvals="";
  do i=1 to dim(col);
    if curr_val ne col{i} then do;
      uvals=catx(';',uvals,col{i});
      if i ne 1 then ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
      curr_val=col{i};
      curr_cnt=1;
    end;
    else curr_cnt=sum(curr_cnt,1);
  end;
  ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
run;

This gives you distinct values and counts. If you want to split them into columns you can loop over countw() and use scan().

erdem_ustun · Posted 06-20-2017 11:22 AM

hi @RW9

Thank you again for your help.

Your code was very useful.But when the numbers in the cells changed, I started to have problems.

When the number of digits of the numbers is growing and all of them are numerical, it does not give the right result.

for example;

data have;
input ID Col1-Col4;
datalines;
1 1234567890 1234567890 14 14
2 32 33 33 12345678978
3 45 67 . 45
4 1234567898 13 1234567898 13
5 . . . 44
;
run;

The output is visible below.

Values are repeating.

If you can help me very much.

RW9 · Posted 06-21-2017 03:47 AM

Sorry, that picture doesn't match the data we were talking about, that is numeric data. In this example, it is likely that COL3 <> COL4, there can be a tiny fraction associated with one or the other that might be causing this. Or it might be the conversion to character. Make sure to round, ciel, floor, int the number to get the actual number e.g.:

int(col3)=int(col4);=

erdem_ustun · Posted 06-21-2017 09:47 AM

thanks for your reply.

This is the result when I run the code. Because we do a sort(In the code you wrote: call sortc(of col{*}); ) in the code.

I solved my problem with your help.

I made some changes in the code you wrote

1. I changed the sort function because the data is numeric; call sortc(of col{*}) --> call sortn(of col{*})

2. I changed the control function because the data are numeric; curr_val="--" --> curr_val=1

When I changed this part, the condition(if curr_val ne col{i} then do;) started to work correctly

Thank you very much for your interest.

how to count distinct values and freq in each row

Re: how to count distinct values and freq in each row

Re: how to count distinct values and freq in each row

Re: how to count distinct values and freq in each row

Re: how to count distinct values and freq in each row

Re: how to count distinct values and freq in each row

Re: how to count distinct values and freq in each row

Re: how to count distinct values and freq in each row

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away