DATA Step, Macro, Functions and more

how to count distinct values and freq in each row

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

how to count distinct values and freq in each row

[ Edited ]

 

Hello ,

Before that, I asked a few similar questions but I did not develop a solution.

I am able to count different values on each row and display them. For example, my data looks similar like:

 

ID Col1 Col2 Col3 Col4 .....

1   10     10    14     14

2    32    33    33     33

3    45    67    .        45

4    13    13    13     13

5    .        .        .     44

 

I woud like to have the count of unique values(missing included) that each ID has, something like:

 

ID     #Uniquevalues  #Unique_count  #Uniquevalues1  #Unique1_count  #Uniquevalues2   #Unique2_count  #.....

1        10;14               2                          10                       2                           14                       2 

2        32;33               2                           32                      1                            33                      3  

3        45;67;.             3                           45                       2                           67                      1                              . 

4        13                    1                           13                       4                            

5       .;44                  2                           .                          3                            44                     1

 

can be done with data step or hash?

 

I appreciate any ideas that may help. Thank you


Accepted Solutions
Solution
‎05-04-2017 02:24 AM
Super User
Super User
Posts: 7,401

Re: how to count distinct values and freq in each row

Yes, you have several of this question here, none of which are closed.  Also, do please post test data in the form of a dataset to get goo answers.  You can do something like this:

data have;
  input ID Col1 $ Col2 $ Col3 $ Col4 $;
datalines;
1   10     10    14     14
2    32    33    33     33
3    45    67    .        45
4    13    13    13     13
5    .        .        .     44
;
run;
data want (drop=curr_val curr_cnt i);
  set have;
  array col{4};
  call sortc(of col{*});
  length uvals ucounts $2000;
  curr_val="--";
  curr_cnt=1;
  ucounts="";
  uvals="";
  do i=1 to dim(col);
    if curr_val ne col{i} then do;
      uvals=catx(';',uvals,col{i});
      if i ne 1 then ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
      curr_val=col{i};
      curr_cnt=1;
    end;
    else curr_cnt=sum(curr_cnt,1);
  end;
  ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
run;

This gives you distinct values and counts.  If you want to split them into columns you can loop over countw() and use scan().

View solution in original post


All Replies
PROC Star
Posts: 551

Re: how to count distinct values and freq in each row

My first question would be: why? Your data looksfine as it is, in my opinion, you would only mess it up.

Super User
Posts: 10,497

Re: how to count distinct values and freq in each row

How many COL variables are there in the data? You will likely need to set the length for a character variable long enough to hold the values plus a potential number number of semicolons. What is the maximum number of digits that any of the COL variables will have? Are they all integer? If any values are not integer what is the number of decimals you want displayed?

Solution
‎05-04-2017 02:24 AM
Super User
Super User
Posts: 7,401

Re: how to count distinct values and freq in each row

Yes, you have several of this question here, none of which are closed.  Also, do please post test data in the form of a dataset to get goo answers.  You can do something like this:

data have;
  input ID Col1 $ Col2 $ Col3 $ Col4 $;
datalines;
1   10     10    14     14
2    32    33    33     33
3    45    67    .        45
4    13    13    13     13
5    .        .        .     44
;
run;
data want (drop=curr_val curr_cnt i);
  set have;
  array col{4};
  call sortc(of col{*});
  length uvals ucounts $2000;
  curr_val="--";
  curr_cnt=1;
  ucounts="";
  uvals="";
  do i=1 to dim(col);
    if curr_val ne col{i} then do;
      uvals=catx(';',uvals,col{i});
      if i ne 1 then ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
      curr_val=col{i};
      curr_cnt=1;
    end;
    else curr_cnt=sum(curr_cnt,1);
  end;
  ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
run;

This gives you distinct values and counts.  If you want to split them into columns you can loop over countw() and use scan().

Contributor
Posts: 25

Re: how to count distinct values and freq in each row

hi @RW9 

Thank you again for your help.

Your code was very useful.But when the numbers in the cells changed, I started to have problems.

When the number of digits of the numbers is growing and all of them are numerical, it does not give the right result.

 

for example;

 

data have;
input ID Col1-Col4;
datalines;
1 1234567890 1234567890 14 14
2 32 33 33 12345678978
3 45 67 . 45
4 1234567898 13 1234567898 13
5 . . . 44
;
run;

 

The output is visible below.

Untitled.jpg

Values are repeating.

 

If you can help me very much.

Super User
Super User
Posts: 7,401

Re: how to count distinct values and freq in each row

Sorry, that picture doesn't match the data we were talking about, that is numeric data.  In this example, it is likely that COL3 <> COL4, there can be a tiny fraction associated with one or the other that might be causing this.  Or it might be the conversion to character.  Make sure to round, ciel, floor, int the number to get the actual number e.g.:

int(col3)=int(col4);=

Contributor
Posts: 25

Re: how to count distinct values and freq in each row

thanks for your reply.

This is the result when I run the code. Because we do a sort(In the code you wrote: call sortc(of col{*}); ) in the code.

 

I solved my problem with your help.

 

I made some changes in the code you wrote

1. I changed the sort function because the data is numeric;   call sortc(of col{*}) --> call sortn(of col{*})

2. I changed the control function because the data are numericcurr_val="--" --> curr_val=1 

    When I changed this part, the condition(if curr_val ne col{i} then do;)  started to work correctly

 

Thank you very much for your interest.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 227 views
  • 1 like
  • 4 in conversation