BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
erdem_ustun
Obsidian | Level 7

 

Hello ,

Before that, I asked a few similar questions but I did not develop a solution.

I am able to count different values on each row and display them. For example, my data looks similar like:

 

ID Col1 Col2 Col3 Col4 .....

1   10     10    14     14

2    32    33    33     33

3    45    67    .        45

4    13    13    13     13

5    .        .        .     44

 

I woud like to have the count of unique values(missing included) that each ID has, something like:

 

ID     #Uniquevalues  #Unique_count  #Uniquevalues1  #Unique1_count  #Uniquevalues2   #Unique2_count  #.....

1        10;14               2                          10                       2                           14                       2 

2        32;33               2                           32                      1                            33                      3  

3        45;67;.             3                           45                       2                           67                      1                              . 

4        13                    1                           13                       4                            

5       .;44                  2                           .                          3                            44                     1

 

can be done with data step or hash?

 

I appreciate any ideas that may help. Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, you have several of this question here, none of which are closed.  Also, do please post test data in the form of a dataset to get goo answers.  You can do something like this:

data have;
  input ID Col1 $ Col2 $ Col3 $ Col4 $;
datalines;
1   10     10    14     14
2    32    33    33     33
3    45    67    .        45
4    13    13    13     13
5    .        .        .     44
;
run;
data want (drop=curr_val curr_cnt i);
  set have;
  array col{4};
  call sortc(of col{*});
  length uvals ucounts $2000;
  curr_val="--";
  curr_cnt=1;
  ucounts="";
  uvals="";
  do i=1 to dim(col);
    if curr_val ne col{i} then do;
      uvals=catx(';',uvals,col{i});
      if i ne 1 then ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
      curr_val=col{i};
      curr_cnt=1;
    end;
    else curr_cnt=sum(curr_cnt,1);
  end;
  ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
run;

This gives you distinct values and counts.  If you want to split them into columns you can loop over countw() and use scan().

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

My first question would be: why? Your data looksfine as it is, in my opinion, you would only mess it up.

ballardw
Super User

How many COL variables are there in the data? You will likely need to set the length for a character variable long enough to hold the values plus a potential number number of semicolons. What is the maximum number of digits that any of the COL variables will have? Are they all integer? If any values are not integer what is the number of decimals you want displayed?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, you have several of this question here, none of which are closed.  Also, do please post test data in the form of a dataset to get goo answers.  You can do something like this:

data have;
  input ID Col1 $ Col2 $ Col3 $ Col4 $;
datalines;
1   10     10    14     14
2    32    33    33     33
3    45    67    .        45
4    13    13    13     13
5    .        .        .     44
;
run;
data want (drop=curr_val curr_cnt i);
  set have;
  array col{4};
  call sortc(of col{*});
  length uvals ucounts $2000;
  curr_val="--";
  curr_cnt=1;
  ucounts="";
  uvals="";
  do i=1 to dim(col);
    if curr_val ne col{i} then do;
      uvals=catx(';',uvals,col{i});
      if i ne 1 then ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
      curr_val=col{i};
      curr_cnt=1;
    end;
    else curr_cnt=sum(curr_cnt,1);
  end;
  ucounts=catx(';',ucounts,strip(put(curr_cnt,best.)));
run;

This gives you distinct values and counts.  If you want to split them into columns you can loop over countw() and use scan().

erdem_ustun
Obsidian | Level 7

hi @RW9 

Thank you again for your help.

Your code was very useful.But when the numbers in the cells changed, I started to have problems.

When the number of digits of the numbers is growing and all of them are numerical, it does not give the right result.

 

for example;

 

data have;
input ID Col1-Col4;
datalines;
1 1234567890 1234567890 14 14
2 32 33 33 12345678978
3 45 67 . 45
4 1234567898 13 1234567898 13
5 . . . 44
;
run;

 

The output is visible below.

Untitled.jpg

Values are repeating.

 

If you can help me very much.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, that picture doesn't match the data we were talking about, that is numeric data.  In this example, it is likely that COL3 <> COL4, there can be a tiny fraction associated with one or the other that might be causing this.  Or it might be the conversion to character.  Make sure to round, ciel, floor, int the number to get the actual number e.g.:

int(col3)=int(col4);=

erdem_ustun
Obsidian | Level 7

thanks for your reply.

This is the result when I run the code. Because we do a sort(In the code you wrote: call sortc(of col{*}); ) in the code.

 

I solved my problem with your help.

 

I made some changes in the code you wrote

1. I changed the sort function because the data is numeric;   call sortc(of col{*}) --> call sortn(of col{*})

2. I changed the control function because the data are numericcurr_val="--" --> curr_val=1 

    When I changed this part, the condition(if curr_val ne col{i} then do;)  started to work correctly

 

Thank you very much for your interest.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1566 views
  • 1 like
  • 4 in conversation