Hello ,
Before that, I asked a few similar questions but I did not develop a solution.
I am able to count different values on each row and display them. For example, my data looks similar like:
ID Col1 Col2 Col3 Col4 .....
1 10 10 14 14
2 32 33 33 33
3 45 67 . 45
4 13 13 13 13
5 . . . 44
I woud like to have the count of unique values(missing included) that each ID has, something like:
ID #Uniquevalues #Unique_count #Uniquevalues1 #Unique1_count #Uniquevalues2 #Unique2_count #.....
1 10;14 2 10 2 14 2
2 32;33 2 32 1 33 3
3 45;67;. 3 45 2 67 1 .
4 13 1 13 4
5 .;44 2 . 3 44 1
can be done with data step or hash?
I appreciate any ideas that may help. Thank you
Yes, you have several of this question here, none of which are closed. Also, do please post test data in the form of a dataset to get goo answers. You can do something like this:
data have; input ID Col1 $ Col2 $ Col3 $ Col4 $; datalines; 1 10 10 14 14 2 32 33 33 33 3 45 67 . 45 4 13 13 13 13 5 . . . 44 ; run; data want (drop=curr_val curr_cnt i); set have; array col{4}; call sortc(of col{*}); length uvals ucounts $2000; curr_val="--"; curr_cnt=1; ucounts=""; uvals=""; do i=1 to dim(col); if curr_val ne col{i} then do; uvals=catx(';',uvals,col{i}); if i ne 1 then ucounts=catx(';',ucounts,strip(put(curr_cnt,best.))); curr_val=col{i}; curr_cnt=1; end; else curr_cnt=sum(curr_cnt,1); end; ucounts=catx(';',ucounts,strip(put(curr_cnt,best.))); run;
This gives you distinct values and counts. If you want to split them into columns you can loop over countw() and use scan().
My first question would be: why? Your data looksfine as it is, in my opinion, you would only mess it up.
How many COL variables are there in the data? You will likely need to set the length for a character variable long enough to hold the values plus a potential number number of semicolons. What is the maximum number of digits that any of the COL variables will have? Are they all integer? If any values are not integer what is the number of decimals you want displayed?
Yes, you have several of this question here, none of which are closed. Also, do please post test data in the form of a dataset to get goo answers. You can do something like this:
data have; input ID Col1 $ Col2 $ Col3 $ Col4 $; datalines; 1 10 10 14 14 2 32 33 33 33 3 45 67 . 45 4 13 13 13 13 5 . . . 44 ; run; data want (drop=curr_val curr_cnt i); set have; array col{4}; call sortc(of col{*}); length uvals ucounts $2000; curr_val="--"; curr_cnt=1; ucounts=""; uvals=""; do i=1 to dim(col); if curr_val ne col{i} then do; uvals=catx(';',uvals,col{i}); if i ne 1 then ucounts=catx(';',ucounts,strip(put(curr_cnt,best.))); curr_val=col{i}; curr_cnt=1; end; else curr_cnt=sum(curr_cnt,1); end; ucounts=catx(';',ucounts,strip(put(curr_cnt,best.))); run;
This gives you distinct values and counts. If you want to split them into columns you can loop over countw() and use scan().
hi @RW9
Thank you again for your help.
Your code was very useful.But when the numbers in the cells changed, I started to have problems.
When the number of digits of the numbers is growing and all of them are numerical, it does not give the right result.
for example;
data have;
input ID Col1-Col4;
datalines;
1 1234567890 1234567890 14 14
2 32 33 33 12345678978
3 45 67 . 45
4 1234567898 13 1234567898 13
5 . . . 44
;
run;
The output is visible below.
Values are repeating.
If you can help me very much.
Sorry, that picture doesn't match the data we were talking about, that is numeric data. In this example, it is likely that COL3 <> COL4, there can be a tiny fraction associated with one or the other that might be causing this. Or it might be the conversion to character. Make sure to round, ciel, floor, int the number to get the actual number e.g.:
int(col3)=int(col4);=
thanks for your reply.
This is the result when I run the code. Because we do a sort(In the code you wrote: call sortc(of col{*}); ) in the code.
I solved my problem with your help.
I made some changes in the code you wrote
1. I changed the sort function because the data is numeric; call sortc(of col{*}) --> call sortn(of col{*})
2. I changed the control function because the data are numeric; curr_val="--" --> curr_val=1
When I changed this part, the condition(if curr_val ne col{i} then do;) started to work correctly
Thank you very much for your interest.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.