DATA Step, Macro, Functions and more

The frequencies of the array values

Accepted Solution Solved
Reply
Contributor
Posts: 27
Accepted Solution

The frequencies of the array values

Hi,

 

I have written a problem before.(compare multiple columns and concat )

 

Thank you very much rw9 for finding a solution .

data have;
informat id v1-v12 $8.;
input id v1-v12;
cards;
1 1 1 1 3 a b . 4 x . 2 2
2 4 5 h k L L A a 4 6 7 c
3 3 4 5 6 7 3 6 8 8 a w d
4 6 7 2 1 1 2 3 4 5 6 7 s
;
run;

data want;
  set have;
  length concat_compare $2000;
  array vals{*} v:;
  do i=1 to dim(vals)-1;
    concat_compare=catx(';',concat_compare,ifc(vals{i}=vals{i+1},"0","1")); 
  end;
run;

rw9 

But I could not do the other things I wanted to do.

 

Is it possible to find the frequency of the values in array?

Is it possible to find the distinct of values in array?

I would ask your assistance in this matter.

The output I want to create is down.

Untitled.jpg

 

Best regards


Accepted Solutions
Solution
‎05-04-2017 02:27 AM
Super User
Super User
Posts: 7,997

Re: The frequencies of the array values

[ Edited ]
Posted in reply to erdem_ustun

What you will likely find is that changing your data structure will be the simplest method.  This shows how to get unique values:

data have;
  informat id v1-v12 $8.;
  input id v1-v12;
  /* Add a row identifier */
  id=_n_;
cards;
1 1 1 1 3 a b . 4 x . 2 2
2 4 5 h k L L A a 4 6 7 c
3 3 4 5 6 7 3 6 8 8 a w d
4 6 7 2 1 1 2 3 4 5 6 7 s
;
run;
proc transpose data=have out=inter;
  by id;
  var v:;
run;
proc sort data=inter out=unique_values;
  by id col1;
run;
data unique_values;
  set unique_values;
  length res $200;
  retain res;
  by id;
  res=ifc(first.id,col1,catx(',',res,col1));
  if last.id then output;
run;

With the inter dataset you can select counts of each and so on, then merge that data back to the one you already have.  In fact, if you look at it once you have done that you will likely find all the information can be done in one pass of a datastep, but start by doing each bit separate. 

 

Oh, and mark a post as correct if you have your answer (from the other post!).

View solution in original post


All Replies
Solution
‎05-04-2017 02:27 AM
Super User
Super User
Posts: 7,997

Re: The frequencies of the array values

[ Edited ]
Posted in reply to erdem_ustun

What you will likely find is that changing your data structure will be the simplest method.  This shows how to get unique values:

data have;
  informat id v1-v12 $8.;
  input id v1-v12;
  /* Add a row identifier */
  id=_n_;
cards;
1 1 1 1 3 a b . 4 x . 2 2
2 4 5 h k L L A a 4 6 7 c
3 3 4 5 6 7 3 6 8 8 a w d
4 6 7 2 1 1 2 3 4 5 6 7 s
;
run;
proc transpose data=have out=inter;
  by id;
  var v:;
run;
proc sort data=inter out=unique_values;
  by id col1;
run;
data unique_values;
  set unique_values;
  length res $200;
  retain res;
  by id;
  res=ifc(first.id,col1,catx(',',res,col1));
  if last.id then output;
run;

With the inter dataset you can select counts of each and so on, then merge that data back to the one you already have.  In fact, if you look at it once you have done that you will likely find all the information can be done in one pass of a datastep, but start by doing each bit separate. 

 

Oh, and mark a post as correct if you have your answer (from the other post!).

Contributor
Posts: 27

Re: The frequencies of the array values

thanks for your reply.

 

Result when I run the code you wrote for unique_values

new_code.jpg

 

But the necessary values are as follows

required.jpg

 

best regards.

 

Super User
Super User
Posts: 7,997

Re: The frequencies of the array values

Posted in reply to erdem_ustun

Yep, sorry, missed nodupkey:

proc sort data=inter out=unique_values nodupkey;
  by id col1;
run;

 

Contributor
Posts: 27

Re: The frequencies of the array values

Do we have a chance to find repeat records and repeat times? like as

freq_dup.jpg

best regards..

Super User
Posts: 11,343

Re: The frequencies of the array values

Posted in reply to erdem_ustun

Please display results in a form that is actually readable. The fonts in the image don't render well (at least on my monitors)

Contributor
Posts: 27

Re: The frequencies of the array values

Hi,

I'm copying it down.I also add it as excel.

 

id v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 concat_compare same_variables Distinct_Values Cnt_of_Distinct_Values dist_1 cnt_of_1 dist_2 cnt_of_2 dist_3 cnt_of_3 dist_4 cnt_of_4 dist_5 cnt_of_5
1 1 1 1 3 a b   4     2 2 0;0;1;1;1;1;1;1;0;1;0 v1_v2;v2_v3;v9_v10;v11_v12 1;3;a;b;4;2;missing 6 1 3 2 2 missing 2 3;a;b;4 1    
2 4 5 h k L L A a 4 6 7 c 1;1;1;1;0;1;1;1;1;1;1 v5_v6 4;5;h;k;L;A;a;6;7;c 10 4 2 L 2 5;h;k;A;a;6;7;c 1        
3 3 4 5 6 7 3 6 8 8 a w d 1;1;1;1;1;1;1;0;1;1;1 v8_v9 3;4;5;6;7;8;a;w;d 9 3 2 6 2 8 2 4;5;7;a;w;d 1    
4 6 7 2 1 1 2 3 4 5 6 7 s 1;1;1;0;1;1;1;1;1;1;1 v4_v5 6;7;2;1;3;4;5;s 8 1 2 2 2 6 2 7 2 3;4;5;s 1
Contributor
Posts: 27

Re: The frequencies of the array values

[ Edited ]

 

I could not find any solution to my problem.I found a code that was written in this forum before.But it only calculates one the maximum value.It does not calculate the frequencies of the others. Does not calculate if there is more than one maximum.

I copy down to give you an idea.

 

data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 01 01 02 02
5 01
;
run;


data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=put(whichn(Count, of _t[*]),z2.);
run;

 

How can I get the output of my example as below..

I ask for your help

 

data have;
  informat id v1-v12 $8.;
  input id v1-v12;
  /* Add a row identifier */
  id=_n_;
cards;
1 1 1 1 3 a b . 4 x . 2 2
2 4 5 h k L L A a 4 6 7 c
3 3 4 5 6 7 3 6 8 8 a w d
4 6 7 2 1 1 2 3 4 5 6 7 s
;
run;

 

id v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 concat_compare same_variables Distinct_Values Cnt_of_Distinct_Values dist_1 cnt_of_1 dist_2 cnt_of_2 dist_3 cnt_of_3 dist_4 cnt_of_4 dist_5 cnt_of_5
1 1 1 1 3 a b   4     2 2 0;0;1;1;1;1;1;1;0;1;0 v1_v2;v2_v3;v9_v10;v11_v12 1;3;a;b;4;2;missing 7 1 3 2 2 missing 2 3;a;b;4 1    
2 4 5 h k L L A a 4 6 7 c 1;1;1;1;0;1;1;1;1;1;1 v5_v6 4;5;h;k;L;A;a;6;7;c 10 4 2 L 2 5;h;k;A;a;6;7;c 1        
3 3 4 5 6 7 3 6 8 8 a w d 1;1;1;1;1;1;1;0;1;1;1 v8_v9 3;4;5;6;7;8;a;w;d 9 3 2 6 2 8 2 4;5;7;a;w;d 1    
4 6 7 2 1 1 2 3 4 5 6 7 s 1;1;1;0;1;1;1;1;1;1;1 v4_v5 6;7;2;1;3;4;5;s 8 1 2 2 2 6 2 7 2 3;4;5;s 1

best regards

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 215 views
  • 0 likes
  • 3 in conversation