DATA Step, Macro, Functions and more

compare multiple columns and concat

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

compare multiple columns and concat

[ Edited ]

Hi,

 

I have a sample dataset with the following columns.
To compare all the columns
I want to do comparison with the data step.
Values can be empty or string or numeric

 

data have;
informat id v1-v12 $8.;
input id v1-v12;
cards;
1 1 1 1 3 a b . 4 x . 2 2
2 4 5 h k L L A a 4 6 7 c
3 3 4 5 6 7 3 6 8 8 a w d
4 6 7 2 1 1 2 3 4 5 6 7 s
;

 

There are data for 12 months.I want to compare them in order.

 

cats(ifc(v1=v2,'0','1')
cats(ifc(v2=v3,'0','1')
cats(ifc(v3=v4,'0','1')
cats(ifc(v4=v5,'0','1')
cats(ifc(v5=v6,'0','1')
cats(ifc(v6=v7,'0','1')
cats(ifc(v7=v8,'0','1')
cats(ifc(v8=v9,'0','1')
cats(ifc(v9=v10,'0','1')
cats(ifc(v10=v11,'0','1')
cats(ifc(v11=v12,'0','1')

 

At the end, a concatenated string will be formed.

 

For id = 1 concat_compare='0;0;1;1;1;1;1;1;1;1;0' and

 

same_count=7 /* I have 3 item for 1, 2 item for 2  and 2 item  for missing ,sum(3,2,2) */
not_same_count=5 

 

same_values=1;2; .; /*distinct values, Although it is not the same but more than one / If empty record exists it must be listed  */

 

not_same_values=3;a;b;4;x /*distinct values only have one */

 

I found a code to find the number of the same on the forum.

 

data want (drop=i j);
set have;
array vars $8. v1-v12;
same=0;
do i=1 to dim(vars)-1;
do j=i+1 to dim(vars);
same=sum (same, vars(i) eq vars(j));
end;
end;
run;

But I could not do the others.

I need the following fields on a record-based. 

concat_compare   same_count    not_same_count   same_values    not_same_values

 

Is it possible to do this for each record/id?

 

Best regards


Accepted Solutions
Solution
‎05-04-2017 02:26 AM
Super User
Super User
Posts: 7,401

Re: compare multiple columns and concat

Well, your concat problem is simple and well explained, so code below.  For your other four variables however, I didn't follow at all - please clarify.

data have;
informat id v1-v12 $8.;
input id v1-v12;
cards;
1 1 1 1 3 a b . 4 x . 2 2
2 4 5 h k L L A a 4 6 7 c
3 3 4 5 6 7 3 6 8 8 a w d
4 6 7 2 1 1 2 3 4 5 6 7 s
;
run;

data want;
  set have;
  length concat_compare $2000;
  array vals{*} v:;
  do i=1 to dim(vals)-1;
    concat_compare=catx(';',concat_compare,ifc(vals{i}=vals{i+1},"1","0")); 
  end;
run;

View solution in original post


All Replies
Solution
‎05-04-2017 02:26 AM
Super User
Super User
Posts: 7,401

Re: compare multiple columns and concat

Well, your concat problem is simple and well explained, so code below.  For your other four variables however, I didn't follow at all - please clarify.

data have;
informat id v1-v12 $8.;
input id v1-v12;
cards;
1 1 1 1 3 a b . 4 x . 2 2
2 4 5 h k L L A a 4 6 7 c
3 3 4 5 6 7 3 6 8 8 a w d
4 6 7 2 1 1 2 3 4 5 6 7 s
;
run;

data want;
  set have;
  length concat_compare $2000;
  array vals{*} v:;
  do i=1 to dim(vals)-1;
    concat_compare=catx(';',concat_compare,ifc(vals{i}=vals{i+1},"1","0")); 
  end;
run;
Contributor
Posts: 25

Re: compare multiple columns and concat

[ Edited ]

Hi 

 

thanks for your reply.I will try to explain the other four variables.

 

 mean of  same_values :It is a list of multiple recurring values when we accept each record as an array. so for id=1 ,

The same_values will be "1;2; . "

same_values="1;2; . " (missing is include)

 

mean of not_same_values:A list of values that do not repeat more than once when we accept each record as an array.

so for id=1 , The not_same_values will be "3;a;b;4;x"

 

 not_same_values="3;a;b;4;x"

 

same_count:The number of times that multiple repeating values are repeated. It is the sum of the number of repeats

for example ;

for id = 1 , Repeated 1 to 3 times,Repeated missing value(.) to 2 times  Repeated 2 to 2 times and Their sum is 7 (3+2+2)

same_count=7

 

not_same_count:The number of variables that do not repeat more than once

for id = 1,  Repeated 3 to 1 times,

                 Repeated a to 1 times,

                 Repeated b to 1 times,

                Repeated 4 to 1 times,

                Repeated x to 1 times 

and Their sum is 5 (1+1+1+1+1) 

not_same_count=5

 

I hope,It is more descriptive.

Best regards

Super User
Posts: 10,490

Re: compare multiple columns and concat

for Id 1 V3=1 and v4=3. So wouldn't the result be

0;0;1;1;1;1;1;1;1;1;0

and not your example

0;0;0;1;1;1;1;1;1;1;0

Contributor
Posts: 25

Re: compare multiple columns and concat

Yes, Yes, I made a mistake.Just like you wrote it.

 

0;0;1;1;1;1;1;1;1;1;0

thanks

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 181 views
  • 1 like
  • 3 in conversation