DATA Step, Macro, Functions and more

checking values within row

Reply
Contributor
Posts: 71

checking values within row

Dear Consultants,

For the first time I used data step for creating dummy dataset(as I always get data in CSV files and never got a chance to use input statements), I hope now nobody will get anxious. Here is the dataset.

 

data dummy;
infile datalines missover;
input ID C1$ C2$ C3$ C4$ C5$ C6$ C7$ C8$ C9$ C10$ S1$ S2$ S3$ S4$ S5$;
datalines;
1234 a b . . . . . . . . b b d .
2235 b d a . . . . . . . a d a s
2568 a s d f g h i . . . x y z
2698 x x x . . . . . . . x x y z
5545 a x d f f f f f f f f s d f
5625 f k l a f . . . . . y t . . .
;
run;

 

Now from this dataset I want only those records in which values in columns C (C1 to C10) are repeated in columns S (S1 to S5), something like below.

1234  a b . . . . . . . . b b d .
2235  b d a . . . . . . . a d a s
2698 x x x . . . . . . . x x y z
5545 a x d f f f f f f f f s d f

Waiting for the expert advise.

Trusted Advisor
Posts: 1,553

Re: checking values within row

[ Edited ]

Next code will do what you want:

data dummy;
infile datalines missover;
input ID C1$ C2$ C3$ C4$ C5$ C6$ C7$ C8$ C9$ C10$ S1$ S2$ S3$ S4$ S5$;
datalines;
1234 a b . . . . . . . . b b d .
2235 b d a . . . . . . . a d a s
2568 a s d f g h i . . . x y z
2698 x x x . . . . . . . x x y z
5545 a x d f f f f f f f f s d f
5625 f k l a f . . . . . y t . . .
;
run;

data want;
 set dummy;
   array cx $ c1-c10;
   array sx $ s1-s10;
   flag=0;
   do i=1 to dim(cx);
      if cx(i) not in (' ','.') and
whichc(cx(i),s1,s2,s3,s4,s5) then flag=1; end; if flag; run;
Trusted Advisor
Posts: 1,018

Re: checking values within row

@Shmuel

 

Minor efficiency comment.  Consider changing

 

  do i=1 to dim(cx);
to

  do i=1 to dim(cx) while (flag^=1);

Trusted Advisor
Posts: 1,553

Re: checking values within row

Thanks to @mkeintz and @Peter_C the ext code would be more efficient:

data want;
 set dummy;
   array cx $ c1-c10;
   array sx $ s1-s10;
   flag=0;
   do i=1 to dim(cx) while flag = 0;
      if not missing(cx(i))  and       
         whichc(cx(i),s1,s2,s3,s4,s5) then flag=1;
   end;
   if flag;
run;
Contributor
Posts: 71

Re: checking values within row

Thanks to @Shmuel
This program is giving desired solution with just a minor syntax error. flag = 0 → (flag=0)
Valued Guide
Posts: 2,177

Re: checking values within row

only minor comment.....
The effect of the input statement would treat a value '.' as missing and replace it with blank. So, the IF CX(i) test needs check only for blank.
Only informat $CHAR returns the dots as a value.


Super User
Posts: 10,018

Re: checking values within row

You want that character appeared at least twice in s1-s5 ?

data dummy;
infile datalines missover;
input ID C1$ C2$ C3$ C4$ C5$ C6$ C7$ C8$ C9$ C10$ S1$ S2$ S3$ S4$ S5$;
datalines;
1234 a b . . . . . . . . b b d .
2235 b d a . . . . . . . a d a s
2568 a s d f g h i . . . x y z
2698 x x x . . . . . . . x x y z
5545 a x d f f f f f f f f s d f
5625 f k l a f . . . . . y t . . .
;
run;

data want;
 set dummy;
   array cx{*} $ c1-c10;
   array sx{*} $ s1-s10;
   do i=1 to dim(cx);
    n=countc(cats(of sx{*}),cx{i});
    if n gt 1 then do;output;leave;end;
   end;
drop i n;
run;

 

 

Ask a Question
Discussion stats
  • 6 replies
  • 159 views
  • 0 likes
  • 5 in conversation