BookmarkSubscribeRSS Feed
deega
Quartz | Level 8

Dear Consultants,

For the first time I used data step for creating dummy dataset(as I always get data in CSV files and never got a chance to use input statements), I hope now nobody will get anxious. Here is the dataset.

 

data dummy;
infile datalines missover;
input ID C1$ C2$ C3$ C4$ C5$ C6$ C7$ C8$ C9$ C10$ S1$ S2$ S3$ S4$ S5$;
datalines;
1234 a b . . . . . . . . b b d .
2235 b d a . . . . . . . a d a s
2568 a s d f g h i . . . x y z
2698 x x x . . . . . . . x x y z
5545 a x d f f f f f f f f s d f
5625 f k l a f . . . . . y t . . .
;
run;

 

Now from this dataset I want only those records in which values in columns C (C1 to C10) are repeated in columns S (S1 to S5), something like below.

1234  a b . . . . . . . . b b d .
2235  b d a . . . . . . . a d a s
2698 x x x . . . . . . . x x y z
5545 a x d f f f f f f f f s d f

Waiting for the expert advise.

6 REPLIES 6
Shmuel
Garnet | Level 18

Next code will do what you want:

data dummy;
infile datalines missover;
input ID C1$ C2$ C3$ C4$ C5$ C6$ C7$ C8$ C9$ C10$ S1$ S2$ S3$ S4$ S5$;
datalines;
1234 a b . . . . . . . . b b d .
2235 b d a . . . . . . . a d a s
2568 a s d f g h i . . . x y z
2698 x x x . . . . . . . x x y z
5545 a x d f f f f f f f f s d f
5625 f k l a f . . . . . y t . . .
;
run;

data want;
 set dummy;
   array cx $ c1-c10;
   array sx $ s1-s10;
   flag=0;
   do i=1 to dim(cx);
      if cx(i) not in (' ','.') and
whichc(cx(i),s1,s2,s3,s4,s5) then flag=1; end; if flag; run;
mkeintz
PROC Star

@Shmuel

 

Minor efficiency comment.  Consider changing

 

  do i=1 to dim(cx);
to

  do i=1 to dim(cx) while (flag^=1);

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Shmuel
Garnet | Level 18

Thanks to @mkeintz and @Peter_C the ext code would be more efficient:

data want;
 set dummy;
   array cx $ c1-c10;
   array sx $ s1-s10;
   flag=0;
   do i=1 to dim(cx) while flag = 0;
      if not missing(cx(i))  and       
         whichc(cx(i),s1,s2,s3,s4,s5) then flag=1;
   end;
   if flag;
run;
deega
Quartz | Level 8
Thanks to @Shmuel
This program is giving desired solution with just a minor syntax error. flag = 0 → (flag=0)
Peter_C
Rhodochrosite | Level 12
only minor comment.....
The effect of the input statement would treat a value '.' as missing and replace it with blank. So, the IF CX(i) test needs check only for blank.
Only informat $CHAR returns the dots as a value.


Ksharp
Super User

You want that character appeared at least twice in s1-s5 ?

data dummy;
infile datalines missover;
input ID C1$ C2$ C3$ C4$ C5$ C6$ C7$ C8$ C9$ C10$ S1$ S2$ S3$ S4$ S5$;
datalines;
1234 a b . . . . . . . . b b d .
2235 b d a . . . . . . . a d a s
2568 a s d f g h i . . . x y z
2698 x x x . . . . . . . x x y z
5545 a x d f f f f f f f f s d f
5625 f k l a f . . . . . y t . . .
;
run;

data want;
 set dummy;
   array cx{*} $ c1-c10;
   array sx{*} $ s1-s10;
   do i=1 to dim(cx);
    n=countc(cats(of sx{*}),cx{i});
    if n gt 1 then do;output;leave;end;
   end;
drop i n;
run;

 

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2635 views
  • 0 likes
  • 5 in conversation