BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Curt
Fluorite | Level 6

Hi everyone,

 

I would like to shorten this code, but couldn't find a working solution yet.

 

data Dataset_new;
set Dataset_1 Dataset_2 Dataset_3;
if whichc('Y1', of A1-A100, C, F) or
...
whichc('Y100', of A1-A100, C, F) or
whichc('X', of A1-A100, C, F);
run;

 

I want to extract observations, which have ONE OF the listed VALUES (Y1-Y100,  X) in ONE OF the listed VARIABLES (A1-A100, C, F).

I wonder, if it is possible to sum up all the values in one statement (for example in one "WHICHC"-statement) as they all refer to the same set of variables.

Listing multiple values within one "WHICHC"-statement didn't work, no matter, how I tried to separate or combine them (for example by comma, space, "OR"-Operator etc.)

I also tried the "IF FIND(CAT())" combination instead of "IF WHICHC()" but listing multiple values in one statement didn't work there either. 

Inserting an IN()-function and list the values therein, was also not successful.

 

Often I didn't receive any error messages, but the extracted observations were totally wrong.

 

Many thanks in advance,

Curt

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below a coding approach that will select rows where at least one variable contains a string Y1 ... Y100.

data have;
  infile datalines truncover;
  input (a1-a5 c f) ($);
datalines;
y1 . . . . x z
b b b b b
b a c d y3
d d d y2 y5
d d d d d y100 
d d d d d y101
;

data want;
  set have;
  array vars {*} a1-a5 c f;
  if prxmatch('/\b(y\d{1,2}|y100)\b/oi',strip(catx(' ',of vars[*])));
run;

proc print data=want;
run;

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

One always suspects data structure but here is one approach that works. Not 'multiple values' but placing them into a structure that shortens the code. I only provide 5 variables as that is all that is needed to show the result.

 

data have;
   input (a1-a5) ( $);
datalines;
y1 . . . .
b b b b b
b a c d y3
d d d y2 y5
;

data want;
   set have;
   array a (*) a1-a5;
   array t (5) $ 2 _temporary_ ('y1','y2','y3','y4','y5');
   do i= 1 to dim(t);
      if whichc(t[i],of a(*))>0 then do;
         output;
         leave;
      end;
   end;
   drop i;
run;

The T array would would require a definition for the length of the values large enough to hold the longest value. The _temporary_ definition means the variables are not written to the data set and stay the same as the initial set values unless you change them. If your actual values for y are sequential as shown you could load the array or replace it with a loop building the text of the search items. But real world data is seldom that nice so the temporary array should work though it will be long statement.

The "trick" is instead of using subsetting IF is the explicit Output when a condition is true coupled with the LEAVE instruction, again conditional, to exit the Do loop early.

The A array would have all the variables you want to search in so could include your C and D or other variables. I'm just too lazy to make complicated data to demonstrate.

 

I do include one example line with two values that are in the search for list to demonstrate only one gets written to the output set.

 

As a couple of asides: If you know that some of your values are more likely to occur than others then placing them first in the temporary array will improve efficiency of the code.

Also, if you wanted to indicate which value was considered as the match then you could a line like:

Found = t[i];

just before the Output statement; Define the length of Found to hold any expected value.

FreelanceReinh
Jade | Level 19

A further (but only minor) simplification of ballardw's elegant solution would be to replace the IF condition

whichc(t[i],of a(*))>0

by

t[i] in a
Patrick
Opal | Level 21

Below a coding approach that will select rows where at least one variable contains a string Y1 ... Y100.

data have;
  infile datalines truncover;
  input (a1-a5 c f) ($);
datalines;
y1 . . . . x z
b b b b b
b a c d y3
d d d y2 y5
d d d d d y100 
d d d d d y101
;

data want;
  set have;
  array vars {*} a1-a5 c f;
  if prxmatch('/\b(y\d{1,2}|y100)\b/oi',strip(catx(' ',of vars[*])));
run;

proc print data=want;
run;

 

 

Curt
Fluorite | Level 6

That was very useful;

thanks to everyone, especially to you ballardw for your detailed explanations!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 487 views
  • 9 likes
  • 4 in conversation