BookmarkSubscribeRSS Feed
waliaa
Fluorite | Level 6

I've a DS that can have multiple values in purpose column : it can be any combination of 'build', 'buy', 'construction', 'purchase' and so on....

I only need to keep the record if the purpose column has 'buy' or 'build' values or combination of both, but not the others or even a combination of buy or build with any other values. For instance"

Acceptable values are: buy, build, buy & build, build & buy

Unacceptable values: construction, purchase, buy & construction, buy & purchase, build & construction, build & purchase 

How do I go about coding that in sas please ? 

It's easy to write where condition when there is only one value in column but for multiple values, I'm needing some help thanks! 

4 REPLIES 4
Oligolas
Barite | Level 11

Hi,

I'd use the divide and conquer tactic for this:

data have;
infile datalines dsd;
input order ~ $200.;
datalines;
"' build ', 'buy ', 'construction', 'purchase' "
build
;
run;

data want1;
   set have;
   length hasBuild hasBuy hasConstruction hasPurchase 8;
   
   array _orders hasBuild hasBuy hasConstruction hasPurchase;
   do over _orders;
      if findw(order,substr(vname(_orders),4),'','IP') then _orders=1;
   end;
run;
proc print;run;

data want2;
   set want1;
   where hasBuy;
*drop has:; run;
________________________

- Cheers -

mkeintz
PROC Star

You want to keep instances where, in the variable PURPOSE, at least one of a set of words is present, but no words outside the set is present.  Terms can be separated by ampersands and blanks, per your example.

 

Since you have only two desired terms, you can do a simple list of purpose values.  This is not nicely scalable to more than two or three terms:

 

data want;
  set have ;
  if upcase(compress(purpose,' &')) in ('BUILD','BUY','BUILDBUY','BUYBUILD') then found=1;
  else found=0;
run;

If you have comma separators in PURPOSE, then add it to the second argument of the compress function.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

Personally I am of the one variable one value school. EVERY time I have had data with multiple values in a single "variable" any attempt at using that one variable is so much work and starts from scratch for each use that it is just not worth maintaining. One data set I work with has 2 variables that can have as many as 50 codes and are not even nicely separated by any character, just a series of digits that have to be parsed as pairs.

 

If the list of possible values is known and doesn't change frequently then I would suggest creating a series of variables that are 1, for that value is present, and 0 otherwise. A minor variation of @Oligolas code would assign the 0 values in the known list that were not present.

Then your "where" or other conditions become relatively easy.

Consider if you have variables created named Build, Buy, Construction, Purchase (demolish, renovate, whatever) so coded.

If you want to know if any of the given the variables have a value of 1 then

where max(Build, Buy) =1;

If want to find records where all of a list of variables are 1:

Where sum(Build, Buy,Renovate)=3; (3 variables values of 1 or 0 then you only get a sum of 3 when all are 1).

If you want to find records where none of the values are 1:

Where max(build,buy,construction)=0;

If you want to find records where all of the values are the same, 1 or 0 but all the same:

Where range(build,buy,purchase)=0;

 

Or add temporary analysis variables using the functions with selected variables.

Also note that mean of a list of such variables gives the percent of 1 in decimal form.

SAS internally uses 1 for 'true' and 0 for false so you can even reduce very simple conditions:

Where buy;

is the same as "Where buy=1"; when the variable Buy only takes values of 1,0 or possibly missing.

 

Of course you could combine conditions.

 

 

I would consider the coding if "buy" is really different than "purchase" then two variables but face value of the words I would doubt it.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1139 views
  • 2 likes
  • 5 in conversation