I've a DS that can have multiple values in purpose column : it can be any combination of 'build', 'buy', 'construction', 'purchase' and so on....
I only need to keep the record if the purpose column has 'buy' or 'build' values or combination of both, but not the others or even a combination of buy or build with any other values. For instance"
Acceptable values are: buy, build, buy & build, build & buy
Unacceptable values: construction, purchase, buy & construction, buy & purchase, build & construction, build & purchase
How do I go about coding that in sas please ?
It's easy to write where condition when there is only one value in column but for multiple values, I'm needing some help thanks!
where not find(purpose,"construction") and not find(purpose,"purchase")
Expand for other exceptions as needed.
Hi,
I'd use the divide and conquer tactic for this:
data have;
infile datalines dsd;
input order ~ $200.;
datalines;
"' build ', 'buy ', 'construction', 'purchase' "
build
;
run;
data want1;
set have;
length hasBuild hasBuy hasConstruction hasPurchase 8;
array _orders hasBuild hasBuy hasConstruction hasPurchase;
do over _orders;
if findw(order,substr(vname(_orders),4),'','IP') then _orders=1;
end;
run;
proc print;run;
data want2;
set want1;
where hasBuy;
*drop has:;
run;
- Cheers -
You want to keep instances where, in the variable PURPOSE, at least one of a set of words is present, but no words outside the set is present. Terms can be separated by ampersands and blanks, per your example.
Since you have only two desired terms, you can do a simple list of purpose values. This is not nicely scalable to more than two or three terms:
data want;
set have ;
if upcase(compress(purpose,' &')) in ('BUILD','BUY','BUILDBUY','BUYBUILD') then found=1;
else found=0;
run;
If you have comma separators in PURPOSE, then add it to the second argument of the compress function.
Personally I am of the one variable one value school. EVERY time I have had data with multiple values in a single "variable" any attempt at using that one variable is so much work and starts from scratch for each use that it is just not worth maintaining. One data set I work with has 2 variables that can have as many as 50 codes and are not even nicely separated by any character, just a series of digits that have to be parsed as pairs.
If the list of possible values is known and doesn't change frequently then I would suggest creating a series of variables that are 1, for that value is present, and 0 otherwise. A minor variation of @Oligolas code would assign the 0 values in the known list that were not present.
Then your "where" or other conditions become relatively easy.
Consider if you have variables created named Build, Buy, Construction, Purchase (demolish, renovate, whatever) so coded.
If you want to know if any of the given the variables have a value of 1 then
where max(Build, Buy) =1;
If want to find records where all of a list of variables are 1:
Where sum(Build, Buy,Renovate)=3; (3 variables values of 1 or 0 then you only get a sum of 3 when all are 1).
If you want to find records where none of the values are 1:
Where max(build,buy,construction)=0;
If you want to find records where all of the values are the same, 1 or 0 but all the same:
Where range(build,buy,purchase)=0;
Or add temporary analysis variables using the functions with selected variables.
Also note that mean of a list of such variables gives the percent of 1 in decimal form.
SAS internally uses 1 for 'true' and 0 for false so you can even reduce very simple conditions:
Where buy;
is the same as "Where buy=1"; when the variable Buy only takes values of 1,0 or possibly missing.
Of course you could combine conditions.
I would consider the coding if "buy" is really different than "purchase" then two variables but face value of the words I would doubt it.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.