Dear SAS experts
I have approximately 30 variables in a dataset, all of which are character variables. Many of them are blank, but the rest contain some text. I would like to delete all observations that contain certain values (text) in any of these 30 variables.
For example, in a dataset with data on automobile (brand) ownership, I might want to delete all persons/observations who own a "Toyota" or a "Honda". This fictional dataset may include 20 variables containing information on automobile ownership, i.e. a person may in this case own up to 20 cars. I would like to delete the observations for those who own either a "Toyota" or a "Honda" (and of course also those who own both), regardless of which other brands they may own in addition.
I have tried different syntax but cannot get it to work. Can anyone help with some generic code? I suspect that I need to combine the syntax with some form of "first variable-last variable in dataset" specification.
Thank you
Best regards
OK, I don't know what your variables are named, but pretend they are var1-var30. Then you can write code like this:
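data mydata;
  set cars;
  flag='N';                          /* assume no match until one is found */
  array cars[30] $ var1-var30;       /* character array over the brand variables */
  do i=1 to dim(cars);               /* dim(cars) = number of elements (30) */
    if cars[i] in ('Toyota', 'Honda') then flag='Y';
  end;
  if flag='Y' then delete;           /* remove anyone who owns either brand */
  drop flag i;                       /* keep the helper variables out of the output */
run;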
I think you should make an array and loop through it, creating a flag that is set to 'Y' if any element is Toyota or Honda. Then simply write if flag='Y' then delete; in the DATA step.
Try it out and let me know if it works. I wanted to leave it lowercase, but my laptop automatically capitalizes it. I declared a character array called cars for your 30 variables (or maybe you said 20). The loop goes through each element in the array: i is the iterator, and cars[i] references the i-th element. dim(cars) returns the number of elements in the array; if you don't want to use it, just write i=1 to 30 when you know there are 30 elements.
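To try the approach without real data, here is a small self-contained sketch. The data set CARS and the variables id and brand1-brand3 are hypothetical stand-ins made up for illustration, with only three brand variables to keep it short. With [*] the array size is taken from the variable list, so dim(b) is 3 here.

```sas
/* Hypothetical sample data: up to three car brands per person */
data cars;
  infile datalines dsd truncover;
  input id brand1 :$10. brand2 :$10. brand3 :$10.;
  datalines;
1,Toyota,Ford,
2,BMW,,
3,Volvo,Honda,Toyota
4,,,
;

data mydata;
  set cars;
  array b[*] brand1-brand3;        /* size comes from the list: 3 elements */
  flag = 'N';                      /* assume no match until one is found */
  do i = 1 to dim(b);              /* visit every brand variable */
    if b[i] in ('Toyota', 'Honda') then flag = 'Y';
  end;
  if flag = 'Y' then delete;       /* drop owners of either brand */
  drop flag i;
run;
```

With this sample, only the observations with id 2 and id 4 should remain in MYDATA.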
No. You will have to write the names of the variables to include in the array.
Dear Irackley
The solution I found that makes the most sense is based mostly on your code, with small changes here and there.
My proposed solution can be written in a general way as:
data new;
set old;
flag='N';
array delete $ firstvar--lastvar; /* There are approximately 30 variables (consecutive), which are all character variables */
do i=1 to dim(delete);
if (delete(i)='nametype1' or delete(i)='nametype2') then flag='Y';
end;
if flag='Y' then delete;
drop flag i;
run;
I realize there may be even smarter ways of writing the code, but I have written it this way so that it is more understandable for both myself and my colleagues.
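As one example of a possibly "smarter" variant (not something posted in this thread), the WHICHC function can replace the flag and the loop: it returns the position of the first argument that equals the search string, or 0 if there is none, and it accepts a whole array through of brand[*]. The data set OLD and the variables make_a to make_c are hypothetical; make_a--make_c plays the role of firstvar--lastvar above.

```sas
/* Hypothetical sample data with consecutive character variables */
data old;
  infile datalines dsd truncover;
  input id make_a :$10. make_b :$10. make_c :$10.;
  datalines;
1,Toyota,Ford,
2,BMW,,
3,Volvo,Honda,
4,Kia,Audi,Seat
;

data new;
  set old;
  array brand[*] make_a--make_c;   /* every variable from make_a to make_c by position */
  /* WHICHC returns the index of the first matching element, or 0 if none match */
  if whichc('Toyota', of brand[*]) or whichc('Honda', of brand[*]) then delete;
run;
```

Here only the observations with id 2 and id 4 should remain in NEW.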
Once again thanks for your help!
@mgrasmussen wrote:
Once again thanks.
The only challenge I have left is that your example assumes that my variables are numbered 1-30 (at the end of each variable name). They are not. Is there not a more flexible way of approaching this problem where one is not reliant on numbers, but where one can define a variable list to loop over?
Thanks
In the bit
array cars[30] $ var1-var30;
you can replace var1-var30 with any list of your variables. You don't even need the number 30: if the variables already exist in the data set, use array somename(*), and the size of the array is set by the number of variables you supply.
array somename(*) thisvar thatvar anothervar a b c pdq;
for example would create the array Somename and the variables listed would be the members.
Read the documentation for the Array statement. The variables must be of the same type.
Somename[1] references the first variable named: Thisvar; Somename[2] would be Thatvar and so on.
Anywhere your code would use the variable values, you can use either the array reference, Somename[], or the variable name. The value in the brackets must be an integer and, unless you use different options when defining the array, must be between 1 and n, where n is the number of variables.
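If you do not want to type the names at all, SAS also has the special variable list _CHARACTER_, which refers to every character variable defined up to that point in the DATA step. A minimal sketch, assuming (hypothetically) that the brand variables are the only character variables in the data set; any other character variable, such as a person's name, would be searched as well. This version also deletes directly inside the loop instead of setting a flag.

```sas
/* Hypothetical sample data: id is numeric, the rest are character */
data owners;
  infile datalines dsd truncover;
  input id first_car :$10. second_car :$10. third_car :$10.;
  datalines;
1,Ford,Toyota,
2,BMW,Kia,
3,,,
;

data kept;
  set owners;
  /* _CHARACTER_: all character variables read from OWNERS (id is excluded) */
  array allchar[*] _character_;
  do i = 1 to dim(allchar);
    /* DELETE stops processing this observation immediately */
    if allchar[i] in ('Toyota', 'Honda') then delete;
  end;
  drop i;
run;
```

With this sample, the observations with id 2 and id 3 should remain in KEPT.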
Dear Ballardw
Thanks for the input. But if you have 70 variables, the code would be quite long and not that helpful, correct? As far as I understand, your code makes the most sense when you have relatively few variables to loop over. But it does get around the problem of having to rely on numbers.
@mgrasmussen wrote:
The only challenge I have left is that your example assumes that my variables are numbered 1-30 (at the end of each variable name). They are not. Is there not a more flexible way of approaching this problem where one is not reliant on numbers, but where one can define a variable list to loop over?
You can use variable lists, for example, if the variables are consecutive in the SAS data set, and the first variable is named population and the last of the consecutive variables is named avg_income, you can use something like this:
array x population--avg_income;
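If the variables are not consecutive but share a common prefix, a name-prefix list such as car_: is another option: it selects every variable whose name starts with car_. In this hypothetical sketch (the data set GARAGE and all variable names are made up), the city variable sits between the two car variables and is left out of the array, so the observation whose city is Toyota is still kept.

```sas
/* Hypothetical sample data: city is not a car variable */
data garage;
  infile datalines dsd truncover;
  input id car_main :$10. city :$12. car_extra :$10.;
  datalines;
1,Honda,Oslo,
2,Audi,Toyota,Ford
3,Mazda,Aarhus,Toyota
;

data kept;
  set garage;
  array cars[*] car_:;             /* car_main and car_extra only */
  flag = 'N';
  do i = 1 to dim(cars);
    if cars[i] in ('Toyota', 'Honda') then flag = 'Y';
  end;
  if flag = 'Y' then delete;
  drop flag i;
run;
```

Only the observation with id 2 should remain in KEPT.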
But enough of us guessing... @mgrasmussen, it's time for you to be specific and show us (a portion of) your actual variable names.
The same way you created and used an ARRAY previously (as in the example code from @tarheel13).
It works perfectly. Thanks.
Normally I would supply a code example using some available data, but I could not find one which was readily available. Fortunately I got the help I needed using generic code.