mgrasmussen
Quartz | Level 8

Dear SAS experts

 

I have approximately 30 variables in a dataset, all of which are character variables. Many of them contain no data (are blank), while the rest contain some text. I would like to delete all observations in which any of these 30 variables contains certain values (text).

 

For example, in a dataset with data on automobile (brand) ownership, I might want to delete all persons/observations who own a "Toyota" or a "Honda". This fictional dataset may include 20 variables containing information on automobile ownership, i.e. a person may own up to 20 cars. I would like to delete those persons who own either a "Toyota" or a "Honda" (and of course also those who own both), regardless of which other brands they may own.

 

I have tried different syntax but I cannot get it to work. Can anyone help with some generic code? I suspect that I need to run some syntax in combination with some form of "first variable-last variable in dataset" specification.

 

Thank you

 

Best regards

1 ACCEPTED SOLUTION

Accepted Solutions
tarheel13
Rhodochrosite | Level 12

OK, I don't know what your variables are named, but pretend they're var1-var30. Then you can write code like this:

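data mydata;
set cars;
flag='N';
array cars[30] $ var1-var30; /* one array element per brand variable */
do i=1 to dim(cars); /* dim(cars) is the number of elements, here 30 */
if cars[i] in ('Toyota', 'Honda') then flag='Y';
end;
if flag='Y' then delete; /* remove the observation if any element matched */
drop flag i; /* helper variables are not needed in the output */
run;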
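Because no sample data was posted, here is a self-contained sketch for trying the approach above. The data set cars, the variables var1-var3 and all values are made up for illustration; the array is renamed brands and declared with [*] so SAS counts the listed variables itself.

```sas
/* Hypothetical sample data: an id plus three brand variables, some blank */
data cars;
  infile datalines dsd truncover;
  input id (var1-var3) (:$10.);
  datalines;
1,Ford,,
2,Toyota,BMW,
3,Volvo,Honda,Ford
4,,,
;
run;

/* Same logic as the accepted solution, sized to three variables */
data mydata;
  set cars;
  flag = 'N';
  array brands[*] $ var1-var3;   /* [*]: size comes from the variable list */
  do i = 1 to dim(brands);
    if brands[i] in ('Toyota', 'Honda') then flag = 'Y';
  end;
  if flag = 'Y' then delete;
  drop flag i;                   /* keep only the original variables */
run;
```

Only observations 1 and 4 should remain, since observation 2 contains Toyota and observation 3 contains Honda.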



tarheel13
Rhodochrosite | Level 12

I think you should make an array and loop through it. Create a flag that records whether any of the variables is Toyota or Honda. Then simply write if flag='Y' then delete; in a DATA step.

mgrasmussen
Quartz | Level 8
Dear Irackley

Thank you for the advice.

I am somewhat new to SAS and have only briefly heard about arrays. I will look more into this.
mgrasmussen
Quartz | Level 8
Looks great. Thanks!

What does "if cars[I]" reference? I suspect it has something to do with the do loop. Perhaps there is no case sensitivity here, so that i (lowercase) and I (uppercase) are the same? I suspect that this is the case.

Best regards
tarheel13
Rhodochrosite | Level 12

Try it out and let me know if it works. I wanted to leave it lowercase, but my laptop automatically capitalized it. SAS variable names are not case-sensitive, so I and i are the same variable. I declared a character array called cars for your 30 variables (or maybe you said 20). The loop goes through each element of the array, and I is the iterator that references each element in turn. dim(cars) returns the number of elements in the array; if you don't want to use it, just write I=1 to 30 when you know there are 30 elements in the array.
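One distinction worth keeping in mind: variable names are not case-sensitive, but comparisons of character values are, so 'Toyota' does not match 'TOYOTA' or 'toyota'. If the data may be inconsistently capitalized, a variant along these lines could help. This is a sketch; the data set, variable names and the numeric found flag are hypothetical, and LEAVE simply exits the loop once a match is found.

```sas
/* Hypothetical sample data with inconsistent capitalization */
data cars;
  infile datalines dsd truncover;
  input id (var1-var3) (:$10.);
  datalines;
1,TOYOTA,,
2,Ford,honda,
3,Ford,BMW,
;
run;

data mydata;
  set cars;
  array brands[*] $ var1-var3;
  found = 0;
  do i = 1 to dim(brands);       /* I and i name the same variable */
    /* UPCASE and STRIP make the value comparison ignore case and padding */
    if upcase(strip(brands[i])) in ('TOYOTA', 'HONDA') then do;
      found = 1;
      leave;                     /* stop checking the remaining elements */
    end;
  end;
  if found then delete;
  drop found i;
run;
```

With this data only observation 3 remains; the plain comparison in ('Toyota', 'Honda') would have kept all three.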

mgrasmussen
Quartz | Level 8
Once again thanks.

The only challenge I have left is that your example assumes that my variables are numbered 1-30 (at the end of each variable name). They are not. Is there not a more flexible way of approaching this problem where one is not reliant on numbers, but where one can define a variable list to loop over?

Thanks
tarheel13
Rhodochrosite | Level 12

No. You will have to write the names of the variables to include in the array. 

mgrasmussen
Quartz | Level 8

Dear Irackley

The solution I found that makes the most sense is based on most of your code, with small changes here and there.

My proposed solution can be written in a general way as:

data new;
set old;
flag='N';
array brands $ firstvar--lastvar; /* There are approximately 30 variables (consecutive), which are all character variables */
do i=1 to dim(brands);
if (brands(i)='nametype1' or brands(i)='nametype2') then flag='Y';
end;
if flag='Y' then delete; /* the DELETE statement; the array is named brands to avoid confusion with it */
drop flag i;
run;

I realize that there may be even smarter ways of writing the code, but I have written it this way so it is more understandable for both myself and my colleagues.
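A more compact alternative replaces the explicit loop with the WHICHC function, which returns the position of the first argument among the values that follow it, or 0 if it is not found. The sketch below uses made-up data and hypothetical variable names (home_car, work_car, spare_car) stored consecutively. Every variable covered by the -- list must be character, because an array cannot mix types.

```sas
/* Hypothetical data: brand variables without numeric suffixes */
data old;
  infile datalines dsd truncover;
  input id (home_car work_car spare_car) (:$10.);
  datalines;
1,Ford,,
2,Volvo,,Honda
3,,,
;
run;

data new;
  set old;
  /* -- takes every variable from home_car to spare_car in data set order */
  array brands $ home_car--spare_car;
  /* WHICHC returns 0 when no element matches, which counts as false */
  if whichc('Toyota', of brands[*]) or whichc('Honda', of brands[*]) then delete;
run;
```

Observations 1 and 3 remain. No flag or counter variables are created, so nothing needs to be dropped.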

Once again thanks for your help!

ballardw
Super User

@mgrasmussen wrote:
Once again thanks.

The only challenge I have left is that your example assumes that my variables are numbered 1-30 (at the end of each variable name). They are not. Is there not a more flexible way of approaching this problem where one is not reliant on numbers, but where one can define a variable list to loop over?

Thanks

In the bit

array cars[30] $ var1-var30;

you can replace var1-var30 with any list of your variables. You don't even need the number 30: if the variables already exist in the data set, use array name(*) and the size of the array is set by the number of variables you supply.

array somename(*) thisvar thatvar anothervar a b c pdq;

for example would create the array Somename and the variables listed would be the members.

Read the documentation for the Array statement. The variables must be of the same type.

Somename[1] references the first variable named: Thisvar; Somename[2] would be Thatvar and so on.

Anywhere your code uses the variable values, you can use either the array reference, Somename[], or the variable name. The value in the brackets must be an integer and, unless you use different options when defining the array, ranges from 1 to n, where n is the number of variables.
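A small sketch of these points, reusing the hypothetical names thisvar, thatvar and anothervar with made-up values: DIM returns the number of listed variables, somename[1] is the same variable as thisvar, and assigning through the array changes the underlying variable.

```sas
data demo;
  length thisvar thatvar anothervar $ 10;
  thisvar = 'Toyota';
  thatvar = 'Ford';
  anothervar = ' ';
  array somename[*] thisvar thatvar anothervar;
  n_elements = dim(somename);                  /* 3: one element per listed variable */
  first_is_thisvar = (somename[1] = thisvar);  /* 1: somename[1] is thisvar */
  somename[2] = 'Honda';                       /* assigns a new value to thatvar */
run;
```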

 

mgrasmussen
Quartz | Level 8

Dear Ballardw

 

Thanks for the input. But if you have 70 variables, the code would be quite long and not that helpful, correct? As far as I understand, your code makes the most sense when you have relatively few variables to loop over. But it does get around the problem of having to rely on numbers.

 

PaigeMiller
Diamond | Level 26

@mgrasmussen wrote:
The only challenge I have left is that your example assumes that my variables are numbered 1-30 (at the end of each variable name). They are not. Is there not a more flexible way of approaching this problem where one is not reliant on numbers, but where one can define a variable list to loop over?

You can use variable lists, for example, if the variables are consecutive in the SAS data set, and the first variable is named population and the last of the consecutive variables is named avg_income, you can use something like this:

 

array x population--avg_income;
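If the brand variables happen to be the only character variables in the data set, the special list _CHARACTER_ avoids naming any variables at all. This is a sketch with made-up data and hypothetical variable names; note that _CHARACTER_ takes every character variable defined at that point in the DATA step, so any other text variable (a name, or an ID stored as text) would also be searched.

```sas
/* Hypothetical data: id is numeric, so only the brand variables are character */
data cars;
  infile datalines dsd truncover;
  input id (first_car second_car third_car) (:$10.);
  datalines;
1,Ford,Volvo,
2,BMW,,Toyota
;
run;

data mydata;
  set cars;
  array x _character_;   /* all character variables defined so far */
  found = 0;             /* numeric, so it is not part of the array */
  do i = 1 to dim(x);
    if x[i] in ('Toyota', 'Honda') then found = 1;
  end;
  if found then delete;
  drop found i;
run;
```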

But enough of us guessing... @mgrasmussen, it's time for you to be specific and show us (a portion of) your actual variable names.

--
Paige Miller
mgrasmussen
Quartz | Level 8
Hey PaigeMiller

This sounds like something which could be a solution. My only question is how you then “call” such a list (“x” according to your example) in a SAS do loop?

Thank you
PaigeMiller
Diamond | Level 26

Same way you created and used an ARRAY previously (as in the example code from @tarheel13 )

--
Paige Miller
mgrasmussen
Quartz | Level 8

It works perfectly. Thanks.

 

Normally I would supply a code example using some available data, but I could not find one which was readily available. Fortunately I got the help I needed using generic code.
