- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi SAS Users,
I am trying to delete some observations to filter my data.
My code is as below:
OPTIONS MPRINT;
data arg_merge2;
set 'C:\Users\pnguyen\Desktop\arg_merge2';
run;
data screen1;
set arg_merge2;
if TYPE_1 ne 'EQ' then delete;
if INDC3 = 'UTILS' or INDC3='BANKS' or INDC3='FINSV'
or INDC3='RLEST' or INDC3='INSUR'
then delete;
run;
I have three questions relating to the code above:
1. If I want to delete observations satisfying multiple conditions, can I list two or more "if" like that? I mean, I want to delete observation with TYPE_1 <> 'EQ' and after that, when I got the result, I want to delete observations has INDC3 = 'UTILS' or 'BANKS',....
2. Regarding the code
if indc3 = 'UTILS' OR INDC3='BANKS' or INDC3='FINSV'
or INDC3='RLEST' or INDC3='INSUR'
is there any chance to shorten the code?
3. I know there is a way to delete the observation in current dataset without creating a new one by using PROC MEANS , can you please tell me how to use it in my case specifically?
Thank you in advance!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1 and 2. The answer to both is Yes. However, I think it would be more intuituve to specify when out want to output an observation rather than when you want to delete it. Also, you can use the Where Statement instead of the If Statement. The Where Statement specifies what observations are read, so you don't even read them into the data step.
data screen1;
set arg_merge2;
if TYPE_1 = "EQ" and INDC3 not in ('UTILS', 'BANKS', 'FINSV', 'RLEST', 'INSUR');
run;
3. I think what you want to do is to use Proc Means directly on your original data set and filter the observations directly here. You can use the Where= Data Set Option to do so. See if you can use this as a template.
proc means data = screen1(where=(TYPE_1 = "EQ" and INDC3 not in ('UTILS', 'BANKS', 'FINSV', 'RLEST', 'INSUR')));
/* More code here */
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1 and 2. The answer to both is Yes. However, I think it would be more intuituve to specify when out want to output an observation rather than when you want to delete it. Also, you can use the Where Statement instead of the If Statement. The Where Statement specifies what observations are read, so you don't even read them into the data step.
data screen1;
set arg_merge2;
if TYPE_1 = "EQ" and INDC3 not in ('UTILS', 'BANKS', 'FINSV', 'RLEST', 'INSUR');
run;
3. I think what you want to do is to use Proc Means directly on your original data set and filter the observations directly here. You can use the Where= Data Set Option to do so. See if you can use this as a template.
proc means data = screen1(where=(TYPE_1 = "EQ" and INDC3 not in ('UTILS', 'BANKS', 'FINSV', 'RLEST', 'INSUR')));
/* More code here */
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your insightful answer, regarding the question 3, the code should be
proc means data = arg_merge2(where=(TYPE_1 = "EQ" and INDC3 not in ('UTILS', 'BANKS', 'FINSV', 'RLEST', 'INSUR')));
run;
I think the data should be arg_merge2 rather than screen1, I just want to cross-check to be sure quite a bit
Can I just write such code and do not write any further "More code" as you mentioned? And the two codes you posted equal to each other ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, you're right, it should be screen1 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
And no, the two code snippets are not equal at all. One creates data (the data step) and one calculates desciptive statistics (PROC MEANS).
For Proc Means code example, check the Documentation.
Feel free to ask 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So, the results from proc means and the data step should be the same, am I correct ?. I mean afterward, if I access the datasets screen1 from data step or arg_merge2 from proc means, they should be the same, am I correct? Or Proc Means just have the action inside this procedure and does not change the original dataset ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One thing, by TYPE_1 <> 'EQ' do you mean Type_1 is not equal to 'EQ'?
Except in a Where expression you will find that <> does not do what you think in a data step.
The <> operator is only valid for numeric values in a data step otherwise:
260 data example; 261 y='EQUAL'; 262 if y <> 'EQ' then put Y= 'is <> to EQ'; NOTE: The "<>" operator is interpreted as "MAX". 263 run; NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column). 262:9 NOTE: Invalid numeric data, 'EQUAL' , at line 262 column 9. y=EQUAL _ERROR_=1 _N_=1 NOTE: The data set WORK.EXAMPLE has 1 observations and 1 variables.
So for character value comparisons in a data step for "not equal" us NE , ~= or ^=
I would also be very careful with not equal conditions in your use case in general. Missing values may need additional consideration for example if really mean "not missing and not equal to 'EQ' ".
Less than conditions may also need explicit treatment as missing is always less than any explicit value.