I am using a very large database and want to keep some of the variables within the database. Further, I want to only keep certain values of chosen variables within the database. I've been trying to format this properly and have not been successful. The program below runs just fine except when I try to limit which values in sym_suffix that I keep (this is the data line with the asterisk). sym_suffix is a character variable that could be up to ten characters. Any help you can provide will be greatly appreciated. Thanks.
data vtemp /view=vtemp;
set taqmsec.ctm_20150707 (keep= sym_root sym_suffix Time_m TR_scond );
where sym_root in('ABT','HON','AA','AEP');
where also time_m between '09:30:00.000't and '10:00:00.000't;
TIME2=time_m-mod(time_m,60);
run;
data no_ctm_20150707;
set vtemp;
by sym_root sym_suffix time2;
if last.time2;
* if sym_suffix = 'A' or '.' then keep;
run;
Try getting rid of "then keep":
if sym_suffix in ('A', ' ');
That might do it. But even if it doesn't, you'll still be closer to the final result you need.
Thanks for your reply, especially since I hadn't tried the where statement. I added it as indicated below. Note: what I want is for the sym_suffix variable to be either an A or a blank space, and I want these to be sorted prior to the time adjustment. When I ran it, I only generated sym_suffix variables with an A. When I changed the order to where also sym_suffix = ' ' or 'A', I ended up with all of the sym_suffix variables listed.
data vtemp /view=vtemp;
set taqmsec.ctm_20150707 (keep= sym_root sym_suffix Time_m TR_scond );
where sym_root in('ABT','HON','AA','AEP');
where also sym_suffix= 'A' or ' ' ;
where also time_m between '09:30:00.000't and '10:00:00.000't;
TIME2=time_m-mod(time_m,60);
run;
data no_ctm_20150707;
set vtemp;
by sym_root sym_suffix time2;
if last.time2;
* if sym_suffix = 'A' or '.' then keep;
run;
Technically, you could just shorten your statement:
if sym_suffix in ('A', '.');
Even though you described your objective, it's tough to tell if that will do what you want. These are the questions I would start with.
First, does your variable SYM_SUFFIX really take on a value of "."? Or are you just looking to delete blank values?
Second, do you understand that you are subsetting observations only, and not subsetting variables (except for KEEP= on the SET statement)?
There could easily be follow-up questions, but that's the starting point.
Thank you for responding. Of my 11 million rows of data, I want the lines of data or observations where sym_suffix equals the letter A or a blank space. If sym_suffix does not equat A or a blank, then I want that observation row and all of the variables in that row deleted. And I want this deletion completed prior to the time data step. Does that make sense?
Mostly it makes sense. This statement would apply:
if sym_suffix in ('A', ' ') then delete;
But the timing isn't clear. What is the "time" data step? Should these observations be deleted before computing last.time2 or after?
Try getting rid of "then keep":
if sym_suffix in ('A', ' ');
That might do it. But even if it doesn't, you'll still be closer to the final result you need.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.