BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kbl
Obsidian | Level 7 kbl
Obsidian | Level 7

I am using a very large database and want to keep some of the variables within the database.  Further, I want to only keep certain values of chosen variables within the database.  I've been trying to format this properly and have not been successful. The program below runs just fine except when I try to limit which values in sym_suffix that I keep (this is the data line with the asterisk).  sym_suffix is a character variable that could be up to ten characters. Any help you can provide will be greatly appreciated.  Thanks.

 

 

data vtemp /view=vtemp;
  set taqmsec.ctm_20150707 (keep= sym_root  sym_suffix  Time_m TR_scond  );
 
 where sym_root in('ABT','HON','AA','AEP');


 where also time_m between '09:30:00.000't and '10:00:00.000't;
  TIME2=time_m-mod(time_m,60);
  
run;
 
data no_ctm_20150707;
  set vtemp;
 
 by sym_root  sym_suffix  time2;
  if last.time2;
*  if sym_suffix = 'A' or '.'  then keep;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Try getting rid of "then keep":

 

if sym_suffix in ('A', ' ');

 

That might do it.  But even if it doesn't, you'll still be closer to the final result you need.

View solution in original post

9 REPLIES 9
LinusH
Tourmaline | Level 20
You can't use KEEP/DROP conditionally.
What is the condition? Do you want to keep the column/variable if any observation/row have the values A or missing? If yes, there have been several threads recently discussing this topic - do a serch.

Or are you confusing KEEP with filtering observations? If this is the case use your condition in a WHERE statement instead.
Data never sleeps
kbl
Obsidian | Level 7 kbl
Obsidian | Level 7

Thanks for your reply, especially since I hadn't tried the where statement.  I added it as indicated below.  Note:  what I want is for the sym_suffix variable to be either an A or a blank space, and I want these to be sorted prior to the time adjustment.  When I ran it, I only generated sym_suffix variables with an A.  When I changed the order to where also sym_suffix = '  ' or 'A', I ended up with all of the sym_suffix variables listed.

 

data vtemp /view=vtemp;
  set taqmsec.ctm_20150707 (keep= sym_root  sym_suffix  Time_m TR_scond  );
 
 where sym_root in('ABT','HON','AA','AEP');

where also sym_suffix=   'A' or ' '  ;


 where also time_m between '09:30:00.000't and '10:00:00.000't;
  TIME2=time_m-mod(time_m,60);
  
run;
 
data no_ctm_20150707;
  set vtemp;
 
 by sym_root  sym_suffix  time2;
  if last.time2;
*  if sym_suffix = 'A' or '.'  then keep;
run;

Astounding
PROC Star

Technically,  you could just shorten your statement:

 

if sym_suffix in ('A', '.');

 

Even though you described your objective, it's tough to tell if that will do what you want.  These are the questions I would start with.

 

First, does your variable SYM_SUFFIX really take on a value of "."?  Or are you just looking to delete blank values?

 

Second, do you understand that you are subsetting observations only, and not subsetting variables (except for KEEP= on the SET statement)?

 

There could easily be follow-up questions, but that's the starting point.

kbl
Obsidian | Level 7 kbl
Obsidian | Level 7

Thank you for responding.  Of my 11 million rows of data, I want the lines of data or observations where sym_suffix equals the letter A or a blank space.  If sym_suffix does not equat A or a blank, then I want that observation row and all of the variables in that row deleted.  And I want this deletion completed prior to the time data step.  Does that make sense?

Astounding
PROC Star

Mostly it makes sense.  This statement would apply:

 

if sym_suffix in ('A', ' ') then delete;

 

But the timing isn't clear.  What is the "time" data step?  Should these observations be deleted before computing last.time2 or after?

kbl
Obsidian | Level 7 kbl
Obsidian | Level 7
I want to keep the observations that have sym_suffix of A or blank. I tried substituting keep for "delete" in your statement, but I got a message it's used out of order. Here's where I placed the statement (I also tried placing the keep statement right after the set statement, and that didn't work either). Thanks for your help.



data vtemp /view=vtemp;
set taqmsec.ctm_20150707 (keep= sym_root sym_suffix Time_m TR_scond );

where sym_root in('ABT','HON','AA','AEP');

if sym_suffix in ('A', ' ') then keep;

where also time_m between '09:30:00.000't and '10:00:00.000't;
TIME2=time_m-mod(time_m,60);

run;

data no_ctm_20150707;
set vtemp;

by sym_root sym_suffix time2;
if last.time2;


run;




Astounding
PROC Star

Try getting rid of "then keep":

 

if sym_suffix in ('A', ' ');

 

That might do it.  But even if it doesn't, you'll still be closer to the final result you need.

kbl
Obsidian | Level 7 kbl
Obsidian | Level 7
Thanks so much for all of your help; the program is doing precisely what I want now.
LinusH
Tourmaline | Level 20
As far as I can tell you want to subset your data, and for that WHERE is the most efficient method.
Data never sleeps

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 4296 views
  • 3 likes
  • 3 in conversation