BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aaronh
Quartz | Level 8

Hello experts,

 

Suppose I have a table with the names of a few high schools and the average SAT scores of the students from each of these schools:

 

 

data test;
input school $ avg_sat;
datalines;
a1 1513
a2 1320
a3 1476
a4 1280
a5 1520
b1 1378
b2 1410
;
run;

Now, suppose I want to subset to only include only a1 - a5, I would write something like:

 

 

data want;
set test;
where school in ('a1', 'a2', 'a3', 'a4', 'a5');
run;

This is totally OK in this scenario since I only have 5 elements within the in operator (a1, a2, a3, a4, a5). Nonetheless, this can quickly become impractical if I am doing this for a1 - a100 instead, for example, for which I will have to type 100 elements within the in operator.

 

 

I know that if the variable school is a numeric variable instead, I can use something like:

where school in (1:5);

but since it is a character variable in this case, I cannot use the ':' sign to indicate a range. I know I can still use a substr() function to separate the numbers from the school variable and go from there, but i am just wondering if there's a shortcut that allows something like:

 

where school in ('a1' : 'a5');

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@aaronh:

If you have a pattern with regard to the values you want to include in your search - for example, if they all start with a specific prefix or form a prefixed numbered list - you already have your answers.

 

However, I can't help but point out that even if you have a pattern, let alone if the value list is arbitrary, using the IN operator in this manner still amounts to hard coding. Such practice is okay for a one-time ad hoc program, but it's surely not a good thing if you (and especially someone else) should revisit the program in the future, if it's part of production, and/or is under change control.

 

A much saner and more scalable way of program organization is to store the search-for values in an editable control file outside the change control encumbrance. It's specific format is of secondary importance, as long as your program can read it properly to input the values into the SAS system (though since it's SAS, storing a search list in a SAS data set rather than, say, in a flat file seems more than reasonable). This way, if you need to augment or update the list of search values, you only have to edit the control file and never touch the program itself.

 

Once you've got the values on the file, there're a myriad of ways to let SAS tell what they are in order to search for them in the WHERE clause, such as:

  • Create an in/format.
  • Load a hash table via FCMP.
  • Auto-compose a list for the IN operator, replete with the quotes around each value if needed.
  • Etc.

For example, in your case, it could look like:

data values ;                                                             
  input value $3. ;                                                       
  cards ;                                                                 
a1                                                                        
a2                                                                        
a3                                                                        
a4                                                                        
a5                                                                        
run ;                                                                     
                                                                          
proc sql noprint ;                                                        
  select quote (strip (value)) into :inlist separated by " " from values ;
quit ;                                                                    
                                                                          
data want ;                                                               
  set test ;                                                              
  where school in (&inlist) ;                                             
run ;                                                                     

You'd store the file VALUES on the side (outside change control) and simply edit it as you wish if you needed to add, delete, or change any search-for values in the future. And if it's a list shared across a team, department, etc., other programmers would be able to simply read it instead of copying a hard coded list into their programs from yours.

 

Kind regards

Paul D.

 

 

       

View solution in original post

14 REPLIES 14
novinosrin
Tourmaline | Level 20

Interesting question. How about a work around with a colon modifier ( : )

 

data test;
input school $ avg_sat;
datalines;
a1 1513
a2 1320
a3 1476
a4 1280
a5 1520
b1 1378
b2 1410
;
run;

data want;
set test;
where school in :('a');
run;

 

novinosrin
Tourmaline | Level 20

or Something fancy

 

%let n=5;
data want;
set test;
array a(&n)$ ;
retain a;
if _n_=1 then do _n_=1 to dim(a);
 a(_n_)=vname(a(_n_));
 end;
if school in a;
drop a:;
run;
aaronh
Quartz | Level 8
Thanks novinosrin! While this works in this case, it wouldn't have worked if I have a1-a100 in the data, and only want to include a1-a80 in the subset, for example.
novinosrin
Tourmaline | Level 20

Can you change N to 80 ?

 

%let n=80;

and execute the previous array method? 

r_behata
Barite | Level 11

How about

 

data want;
	set test;
	where input(compress(school,,'kd'),3.)  le 80;
run;
ballardw
Super User

If you know all of the names involved then create a numeric code and use that if you really need the list type of description.

 

If your "list" is supposed to represent a group, such as members of a school district, then you may be better off with an additional variable.

 

You may also be implying that you want only the A, only the B etc.

That could be

 

where school=: 'a';

 

I have tended to create separate grouping variables and then done things with By group processing on the group and/or the group and school , region, county, school district, school type actually. With a whole lot more things to summarize than a single value like SAT.

hashman
Ammonite | Level 13

@aaronh:

If you have a pattern with regard to the values you want to include in your search - for example, if they all start with a specific prefix or form a prefixed numbered list - you already have your answers.

 

However, I can't help but point out that even if you have a pattern, let alone if the value list is arbitrary, using the IN operator in this manner still amounts to hard coding. Such practice is okay for a one-time ad hoc program, but it's surely not a good thing if you (and especially someone else) should revisit the program in the future, if it's part of production, and/or is under change control.

 

A much saner and more scalable way of program organization is to store the search-for values in an editable control file outside the change control encumbrance. It's specific format is of secondary importance, as long as your program can read it properly to input the values into the SAS system (though since it's SAS, storing a search list in a SAS data set rather than, say, in a flat file seems more than reasonable). This way, if you need to augment or update the list of search values, you only have to edit the control file and never touch the program itself.

 

Once you've got the values on the file, there're a myriad of ways to let SAS tell what they are in order to search for them in the WHERE clause, such as:

  • Create an in/format.
  • Load a hash table via FCMP.
  • Auto-compose a list for the IN operator, replete with the quotes around each value if needed.
  • Etc.

For example, in your case, it could look like:

data values ;                                                             
  input value $3. ;                                                       
  cards ;                                                                 
a1                                                                        
a2                                                                        
a3                                                                        
a4                                                                        
a5                                                                        
run ;                                                                     
                                                                          
proc sql noprint ;                                                        
  select quote (strip (value)) into :inlist separated by " " from values ;
quit ;                                                                    
                                                                          
data want ;                                                               
  set test ;                                                              
  where school in (&inlist) ;                                             
run ;                                                                     

You'd store the file VALUES on the side (outside change control) and simply edit it as you wish if you needed to add, delete, or change any search-for values in the future. And if it's a list shared across a team, department, etc., other programmers would be able to simply read it instead of copying a hard coded list into their programs from yours.

 

Kind regards

Paul D.

 

 

       

novinosrin
Tourmaline | Level 20

Guru @hashman   Not that this is important. I just recalled what you said in Proc SQL: No LOG Note for Missing values? thread "I'd rather say it's a system of tools, each with its own language sharing certain main syntactic features with the others. "

 

Pretty impeccable statement.

 

While it irks me as it confuses me trying to  familiarizing certain functionalities, it's great to note the above being so real as we can see that in  proc iml, the char continuous sequence works 🙂

 

proc iml;
want='a1':'a10';
print want;
quit;
want 
a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 

 

 

hashman
Ammonite | Level 13

@novinosrin:

Thanks for the demo - good to know!

Kind regards

Paul D. 

Astounding
PROC Star

A good plan for next time:  use different data values:  a001, a002, a003, etc.  Then subsetting becomes much easier:

 

where ('a001' <= school <= 'a080');

 

ballardw
Super User

@Astounding wrote:

A good plan for next time:  use different data values:  a001, a002, a003, etc.  Then subsetting becomes much easier:

 

where ('a001' <= school <= 'a080');

 


If the OP school data is anything like the school data I have worked with the "next time" first step is identifying which school identifiers changed and provide a mapping from last year to this year, or similar. But perhaps he doesn't have an Education department that intermixes any or 3 different "identification" schemes.

aaronh
Quartz | Level 8
Thanks a lot!
where('a1' <= school <= 'a5') also worked just fine.
ballardw
Super User

@aaronh wrote:
Thanks a lot!
where('a1' <= school <= 'a5') also worked just fine.

Danger Will Robinson:  If you have school named 'a10' or 'a20' or 'a3456' these will be included in the range.

 

data example;
   input school $;
   if ('a1' le school le 'a5') then put School= 'is in the range specified';
datalines;
a1
a2
a3
a4
a5
a6
a10
a300
;

Read the log.

Character values are compared character by character with the range values. As soon as one character in the same position is out of range it is not inside the range. So since some of these examples are longer than the range values the comparison stops at two characters.

 

Perhaps your actual data does not go to 'a10'. But be very wary of any character comparison involving > or < whether in combination of an equal or not.

aaronh
Quartz | Level 8

I want to thank everyone for your input.

 

Apparently my question was very open-ended, as many of you have provided some solution that'd solve my question. I personally like the 'control file' idea the best, as it seems to be more flexible and generalizable, in addition to being helpful in project management.

 

Thanks again!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1821 views
  • 9 likes
  • 6 in conversation