BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nonlinear999
Fluorite | Level 6

Hi Guys 

I have got a dataset that looks like the following - 

data have;
   input ric $ price;
datalines;
"BCDCA" 10 
"BCDCZ" 12 
"ABX"   4	
"ABCBA" 50	
"ABCBZ" 60	
run;

I want to iterate over the ric (which is a character variable) and flag it if the substring after 3 characters = "CA - CZ", "BA-BZ". Here, CA - CZ includes everything in between (i.e., CA, CB, CC, CD etc. ) and same with BA-BZ. For example, if the substring of the ric is 

like the

if substrn(ric,4,2)="CA-CZ" then flag=1; else flag=0 

I know I can't put the condition like this but want to know if there is any way of doing sth like this. Rather than typing it individually (my dataset is huge and has a lot of ric codes) if there is a sas way of doing that, that would save me a huge amount time. Thank you very much. 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@nonlinear999 wrote:
My task does indeed go beyond the characters that I mentioned. But, it seems all the solutions are doing this (including this one) in a way which requires to put in the datalines or do it in informat step. Is there any way of incorporating this solution to an existing dataset (with millions of obs for example) and does not require creating the dataset? Thank you very much.

Existing dataset?  You could avoid creating a dataset file, and just create a dataset view.  The advantage is that you can use the code as in my suggestion, but avoid taking up disk space with a new file.. Moreover, the dataset view can be made permanent, so it's ready for reuse without being re-programmed:

 

data need / view=need;
  set have;
  flag=0;
  if ('A'<=char(ric,4)<='Z') and ('A'<=char(ric,5)<='Z')
    then flag=rank(substr(ric,4,1))-rank('A')+1;
run;
proc freq data=need;
  tables flag;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
data_null__
Jade | Level 19

How about an INFORMAT

 

proc format;
   invalue test 'CA'-'CZ' = 1 'BA'-'BZ' = 2;
   quit;

data have;
   input ric :$quote. price;
   x = substr(ric,4);
   flag = input(x,test.);
   datalines;
"BCDCA" 10 
"BCDCZ" 12 
"ABX"   4	
"ABCBA" 50	
"ABCBZ" 60	
run;

Capture.PNG

Ksharp
Super User
data have;
input ric :$quote20. price;
if "CA"<=substrn(ric,4,2)<="CZ" then flag=1; else flag=0;
datalines;
"BCDCA" 10
"BCDCZ" 12
"ABX" 4
"ABCBA" 50
"ABCBZ" 60
;
mkeintz
PROC Star

If your task goes beyond looking for trailing BA...BZ  or CA...CZ, but rather   !A through !Z (where ! is any capital letter), then you can check for capital A through Z in the 4th and 5th characters then use the "rank" of the 4th character to produce a flag=1 (for AA-AZ), or 2 (BA-BZ), .... or 26 (for ZA-ZZ).  I've added a few more RIC codes as examples: 

 

data have;
  input ric :$quote20. price;
  if ('A'<=char(ric,4)<='Z') and ('A'<=char(ric,5)<='Z')
     then flag=rank(substr(ric,4,1))-rank('A')+1;
  if _n_=1 then put " RIC  PRICE  FLAG";
  put ric @6 price 5. +4 flag 2.;
datalines;
"BCDCA" 10
"BCDCZ" 12
"ABX" 4
"ABCBA" 50
"ABCBZ" 60
"ABCxZ" 101
"DEFA4" 102
"EFGZC" 104
"EFGQa" 105
run;

which produces the following on the sas log:  

 

651  data have;
652    input ric :$quote20. price;
653    if ('A'<=char(ric,4)<='Z') and ('A'<=char(ric,5)<='Z')
654       then flag=rank(substr(ric,4,1))-rank('A')+1;
655    if _n_=1 then put " RIC  PRICE  FLAG";
656    put ric @6 price 5. +4 flag 2.;
657  datalines;

 RIC  PRICE  FLAG
BCDCA   10     3
BCDCZ   12     3
ABX      4     .
ABCBA   50     2
ABCBZ   60     2
ABCxZ  101     .
DEFA4  102     .
EFGZC  104    26
EFGQa  105     .

If you want flag=0 for non-qualifiers (instead of flag=. above), just prefix the if statement with 

  flag=0;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
nonlinear999
Fluorite | Level 6
My task does indeed go beyond the characters that I mentioned. But, it seems all the solutions are doing this (including this one) in a way which requires to put in the datalines or do it in informat step. Is there any way of incorporating this solution to an existing dataset (with millions of obs for example) and does not require creating the dataset? Thank you very much.
mkeintz
PROC Star

@nonlinear999 wrote:
My task does indeed go beyond the characters that I mentioned. But, it seems all the solutions are doing this (including this one) in a way which requires to put in the datalines or do it in informat step. Is there any way of incorporating this solution to an existing dataset (with millions of obs for example) and does not require creating the dataset? Thank you very much.

Existing dataset?  You could avoid creating a dataset file, and just create a dataset view.  The advantage is that you can use the code as in my suggestion, but avoid taking up disk space with a new file.. Moreover, the dataset view can be made permanent, so it's ready for reuse without being re-programmed:

 

data need / view=need;
  set have;
  flag=0;
  if ('A'<=char(ric,4)<='Z') and ('A'<=char(ric,5)<='Z')
    then flag=rank(substr(ric,4,1))-rank('A')+1;
run;
proc freq data=need;
  tables flag;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
nonlinear999
Fluorite | Level 6
Sorry, my understanding of the suggested solutions was poor so thought I needed to create the datalines. All of the solutions actually works. Thanks heaps.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 604 views
  • 7 likes
  • 4 in conversation