Hi Guys
I have got a dataset that looks like the following -
data have; input ric $ price; datalines; "BCDCA" 10 "BCDCZ" 12 "ABX" 4 "ABCBA" 50 "ABCBZ" 60 run;
I want to iterate over the ric (which is a character variable) and flag it if the substring after 3 characters = "CA - CZ", "BA-BZ". Here, CA - CZ includes everything in between (i.e., CA, CB, CC, CD etc. ) and same with BA-BZ. For example, if the substring of the ric is
like the
if substrn(ric,4,2)="CA-CZ" then flag=1; else flag=0
I know I can't put the condition like this but want to know if there is any way of doing sth like this. Rather than typing it individually (my dataset is huge and has a lot of ric codes) if there is a sas way of doing that, that would save me a huge amount time. Thank you very much.
@nonlinear999 wrote:
My task does indeed go beyond the characters that I mentioned. But, it seems all the solutions are doing this (including this one) in a way which requires to put in the datalines or do it in informat step. Is there any way of incorporating this solution to an existing dataset (with millions of obs for example) and does not require creating the dataset? Thank you very much.
Existing dataset? You could avoid creating a dataset file, and just create a dataset view. The advantage is that you can use the code as in my suggestion, but avoid taking up disk space with a new file.. Moreover, the dataset view can be made permanent, so it's ready for reuse without being re-programmed:
data need / view=need;
set have;
flag=0;
if ('A'<=char(ric,4)<='Z') and ('A'<=char(ric,5)<='Z')
then flag=rank(substr(ric,4,1))-rank('A')+1;
run;
proc freq data=need;
tables flag;
run;
How about an INFORMAT
proc format;
invalue test 'CA'-'CZ' = 1 'BA'-'BZ' = 2;
quit;
data have;
input ric :$quote. price;
x = substr(ric,4);
flag = input(x,test.);
datalines;
"BCDCA" 10
"BCDCZ" 12
"ABX" 4
"ABCBA" 50
"ABCBZ" 60
run;
If your task goes beyond looking for trailing BA...BZ or CA...CZ, but rather !A through !Z (where ! is any capital letter), then you can check for capital A through Z in the 4th and 5th characters then use the "rank" of the 4th character to produce a flag=1 (for AA-AZ), or 2 (BA-BZ), .... or 26 (for ZA-ZZ). I've added a few more RIC codes as examples:
data have;
input ric :$quote20. price;
if ('A'<=char(ric,4)<='Z') and ('A'<=char(ric,5)<='Z')
then flag=rank(substr(ric,4,1))-rank('A')+1;
if _n_=1 then put " RIC PRICE FLAG";
put ric @6 price 5. +4 flag 2.;
datalines;
"BCDCA" 10
"BCDCZ" 12
"ABX" 4
"ABCBA" 50
"ABCBZ" 60
"ABCxZ" 101
"DEFA4" 102
"EFGZC" 104
"EFGQa" 105
run;
which produces the following on the sas log:
651 data have;
652 input ric :$quote20. price;
653 if ('A'<=char(ric,4)<='Z') and ('A'<=char(ric,5)<='Z')
654 then flag=rank(substr(ric,4,1))-rank('A')+1;
655 if _n_=1 then put " RIC PRICE FLAG";
656 put ric @6 price 5. +4 flag 2.;
657 datalines;
RIC PRICE FLAG
BCDCA 10 3
BCDCZ 12 3
ABX 4 .
ABCBA 50 2
ABCBZ 60 2
ABCxZ 101 .
DEFA4 102 .
EFGZC 104 26
EFGQa 105 .
If you want flag=0 for non-qualifiers (instead of flag=. above), just prefix the if statement with
flag=0;
@nonlinear999 wrote:
My task does indeed go beyond the characters that I mentioned. But, it seems all the solutions are doing this (including this one) in a way which requires to put in the datalines or do it in informat step. Is there any way of incorporating this solution to an existing dataset (with millions of obs for example) and does not require creating the dataset? Thank you very much.
Existing dataset? You could avoid creating a dataset file, and just create a dataset view. The advantage is that you can use the code as in my suggestion, but avoid taking up disk space with a new file.. Moreover, the dataset view can be made permanent, so it's ready for reuse without being re-programmed:
data need / view=need;
set have;
flag=0;
if ('A'<=char(ric,4)<='Z') and ('A'<=char(ric,5)<='Z')
then flag=rank(substr(ric,4,1))-rank('A')+1;
run;
proc freq data=need;
tables flag;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.