Hello,
My dataset looks like this:
data have;
length id $10 dcode $48;
input id$ dcode$ &;
datalines;
1 MCB10 PCF01 AAA30
2 AC003 PL000 TAC25
3 QC000 CAB50 FCE10
4 MA100 CA500 DE100
;
run;
I would like to flag each row (1 or 0) according to whether it contains a code that starts with two letters.
data want;
length id $10 dcode $48;
input id $ dcode $ & let_flag; /* & reads DCODE up to two consecutive blanks, so the flag follows two spaces */
datalines;
1 MCB10 PCF01 AAA30  0
2 AC003 PL000 TAC25  1
3 QC000 CAB50 FCE10  1
4 MA100 CA500 DE100  1
;
run;
Thanks!
If I understand you correctly, you want flag=1 if any of the codes in the record starts with exactly 2 letters followed by a digit. This regular expression looks for that pattern either at the start of DCODE or right after a space.
data have;
length id $10 dcode $48;
input id$ dcode$ &;
datalines;
1 MCB10 PCF01 AAA30
2 AC003 PL000 TAC25
3 QC000 CAB50 FCE10
4 MA100 CA500 DE100
;
run;
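data want;
set have;
/* flag = 1 when a code at the start of DCODE or after a blank is two letters followed by a digit */
flag = (prxmatch('/(^|\s)[A-Z][A-Z][0-9]/',dcode)>0);
run;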
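If you would rather avoid regular expressions, here is a minimal sketch of the same check done piece by piece. It assumes the codes are separated by single blanks and are always five characters long, and it reuses the HAVE data set from above; WANT_SCAN and CODE are illustrative names, not part of the original thread. It walks through the words in DCODE with COUNTW and SCAN and tests the first three characters with ANYALPHA and ANYDIGIT.

```sas
data want_scan;
  set have;
  length code $5;
  let_flag = 0;
  /* Examine each blank-separated code in DCODE */
  do i = 1 to countw(dcode, ' ');
    code = scan(dcode, i, ' ');
    /* Exactly two letters: characters 1-2 are letters and character 3 is a digit */
    if anyalpha(substr(code, 1, 1)) > 0
       and anyalpha(substr(code, 2, 1)) > 0
       and anydigit(substr(code, 3, 1)) > 0 then do;
      let_flag = 1;
      leave; /* one qualifying code is enough for the row */
    end;
  end;
  drop i code;
run;
```

Unlike `[A-Z]` in the regex, ANYALPHA also accepts lowercase letters, so the two versions differ if the data can contain lowercase codes.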
So you have multiple codes in your DCODE variable, right?
Do all of them have to start with exactly 2 letters or at least 2 letters?
Why is let_flag = 0 in the first obs?
"I would like to flag (1 or 0) every row if they have a code that starts with two letters."
Define "a code". Looking at your data, you apparently have 3 values stuck into a single variable. In most cases that is a poor data structure, and here it means I can't tell whether "starts with two letters" should apply to the whole value (spaces included) as a single code or to each space-separated piece as its own code.
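One way to act on this is to restructure the data so that each code gets its own row. The "two letters" test then applies to one code at a time, and the row-level flag is simply the maximum per ID. This is a minimal sketch that assumes codes are separated by single blanks and are at most five characters long; it reuses HAVE from the question, and CODES_LONG, SEQ, TWO_LETTER and FLAG_BY_ID are hypothetical names.

```sas
/* One output row per blank-separated code in DCODE */
data codes_long;
  set have;
  length code $5;
  do seq = 1 to countw(dcode, ' ');
    code = scan(dcode, seq, ' ');
    /* 1 if this code is two uppercase letters followed by a digit */
    two_letter = (prxmatch('/^[A-Z]{2}[0-9]/', code) > 0);
    output;
  end;
  keep id seq code two_letter;
run;

/* Row-level flag: 1 if any code for the ID qualifies */
proc sql;
  create table flag_by_id as
  select id, max(two_letter) as let_flag
  from codes_long
  group by id;
quit;
```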
Exactly 2? Of ALL the groups or just any one?
Yes, the codes are stored in one variable. Each code is always five characters long and starts with either two or three letters. I would like to select the codes that start with two letters.
If, for example, each code were stored in a separate variable (dcode1, dcode2, dcode3, dcode4, dcode5, etc.), then I could use an array.
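For reference, here is a minimal sketch of the array approach. It first splits DCODE into separate variables DCODE1-DCODE3 and then checks each element. Those names follow the post's example but are hypothetical, the sketch assumes at most three codes per row, and it reuses HAVE from the question.

```sas
data want_array;
  set have;
  /* Hypothetical separate variables; assumes at most three codes per row */
  array codes{3} $5 dcode1-dcode3;
  let_flag = 0;
  do i = 1 to dim(codes);
    codes{i} = scan(dcode, i, ' ');
    /* Two uppercase letters followed by a digit at the start of the code */
    if prxmatch('/^[A-Z]{2}[0-9]/', codes{i}) > 0 then let_flag = 1;
  end;
  drop i;
run;
```

If a row has fewer than three codes, SCAN returns a blank for the missing ones, and a blank never matches the pattern, so the flag is unaffected.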
To clarify: a code should start with exactly two letters.