Hi everyone,
Does anyone know how i can "macrofy" the following code please? Basically I am adding zeros back on to a variable. that had them stripped out from the variable code when I had to convert it to a numeric for a prior data step.
DATA wanta;
SET want;
check_for_005=find(code,'00000');
check_for_004=find(code,'0000');
check_for_003=find(code,'000');
check_for_002=find(code,'00');
check_for_001=find(code,'0');
run;
data wantaa;
set wanta;
if check_for_001=1 and check_for_002=0 and check_for_003=0 and check_for_004=0 and check_for_005=0 then do codefinal=substr(code,1,1.);end;else;
if check_for_002=1 and check_for_003=0 and check_for_004=0 and check_for_005=0 then do codefinal=substr(code,1,2.);end;else;
if check_for_003=1 and check_for_004=0 and check_for_005=0 then do codefinal=substr(code,1,3.);end;else;
if check_for_004=1 and check_for_005=0 then do codefinal=substr(code,1,3.);end;
run;
data wantaaa;
set wantaa;
cat=trim(left(codefinal))||trim(left(codeall));
run;
Many thanks!!
Hi,
I'm not sure that I understand your requirement. I don't see which of the "code..." variables you're using is numeric.
If you want to convert a character value to a numeric value in a data step then you can use something like:
num_var=input(char_var,8.);
Are the zeros you're adding leading or trailing?
Please provide some example data of what you have and what you want.
Regards,
Amir.
Hi Amir,
Sorry for my delay in getting back to you. Here is my program as it stands. I just wanted to know if I could "marcrofy" some bits of it as it quote long winded.
*****Notes ***Import Excel worksheet contain code ranges eg - Excel Column called CODE
001-010
050-055
1000-1100 and so on.
libname Codes excel "Codelist_LH.xlsx"
header = yes
mixed = yes;
run;
data Code_Range;
set Codes.'List$'n;
where Lvl1Group ne 'Exclusion';
run;
*****Read the Variable containing the individual code or code range that has been imported and list out each code individually in a new Variable called INDIV_CODE;
data Code_Range1 (drop=BeginCode EndCode);
set Code_Range ;
BeginCode = input(scan(Code,1),best10.);
EndCode = input(scan(Code,2),best10.);
if missing(EndCode) then do Indiv_Code=input(Code,best10.);output ;end;
else do Indiv_Code = BeginCode to Endcode;
output ;
end;
run;
*****Identify number of leading Zeros associated with original variable CODE to later assign to numeric variable INDIV_CODE ;
Data Code_Zero;
set Code_Range1;
Check_for_005=find(code,'00000');
Check_for_004=find(code,'0000');
Check_for_003=find(code,'000');
Check_for_002=find(code,'00');
Check_for_001=find(code,'0');
run;
*****Take the correct number of leading zeros per code and create a variable CODEFINAL containing the correct number of zeros ;
data Code_Zero1;
set Code_Zero;
if Check_for_001=1 and Check_for_002=0 and Check_for_003=0 and Check_for_004=0 and Check_for_005=0 then do Codefinal=substr(Code,1,1.);end;else;
if Check_for_002=1 and Check_for_003=0 and Check_for_004=0 and Check_for_005=0 then do Codefinal=substr(Code,1,2.);end;else;
if Check_for_003=1 and Check_for_004=0 and Check_for_005=0 then do Codefinal=substr(Code,1,3.);end;else;
if Check_for_004=1 and Check_for_005=0 then do Codefinal=substr(Code,1,3.);end;
run;
*****Concatenate the variable CODEFINAL (contains zeros) and INDIV_CODE contains original code minus leading zeros;
data Num_Code (drop= check_for_001 check_for_002 check_for_003 check_for_004 check_for_005 Indiv_Code codefinal) ;
set Code_Zero1;
Numcode=trim(left(codefinal))||trim(left(Indiv_Code));
run;
*****Concatenate Character Prefix or Suffix to the variable Numcode ;
data Char_Code;
set Num_Code;
if missing(Prefix_) and missing(Suffix) then do Complete_Code=Numcode;end; else;
if Prefix_ ne " " then do Complete_Code=trim(left(Prefix_))||trim(left(Numcode));end; else
if Suffix ne " " then do Complete_Code=trim(left(Numcode))||trim(left(Suffix));end;
Run;
*****Drop Prefix and Suffix Columns and rename Complete_Code;
Data Final_Code (drop=numcode prefix_ suffix Complete_Code);
set Char_Code;
code= Complete_Code;
Run;
Are you familiar with the Z format?
If you use input with the Z8. format for example you can convert number to have a specific about of leading zeros.
ie
data have;
input number;
cards;
8
28
456
3434343
343
26
6345
665
7572
;
run;
data want;
set have;
padded_number=put(number, z8.);
run;
Unfortunately the variable code is differing in length and number of leading zeros. Would z8. not make code 8 characters long for every instance i have a variety of codes 050 50 0050 etc and the number of leading zeros is very important it has to be exact.
You realize your find statement won't differentiate between
0500 and 0050?
If you know the length of the original variable you can make it more flexible by using the Z||Length in combination with a putc or putn function.
Also, why do this, in the previous step, why not create a new variable to do the analysis on and keep the old variable?
data Code_Range1 ;
set Code_Range ;
* BeginCode = input(scan(Code,1),best10.);
Begin_code = scan( code,1) ;
end_code = scan( code, -1) ;
len_indiv =max(length( begin_code), length(end_code) ) ;
do Indiv_C= Begin_Code to End_code;
Length indiv_code $15 ;
Indiv_code = putn( indiv_c, 'Z', len_indiv ) ;
output ;
end;
*Drop begin_code end_code len_indiv indiv_c ;
run;
Why make life difficult needlessly? Leave begin_code and end_code as strings.
Work out what output length is appropriate.
The code above assumes the wider of the begin/end pair.
Using SCAN(code, -1) for END provides the first when there is only one code. If course you might have more than two numbers. In that case the "-1" scans for the last of the three.
The PUTN() function allows the result length (format width) to be defined by the 3rd parameter of PUTN(). There you use the length of your input strings. Ensure you allow wide enough room. The original code used 10 when making those first conversions with an INPUT() function but you could make it any useful value up to 15.
I know this is not polishing your macro learning but I think it might be easier to maintain and hope it helps.
peterC
p.s.
I know the automatic conversion of strings begin_code and end_code will generate NOTEs in the saslog, but leave out the input() syntax for clarity.
Thanks so much Peter for your help I have updated my program with your improved suggestions. I have a quick question about the code below. Is it possible to adapt it to look for ranges that have a character in them as well say 100E-150E or V300-V400. at the moment I am stripping the prefix and suffix out in the Excel file and running the statement below and then concatenating the Character back on in another datastep. Many thanks again for your brilliant answer.
data Code_Range1 ;
set Code_Range ;
Begin_code = scan( code,1) ;
end_code = scan( code, -1) ;
len_indiv =max(length( begin_code), length(end_code) ) ;
do Indiv_C= Begin_Code to End_code;
Length indiv_code $15 ;
Indiv_code = putn( indiv_c, 'Z', len_indiv ) ;
output ;
end;
Drop begin_code end_code len_indiv indiv_c ;
run;
Lisa
where you have the variety of naming ranges you describe, just create a code block for each variety and if-then-else-if your way through
libname your (work) ;
data your.data ;
input code $char40. ;
list ; cards;
12345a-12347a
b432 -b434
2345
;
* not exhaustive testing ;
data results ;
set your.data ;
length indiv_code $20 ;
array part(3) $20 ; * defining array of 3 $20 string vars named part1 - part3 ;
do i = 1 to 3 ;
part(i) = scan( code, i) ;
end ;
if part3 ne ' ' then do ;
* code block for handling 3+ parts in the code ;
end ;
else
if part2 = ' ' then do ;
* code block to deal with single code ;
end ;
else do ;
* handling CODE in 2 parts ;
* first check for numeric-only strings ;
if not verify( part1 !! part2, ' 1234567890' ) then do ;
* code block for handling a pair of numeric strings as developed earlier ;
end ;
else do ;
* handle CODE with parts that are not entirely numeric ;
length c1-c4 $1 ;
c1 = part1 ; * first char of string ;
c2 = part2 ;
If verify( c1 !! c2, '1234567890' ) then do ;
*handle non-numeric prefix ;
if c1 = c2 then do ;
* handle common prefix in part1 and 2 ;
startNpos1 = anydigit( part1 ) ;
startNpos2 = anydigit( part2 ) ;
prefix1 = substr( part1, 1, startNpos1 -1 ) ;
prefix2 = substr( part2, 1, startNpos2 -1 ) ;
if prefix1 ne prefix2 then do ;
* handle dis-similar prefixes !! ;
end ;
else do ;
* handle SAME prefixes !! ;
begin_code = substr( part1, startNpos1 ) ;
end_code = substr( part2, startNpos2 ) ;
str_len = max( length( part1 ), length( part2) ) - length( prefix1 ) ;
do indiv_c = begin_code to end_code ;
indiv_code = trim( prefix1 ) !! putN( indiv_c, 'Z', str_len ) ;
output ;
end ;
end ;
end /* prefix starts with same char */ ;
else do ;
* how do you handle cases where prefix is not the same? ;
end ;
end /* part starts non-numeric */ ;
else do ;
* check for parts with non-numeric after the start
like 12345axx-12348axx ;
* establish where suffix starts ;
sufStart1 = verify( part1, '1234567890' ) ;
sufStart2 = verify( part2, '1234567890' ) ;
* establish suffix ;
suf1 = substr( part1, sufStart1 ) ;
suf2 = substr( part2, sufStart2 ) ;
if suf1 ne suf2 then do ;
* handle differing suffixes ;
end ;
else do ;
* handle where suffixes are equal ;
begin_code = substr( part1, 1, sufStart1 -1 ) ;
end_code = substr( part2, 1, sufStart1 -1 ) ;
str_len = max( length( part1 ), length( part2) ) - length( suf1 ) ;
do indiv_c = begin_code to end_code ;
indiv_code = putN( indiv_c, 'Z', str_len ) !! suf1 ;
output ;
end ;
end ;
end /* part contains non-numeric after the start */;
end /* CODE not entirely numeric */ ;
end /* HANDLING CODE IN 2 PARTS */;
run ;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.