Hi,
I am struggling with the macro codes to dynamically extract characters of the values from specific columns and put into newly created columns.
My actual data has tens of columns to be converted. Here I've made a sample data with 3 representative columns:
data oriData;
input ID aab $7. bbc $7.;
cards;
1 0outof1 0outof0
2 0outof0 0outof1
3 0outof0 0outof0
4 0outof1 0outof1
5 0outof0 0outof1
;
run;
My goal is to convert the sample data into the one like the following data set:
data finalData;
input ID BC_aab LC_aab BC_bbc LC_bbc;
cards;
1 0 1 0 0
2 0 0 0 1
3 0 0 0 0
4 0 1 0 1
5 0 0 0 1
;
run;
Here are the codes I prepared to do this conversion:
/* Collect column information */
proc sql;
create table vartable as
select * from sashelp.vcolumn
where libname='WORK' and memname="ORIDATA";
quit;
/* Make a list of column names to be converted */
data varlist;
set vartable;
where name not in ("ID");
keep name;
run;
/* Pass the column names into macro variables */
proc sql noprint;
select name
into :vname1 -
from work.varlist;
%let numobs=&sqlobs;
quit;
/* Examine the macro variables generated */
%put numobs= &sqlobs;
%put vname1= &vname1.;
%put vname2= &vname2.;
%put vname3= &vname3.;
/* Macro for the conversion */
%macro trfm(input,output);
data &output;
set &input;
%do i= 1 %to &numobs.;
BC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,1,1),8.));
LC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,%length(&&vname&i),1),8.));
drop &&vname&i.;
%end;
run;
%mend;
/* Testing */
%trfm(oriData,finalData);
Here is the log after running the code:
1921 data oriData;
1922 input ID aab $7. bbc $7.;
1923 cards;
NOTE: The data set WORK.ORIDATA has 5 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
1929 ;
1930 run;
1931
1932 proc sql;
1933 create table vartable as
1934 select * from sashelp.vcolumn
1935 where libname='WORK' and memname="ORIDATA";
NOTE: Table WORK.VARTABLE created, with 3 rows and 18 columns.
1936 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
1937
1938 data varlist;
1939 set vartable;
1940 where name not in ("ID");
1941 keep name;
1942 run;
NOTE: There were 2 observations read from the data set WORK.VARTABLE.
WHERE name not = 'ID';
NOTE: The data set WORK.VARLIST has 2 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
1943
1944 proc sql noprint;
1945 select name
1946 into :vname1 -
1947 from work.varlist;
1948 %let numobs=&sqlobs;
1949 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
1950
1951 %put numobs= &sqlobs;
numobs= 2
1952 %put vname1= &vname1.;
vname1= aab
1953 %put vname2= &vname2.;
vname2= bbc
1954 %put vname3= &vname3.;
vname3= ccd
1955
1956 %macro trfm(input,output);
1957 data &output;
1958 set &input;
1959 %do i= 1 %to &numobs.;
1960 BC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,1,1),8.));
1961 LC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,%length(&&vname&i),1),8.));
1962 drop &&vname&i.;
1963 %end;
1964 run;
1965 %mend;
1966
1967 %trfm(oriData,finalData);
MPRINT(TRFM): data finalData;
MPRINT(TRFM): set oriData;
MPRINT(TRFM): BC_aab= a ;
MPRINT(TRFM): LC_aab= b ;
MPRINT(TRFM): drop aab;
MPRINT(TRFM): BC_bbc= b ;
MPRINT(TRFM): LC_bbc= c ;
MPRINT(TRFM): drop bbc;
MPRINT(TRFM): run;
NOTE: Variable a is uninitialized.
NOTE: Variable b is uninitialized.
NOTE: Variable c is uninitialized.
NOTE: There were 5 observations read from the data set WORK.ORIDATA.
NOTE: The data set WORK.FINALDATA has 5 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
The problem is that the program can't do extraction from the value of macro variables. Instead, it did extraction from the text of the macro variables. I couldn't find the solution, and ask for help from the community. Thanks in advance!!!
You may be picturing an overly complex process here. Let's start by combining your first two steps:
proc sql;
create table vartable as
select * from sashelp.vcolumn
where libname='WORK' and memname="ORIDATA"
and upcase(name) ne 'ID';
quit;
At this point, the program has all the information it needs to construct a DATA step holding the solution. This is untested code, but should be in the ballpark:
data _null_;
set vartable end=done;
if _n_=1 then call execute('data finalData; set oriData;');
call execute ('length BC_' || name || 'LC_' || name || '$ 1; drop ' || name || ';' ) ;
call execute ('BC_' || name || '=' || name || ';' ) ;
call execute ('LC_' || name || '= substr(' || name || ', length(' || name || ') ) ;' );
if done then call execute('run;') ;
run;
You can always turn it into a macro that uses parameters for the input and output data sets. In that case, be sure to use double quotes instead of single quotes for the first CALL EXECUTE.
I defined the first and last characters as character variables with a length of 1. That supports shortcuts, such as the equivalent of:
length LC_aab $ 1;
LC_aab = aab;
There's only room to store one character, so the value gets truncated automatically.
Give it a shot, see if it needs a tweak or not.
You may be picturing an overly complex process here. Let's start by combining your first two steps:
proc sql;
create table vartable as
select * from sashelp.vcolumn
where libname='WORK' and memname="ORIDATA"
and upcase(name) ne 'ID';
quit;
At this point, the program has all the information it needs to construct a DATA step holding the solution. This is untested code, but should be in the ballpark:
data _null_;
set vartable end=done;
if _n_=1 then call execute('data finalData; set oriData;');
call execute ('length BC_' || name || 'LC_' || name || '$ 1; drop ' || name || ';' ) ;
call execute ('BC_' || name || '=' || name || ';' ) ;
call execute ('LC_' || name || '= substr(' || name || ', length(' || name || ') ) ;' );
if done then call execute('run;') ;
run;
You can always turn it into a macro that uses parameters for the input and output data sets. In that case, be sure to use double quotes instead of single quotes for the first CALL EXECUTE.
I defined the first and last characters as character variables with a length of 1. That supports shortcuts, such as the equivalent of:
length LC_aab $ 1;
LC_aab = aab;
There's only room to store one character, so the value gets truncated automatically.
Give it a shot, see if it needs a tweak or not.
Fantastic! It's a very concise and efficient solution! I can easily use it to change it into a macro program. Combined with a driver table which has the information for all macro variable, I can do the calculation on a big data. Thanks a lot!
Assuming there is only two columns split for aab bbc .
data oriData;
input ID aab $ bbc $;
cards;
1 0outof1 0outof0
2 0outof0 0outof1
3 0outof0 0outof0
4 0outof1 0outof1
5 0outof0 0outof1
;
run;
data want;
set oridata;
BC_aab=scan(aab,1,,'kd');
LC_aab=scan(aab,-1,,'kd');
BC_bbc=scan(bbc,1,,'kd');
LC_bbc=scan(bbc,-1,,'kd');
drop aab bbc;
run;
proc print;run;
Thanks for your answer!
I am working on a much more complicated data set than the sample data I presented for this question. It is a data driven coding issue. The point of my question I posted is actually about the two lines inside the macro %trfm:
BC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,1,1),8.));
LC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,%length(&&vname&i),1),8.));
These two lines do not give the results I want. They substr the text of '&&vname&i' which is 'aab' or 'bbc' (the name of column), not the value of '&&vname&i' which is '0outof1'...(the observations of the column). I was struggling at this point for those two lines above. Is there any way to modify the codes and make it work?
@leehsin wrote:
Thanks for your answer!
I am working on a much more complicated data set than the sample data I presented for this question. It is a data driven coding issue. The point of my question I posted is actually about the two lines inside the macro %trfm:
BC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,1,1),8.));
LC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,%length(&&vname&i),1),8.));
These two lines do not give the results I want. They substr the text of '&&vname&i' which is 'aab' or 'bbc' (the name of column), not the value of '&&vname&i' which is '0outof1'...(the observations of the column). I was struggling at this point for those two lines above. Is there any way to modify the codes and make it work?
Which is why the proposed solutions did not involve trying to do this with macro code. Usually when I see code that is attempting to store a series of values into macro variables my immediate reaction is that those values should have been left in a dataset.
To get an answer to this specific question post some example values of the macro variables involved. What is &I? What is VNAME1?
It sounds like you think that VNAME1 has a value like AAB and there is a macro variable named AAB that has a value like 0outof1.
188 %let i=1; 189 %let vname1=aab; 190 %let aab=0outof1; 191 %put '&vname1' = &vname1; '&vname1' = aab 192 %put '&&vname&i' = &&vname&i ; '&&vname&i' = aab 193 %put '&&&&&&vname&i' = &&&&&&vname&i; '&&&&&&vname&i' = 0outof1
So you need many more &'s to get that resolved. To see why use SYMBOLGEN option.
194 options symbolgen; 195 %put '&&&&&&vname&i' = &&&&&&vname&i; SYMBOLGEN: && resolves to &. SYMBOLGEN: && resolves to &. SYMBOLGEN: && resolves to &. SYMBOLGEN: Macro variable I resolves to 1 SYMBOLGEN: && resolves to &. SYMBOLGEN: Macro variable VNAME1 resolves to aab SYMBOLGEN: Macro variable AAB resolves to 0outof1 '&&&&&&vname&i' = 0outof1 196 options nosymbolgen;
NOTE: don't leave SYMBOLGEN on if you are really running code like that or else your SAS log will have pages and pages of these messages.
To avoid having to type so many &'s in your code just pull out the values into other macro variables.
%let vname=&&vname&i;
%let value=&&&vname;
%put &=vname &=value ;
Thank you for your answer, Tom.
It seems to be complicated to use multiple '&' to resolve the macro variable. I couldn't find the rule using 5 '&'s to resolve the macro variable into '0outof1'. And if I use '&&&&&&vname&i' in my code, it still does not work.
Why would you use 5 &?
1 %let i=1; 2 %let vname1=abc; 3 %let abc=0outof1; 4 %put 1 &vname&i; WARNING: Apparent symbolic reference VNAME not resolved. 1 &vname1 5 %put 2 &&vname&i; 2 abc 6 %put 3 &&&vname&i; 3 abc 7 %put 4 &&&&vname&i; 4 abc 8 %put 5 &&&&&vname&i; 5 abc 9 %put 6 &&&&&&vname&i; 6 0outof1
There is no need for extensive macro code for this problem. Only useful place would be a macro variable that has the list of variables that you want to transform.
One of the best tools for generating variable names from data is PROC TRANSPOSE.
data oriData;
input ID aab :$7. bbc :$7.;
cards;
1 0outof1 0outof0
2 0outof0 0outof1
3 0outof0 0outof0
4 0outof1 0outof1
5 0outof0 0outof1
;
data tall ;
set oridata;
array x aab bbc ;
length _name_ $32 type $2 value $1 ;
do i=1 to dim(x);
_name_ = vname(x[i]);
type='BC';
value=substr(x[i],1,1);
output;
type='LC';
value=substr(x[i],length(x[i]));
output;
end;
run;
proc transpose data=tall out=want delim=_;
by id;
id type _name_;
var value ;
run;
Result:
Obs ID _NAME_ BC_aab LC_aab BC_bbc LC_bbc 1 1 value 0 1 0 0 2 2 value 0 0 0 1 3 3 value 0 0 0 0 4 4 value 0 1 0 1 5 5 value 0 0 0 1
Tom, this is a another excellent solution! It also automatically generates the new variable names using 'type'. Thank you so much! I appreciate it.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.