BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
leehsin
Quartz | Level 8

Hi,

 

I am struggling with the macro codes to dynamically extract characters of the values from specific columns and put into newly created columns.

 

My actual data has tens of columns to be converted. Here I've made a sample data with 3 representative columns:

data oriData;
input ID aab $7. bbc $7.;
cards;
1 0outof1 0outof0
2 0outof0 0outof1
3 0outof0 0outof0
4 0outof1 0outof1
5 0outof0 0outof1
;
run;

My goal is to convert the sample data into the one like the following data set:

data finalData;
input ID BC_aab LC_aab BC_bbc LC_bbc;
cards;
1 0 1 0 0
2 0 0 0 1
3 0 0 0 0
4 0 1 0 1
5 0 0 0 1
;
run;

Here are the codes I prepared to do this conversion:

/* Collect column information */
proc sql;
create table vartable as
select * from sashelp.vcolumn
where libname='WORK' and memname="ORIDATA";
quit;

/* Make a list of column names to be converted */
data varlist;
set vartable;
where name not in ("ID");
keep name;
run;

/* Pass the column names into macro variables */
proc sql noprint;
select name
into :vname1 -
from work.varlist;
%let numobs=&sqlobs;
quit;

/* Examine the macro variables generated */
%put numobs= &sqlobs;
%put vname1= &vname1.;
%put vname2= &vname2.;
%put vname3= &vname3.;

/* Macro for the conversion */
%macro trfm(input,output);
data &output;
set &input;
%do i= 1 %to &numobs.;
BC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,1,1),8.));
LC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,%length(&&vname&i),1),8.));
drop &&vname&i.;
%end;
run;
%mend;

/* Testing */
%trfm(oriData,finalData);

Here is the log after running the code: 

 

1921  data oriData;
1922  input ID aab $7. bbc $7.;
1923  cards;


NOTE: The data set WORK.ORIDATA has 5 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


1929  ;
1930  run;
1931
1932  proc sql;
1933     create table vartable as
1934        select * from sashelp.vcolumn
1935        where libname='WORK' and memname="ORIDATA";
NOTE: Table WORK.VARTABLE created, with 3 rows and 18 columns.

1936  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

 

1937
1938  data varlist;
1939      set vartable;
1940      where name not in ("ID");
1941      keep name;
1942  run;


NOTE: There were 2 observations read from the data set WORK.VARTABLE.
      WHERE name not = 'ID';

NOTE: The data set WORK.VARLIST has 2 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


 

1943
1944  proc sql noprint;
1945      select name
1946      into :vname1 -
1947      from work.varlist;
1948  %let numobs=&sqlobs;
1949  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
 

 

1950
1951  %put numobs= &sqlobs;
numobs= 2
1952  %put vname1= &vname1.;
vname1= aab
1953  %put vname2= &vname2.;
vname2= bbc
1954  %put vname3= &vname3.;
vname3= ccd
1955
1956  %macro trfm(input,output);
1957  data &output;
1958  set &input;
1959      %do i= 1 %to &numobs.;
1960          BC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,1,1),8.));
1961          LC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,%length(&&vname&i),1),8.));
1962          drop &&vname&i.;
1963      %end;
1964  run;
1965  %mend;

1966
1967  %trfm(oriData,finalData);
MPRINT(TRFM):   data finalData;
MPRINT(TRFM):   set oriData;
MPRINT(TRFM):   BC_aab= a ;
MPRINT(TRFM):   LC_aab= b ;
MPRINT(TRFM):   drop aab;
MPRINT(TRFM):   BC_bbc= b ;
MPRINT(TRFM):   LC_bbc= c ;
MPRINT(TRFM):   drop bbc;
MPRINT(TRFM):   run;


NOTE: Variable a is uninitialized.
NOTE: Variable b is uninitialized.
NOTE: Variable c is uninitialized.
NOTE: There were 5 observations read from the data set WORK.ORIDATA.
NOTE: The data set WORK.FINALDATA has 5 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The problem is that the program can't do extraction from the value of macro variables. Instead, it did extraction from the text of the macro variables. I couldn't find the solution, and ask for help from the community. Thanks in advance!!!

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

You may be picturing an overly complex process here.  Let's start by combining your first two steps:

 

proc sql;
create table vartable as
select * from sashelp.vcolumn
where libname='WORK' and memname="ORIDATA"
and upcase(name) ne 'ID';
quit;

At this point, the program has all the information it needs to construct a DATA step holding the solution.  This is untested code, but should be in the ballpark:

 

data _null_;
   set vartable end=done;
   if _n_=1 then call execute('data finalData; set oriData;');
   call execute ('length BC_' || name || 'LC_' || name || '$ 1;  drop ' || name || ';' ) ;
   call execute ('BC_' || name || '=' || name || ';' ) ;
   call execute ('LC_' || name || '= substr(' || name || ', length(' || name || ') ) ;' );
   if done then call execute('run;') ;
run;

You can always turn it into a macro that uses parameters for the input and output data sets.  In that case, be sure to use double quotes instead of single quotes for the first CALL EXECUTE.

 

I defined the first and last characters as character variables with a length of 1.  That supports shortcuts, such as the equivalent of:

 

length LC_aab $ 1;

LC_aab = aab;

 

There's only room to store one character, so the value gets truncated automatically.

 

Give it a shot, see if it needs a tweak or not.

View solution in original post

9 REPLIES 9
Astounding
PROC Star

You may be picturing an overly complex process here.  Let's start by combining your first two steps:

 

proc sql;
create table vartable as
select * from sashelp.vcolumn
where libname='WORK' and memname="ORIDATA"
and upcase(name) ne 'ID';
quit;

At this point, the program has all the information it needs to construct a DATA step holding the solution.  This is untested code, but should be in the ballpark:

 

data _null_;
   set vartable end=done;
   if _n_=1 then call execute('data finalData; set oriData;');
   call execute ('length BC_' || name || 'LC_' || name || '$ 1;  drop ' || name || ';' ) ;
   call execute ('BC_' || name || '=' || name || ';' ) ;
   call execute ('LC_' || name || '= substr(' || name || ', length(' || name || ') ) ;' );
   if done then call execute('run;') ;
run;

You can always turn it into a macro that uses parameters for the input and output data sets.  In that case, be sure to use double quotes instead of single quotes for the first CALL EXECUTE.

 

I defined the first and last characters as character variables with a length of 1.  That supports shortcuts, such as the equivalent of:

 

length LC_aab $ 1;

LC_aab = aab;

 

There's only room to store one character, so the value gets truncated automatically.

 

Give it a shot, see if it needs a tweak or not.

leehsin
Quartz | Level 8

Fantastic! It's a very concise and efficient solution! I can easily use it to change it into a macro program. Combined with  a driver table which has the information for all macro variable, I can do the calculation on a big data. Thanks a lot!

Ksharp
Super User

Assuming there is only two columns split for aab bbc .

data oriData;
input ID aab $ bbc $;
cards;
1 0outof1 0outof0
2 0outof0 0outof1
3 0outof0 0outof0
4 0outof1 0outof1
5 0outof0 0outof1
;
run;
data want;
 set oridata;
 BC_aab=scan(aab,1,,'kd');
 LC_aab=scan(aab,-1,,'kd');
 BC_bbc=scan(bbc,1,,'kd');
 LC_bbc=scan(bbc,-1,,'kd');
 drop aab bbc;
run;

proc print;run;
leehsin
Quartz | Level 8

Thanks for your answer!

 

I am working on a much more complicated data set than the sample data I presented for this question. It is a data driven coding issue. The point of my question I posted is actually about the two lines inside the macro %trfm:

 

BC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,1,1),8.));
LC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,%length(&&vname&i),1),8.));

 

These two lines do not give the results I want. They substr the text of '&&vname&i' which is 'aab' or 'bbc' (the name of column), not the value of '&&vname&i' which is '0outof1'...(the observations of the column). I was struggling at this point for those two lines above. Is there any way to modify the codes and make it work?

 

 

Tom
Super User Tom
Super User

@leehsin wrote:

Thanks for your answer!

 

I am working on a much more complicated data set than the sample data I presented for this question. It is a data driven coding issue. The point of my question I posted is actually about the two lines inside the macro %trfm:

 

BC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,1,1),8.));
LC_&&vname&i= %sysfunc(inputc(%substr(&&vname&i,%length(&&vname&i),1),8.));

 

These two lines do not give the results I want. They substr the text of '&&vname&i' which is 'aab' or 'bbc' (the name of column), not the value of '&&vname&i' which is '0outof1'...(the observations of the column). I was struggling at this point for those two lines above. Is there any way to modify the codes and make it work?

 

 


Which is why the proposed solutions did not involve trying to do this with macro code. Usually when I see code that is attempting to store a series of values into macro variables my immediate reaction is that those values should have been left in a dataset.

 

To get an answer to this specific question post some example values of the macro variables involved.  What is &I? What is VNAME1?

It sounds like you think that VNAME1 has a value like AAB and there is a macro variable named AAB that has a value like 0outof1.

 

188  %let i=1;
189  %let vname1=aab;
190  %let aab=0outof1;
191  %put '&vname1' = &vname1;
'&vname1' = aab
192  %put '&&vname&i' = &&vname&i ;
'&&vname&i' = aab
193  %put '&&&&&&vname&i' = &&&&&&vname&i;
'&&&&&&vname&i' = 0outof1

So you need many more &'s to get that resolved. To see why use SYMBOLGEN option.

194  options symbolgen;
195  %put '&&&&&&vname&i' = &&&&&&vname&i;
SYMBOLGEN:  && resolves to &.
SYMBOLGEN:  && resolves to &.
SYMBOLGEN:  && resolves to &.
SYMBOLGEN:  Macro variable I resolves to 1
SYMBOLGEN:  && resolves to &.
SYMBOLGEN:  Macro variable VNAME1 resolves to aab
SYMBOLGEN:  Macro variable AAB resolves to 0outof1
'&&&&&&vname&i' = 0outof1
196  options nosymbolgen;

NOTE: don't leave SYMBOLGEN on if you are really running code like that or else your SAS log will have pages and pages of these messages.

To avoid having to type so many &'s in your code just pull out the values into other macro variables.

%let vname=&&vname&i;
%let value=&&&vname;
%put &=vname &=value ;
leehsin
Quartz | Level 8

Thank you for your answer, Tom.

 

It seems to be complicated to use multiple '&' to resolve the macro variable. I couldn't find the rule using 5 '&'s to resolve the macro variable into '0outof1'. And if I use '&&&&&&vname&i' in my code, it still does not work. 

 

Tom
Super User Tom
Super User

Why would you use 5 &?

1    %let i=1;
2    %let vname1=abc;
3    %let abc=0outof1;
4    %put 1 &vname&i;
WARNING: Apparent symbolic reference VNAME not resolved.
1 &vname1
5    %put 2 &&vname&i;
2 abc
6    %put 3 &&&vname&i;
3 abc
7    %put 4 &&&&vname&i;
4 abc
8    %put 5 &&&&&vname&i;
5 abc
9    %put 6 &&&&&&vname&i;
6 0outof1
Tom
Super User Tom
Super User

There is no need for extensive macro code for this problem. Only useful place would be a macro variable that has the list of variables that you want to transform.

 

One of the best tools for generating variable names from data is PROC TRANSPOSE.

data oriData;
input ID aab :$7. bbc :$7.;
cards;
1 0outof1 0outof0
2 0outof0 0outof1
3 0outof0 0outof0
4 0outof1 0outof1
5 0outof0 0outof1
;

data tall ;
 set oridata;
 array x aab bbc ;
 length _name_ $32 type $2 value $1 ;

 do i=1 to dim(x);
   _name_ = vname(x[i]);
   type='BC';
   value=substr(x[i],1,1);
   output;
   type='LC';
   value=substr(x[i],length(x[i]));
   output;
 end;
run;
proc transpose data=tall out=want delim=_;
  by id;
  id type _name_;
  var value ;
run;

Result:

Obs    ID    _NAME_    BC_aab    LC_aab    BC_bbc    LC_bbc

 1      1    value       0         1         0         0
 2      2    value       0         0         0         1
 3      3    value       0         0         0         0
 4      4    value       0         1         0         1
 5      5    value       0         0         0         1
leehsin
Quartz | Level 8

Tom, this is a another excellent solution! It also automatically generates the new variable names using 'type'. Thank you so much! I appreciate it.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1207 views
  • 2 likes
  • 4 in conversation