I've a large data set (Data A, easily > 500K rows) with a long list of conditions (A01...A38, B02... B75, C04..... S99). I want to add variable description using an external data set (Data B), with a corresponding list of variable names and description.
I found a SAS code but it works on occasions, e.g. on A01-B20, but not when I tried to include all variables. I don't know why. Also I'm hoping the code can work efficiently as my data can be large.
The sample code is as below. The actual label data and the data to be applied (w/ 100 cases) are attached so that you can test the macro %varlabel on them, and it may help to see why the macro works on the sample data but not the actual data
. Thanks. Or any other SAS codes that would work for my purpose would do too. Thanks in advance.
data labeldata; input var $ 1-3 labels $ 5-17;
datalines;
A01 Diebetes
A02 Heart disease
B01 Cancer
B02 Renal failure
;
data mydata; input ID 1 A01 3 A02 5 B01 7 B02 9;
datalines;
1 1 1 1 0
2 0 0 0 1
3 1 0 1 1
4 1 0 0 1
;
run;
%macro varlabel(indata=, lib=, labeldata=, labelvar=, labeldesc=);
data _null_;
set &labeldata;
call symput("var" || trim(left(_N_)), trim(left(&labelvar)));
call symput("label" || trim(left(_N_)), trim(left(&labeldesc)));
call symput("nobs", trim(left(_N_)));
run;
proc datasets library = &lib memtype = data nolist;
modify &indata;
label
%do i = 1 %to &nobs;
&&var&i = &&label&i
%end; ;
quit;
run;
%mend;
%varlabel(indata=mydata, lib=work, labeldata=labeldata, labelvar=var, labeldesc=labels);
proc contents data=mydata order=varnum; run;
Thanks a lot. Indeed that's the reason. The code works now, with the fixes.
%macro varlabel(indata=, lib=, labeldata=, labelvar=, labeldesc=);
* create distinct macro variables for each variable name and label;
data _null_;
set &labeldata;
call symput("var" || trim(left(_N_)), trim(left(&labelvar)));
call symput("label" || trim(left(_N_)), trim(left(&labeldesc)));
call symput("nobs", trim(left(_N_)));
run;
proc datasets library = &lib memtype = data nolist;
modify &indata;
label
%do i = 1 %to &nobs;
&&var&i = "&&label&i"
%end; ;
quit;
run;
%mend;
%varlabel(indata=mydata, lib=work, labeldata=labeldata, labelvar=var, labeldesc=labels);
Thanks a lot. Indeed that's the reason. The code works now, with the fixes.
%macro varlabel(indata=, lib=, labeldata=, labelvar=, labeldesc=);
* create distinct macro variables for each variable name and label;
data _null_;
set &labeldata;
call symput("var" || trim(left(_N_)), trim(left(&labelvar)));
call symput("label" || trim(left(_N_)), trim(left(&labeldesc)));
call symput("nobs", trim(left(_N_)));
run;
proc datasets library = &lib memtype = data nolist;
modify &indata;
label
%do i = 1 %to &nobs;
&&var&i = "&&label&i"
%end; ;
quit;
run;
%mend;
%varlabel(indata=mydata, lib=work, labeldata=labeldata, labelvar=var, labeldesc=labels);
Instead of creating a bunch of macro variables and referencing issues you might consider CALL EXECUTE statements instead.
Something like:
data _null_; set &labeldata end=last; if _n_= 1 then do; call execute( "proc datasets library=&lib. memtype=data nolist;"); call execute( "modify &indata.;"); call execute( "label"); end; call execute (var||'=' quote(strip(label))); if last call execute("quit;"); end;
And if your datalabel set actually had separate datasets then sorting the data by the dataset names, using BY in the data _null_ with first.datasetname could be used to insert the modify statement for each data set, though you would use the variable name instead of &indata macro variable and the statement would be more like call execute ("modify "|| datasetnamevar ||";");
Putting the metadata into macro variables is just going to make it harder to deal. Leave data in datasets.
To add labels you want to generate label statement(s).
For that you just need your metadata file:
data labeldata;
infile cards truncover ;
input name :$32. label $255. ;
cards;
A01 Diabetes
A02 Heart disease
B01 Cancer
B02 Renal failure
;
It is probably easiest to write the label statement o a file than try to generate strings to push onto the program stack with CALL EXECUTE().
filename labels temp;
data _null_;
set labeldata end=eof ;
file labels;
if _n_=1 then put 'label';
put @3 name '=' label :$quote. ;
if eof then put ';' ;
run;
Then you just need to use %INCLUDE to use that code to modify the dataset.
proc datasets nolist libname=work;
modify mydata;
%include labels ;
run;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.