Hello,
I would like to assign max length for common variable for respective data set ,if variable present in data set
Example : In a library we have 5 data sets A , B , C, D, E,F and XX_VAR is present in only A , B , E data sets and assign max length of this var across all data three data sets
IF max XX_VAR length 150 in B data set and need to assign same length to across all data sets which have that variable
This process will be for all vars in all data set in respective library
Yes:
options dlcreatedir;
libname test "%sysfunc(pathname(WORK))/test";
data test.A(label = "this is A");
x = "123";
z = 6;
y = "12345";
t = 17;
run;
data test.B(label = "this is B");
x = "12345";
z = "123";
u = 42;
run;
data test.C(label = "this is C");
z = "12345";
y = "123";
v = 303;
x = 1001;
run;
/* collect metadata */
proc sql;
create table metadata1 as
select
memname,
name,
varnum,
type,
case when type = 'char' then "$"
else " "
end as t,
length,
max(length) as l
from dictionary.columns
where libname = "TEST"
group by name, type
order by memname, varnum, name, type
;
create table metadata2 as
select memname, memlabel
from dictionary.tables
where libname = "TEST"
order by memname
;
create table metadata3 as
select a.*, b.memlabel
from
metadata1 as a
left join
metadata2 as b
on a.memname = b.memname
;
quit;
/* set max */
data _null_;
set metadata3;
by memname;
if first.memname then
do;
call execute (cats("data TEST.",memname,"(label=",quote(strip(memlabel)),");"));
call execute ("length ");
end;
call execute (catx(" ", name, t, l));
if l ne length then put "NOTE: variable " name "changed length from " length " to " l;
if last.memname then
do;
call execute (cats("; set TEST.",memname,"; run;"));
end;
run;
Bart
Hi,
Try this:
options dlcreatedir;
libname test "%sysfunc(pathname(WORK))/test";
data test.A;
x = "123";
y = "12345";
t = 17;
run;
data test.B;
x = "12345";
z = "123";
u = 42;
run;
data test.C;
z = "12345";
y = "123";
v = 303;
run;
/* collect metadata */
proc sql;
create table metadata as
select memname, name, max(length) as l
from dictionary.columns
where libname = "TEST"
and type = "char" /* only text variables */
group by name
order by memname, name
;
quit;
/* set max */
data _null_;
set metadata;
by memname;
if first.memname then
do;
call execute (cats("data TEST.",memname,";"));
call execute ("length ");
end;
call execute (cat(name, " $ ", l));
if last.memname then
do;
call execute (cats("; set TEST.",memname,"; run;"));
end;
run;
Bart
Yes:
options dlcreatedir;
libname test "%sysfunc(pathname(WORK))/test";
data test.A(label = "this is A");
x = "123";
z = 6;
y = "12345";
t = 17;
run;
data test.B(label = "this is B");
x = "12345";
z = "123";
u = 42;
run;
data test.C(label = "this is C");
z = "12345";
y = "123";
v = 303;
x = 1001;
run;
/* collect metadata */
proc sql;
create table metadata1 as
select
memname,
name,
varnum,
type,
case when type = 'char' then "$"
else " "
end as t,
length,
max(length) as l
from dictionary.columns
where libname = "TEST"
group by name, type
order by memname, varnum, name, type
;
create table metadata2 as
select memname, memlabel
from dictionary.tables
where libname = "TEST"
order by memname
;
create table metadata3 as
select a.*, b.memlabel
from
metadata1 as a
left join
metadata2 as b
on a.memname = b.memname
;
quit;
/* set max */
data _null_;
set metadata3;
by memname;
if first.memname then
do;
call execute (cats("data TEST.",memname,"(label=",quote(strip(memlabel)),");"));
call execute ("length ");
end;
call execute (catx(" ", name, t, l));
if l ne length then put "NOTE: variable " name "changed length from " length " to " l;
if last.memname then
do;
call execute (cats("; set TEST.",memname,"; run;"));
end;
run;
Bart
Why do you have different lengths in the first place? You're better off fixing your data import process.
@raja777pharma wrote:
Hello,
I would like to assign max length for common variable for respective data set ,if variable present in data set
Example : In a library we have 5 data sets A , B , C, D, E,F and XX_VAR is present in only A , B , E data sets and assign max length of this var across all data three data sets
IF max XX_VAR length 150 in B data set and need to assign same length to across all data sets which have that variable
This process will be for all vars in all data set in respective library
I would also suggest looking into WHY that occurs. If the variables should have a common length then likely the approach being used to read the data is flawed, such as relying on Proc Import which makes separate guesses each time a set is read.
You will need additional code that examines the metadata of each data set and selects the longest length of each variable across all the data sets.
A macro that computes (code gens) the variable declarations using a variety of steps, such as PROC CONTENTS and SQL, can be coded and submitted from a %sysfunc(doSubL( function call.
For the case of a variable having different types (such as ID numeric in one data set and character in another) the code gen will work, but the final DATA step will ERROR as it should.
Consider the two typical cases
DATA want;
LENGTH <vardecl: vars from all data sets listed with max lengths>; SET data sets ...
and
DATA want;
LENGTH <vardecl: vars from all data sets listed with max lengths>; MERGE data sets ...
Rather than deal with the list of data sets twice, or using yet another macro var to hold them, code a macro that accepts the list of data sets in a parameter and uses that parameter for computing the variable declarations and for a SET/MERGE statement.
Example:
options mprint; data A; id=1; length S $5; X=1; length t $5; data B; Id=1; length s $15; Y=1; length L $9; data C; ID=1; length S $7; Z=1; length t $7; data D; iD=1; length S $21; P=1; data E; id=1; length s $32; Q=1; length T $12; run; DATA want; %Magic(MERGE, A B C D E); by id; run;
What does %MAGIC do ? Runs multi step computation launched via doSubL and utilizes a computed value.
%macro Magic(operation, data); %* operation: SET or MERGE; %* data: list of data sets (that do not contain data set options); %local rc; %global vardecl; %* empirically determined global is needed. If local, the var does not get updated by doSubL side session; %let rc = %sysfunc(doSubL(%nrstr( %ComputeVarDecl(&data, vardecl) /* pass in list of data sets and name of variable to contain code gen */ ))); LENGTH &vardecl; %* <--------- utilize computed code gen; &operation &data; %symdel vardecl; %mend;
So, What does the guts ComputeVarDecl look like ?
%macro ComputeVarDecl(data, mvar_vardecl); %* data: list of data sets; %* mvar_vardecl: name of macro symbol to receive the computed code gen for LENGTH statement; %local i zi; %do i = 1 %to %sysfunc(COUNTW(&data,%str( ))); %put &=i; %let zi = %sysfunc(PUTN(&i,z8.)); proc contents noprint data=%sysfunc(SCAN(&data,&i,%str( ))) out=_contents_&zi(keep=libname memname varnum name type length) ; run; %end; data _contents_all; length upname $32; set _contents_0: indsname=source; if lag(source) ne source then dsnum + 1; upname = upcase(name); run; proc sql noprint; select catx(' ', name, case when (type=1) then ' ' else '$' end, max(length) %* <---- longest length; ) into :&mvar_vardecl separated by ' ' %* <---- populate vardecl; from _contents_all group by upname having dsnum=min(dsnum) order by dsnum, varnum ;
proc datasets noprint;
delete _contents_: / mt=data; %mend;
Nice DOSUBL @RichardDeVen .
Just a small note, re creating the global macro var VARDECL inside of %MAGIC:
%global vardecl; %* empirically determined global is needed. If local, the var does not get updated by doSubL side session;
You can use a local macro variable there. The problem is, in order to have the DOSUBL side session return a value to it, you need to create a same-named global macro variable in the side session, and if you use %GLOBAL statement in the side session it errors (because it mistakenly thinks it's trying to declare a preexisting local variable as global). But you can use an open %LET in the side session to create a side-session global macro var, e.g.:
%local vardecl ;
%let rc = %sysfunc(doSubL(%nrstr(
%let vardecl= ; %*Create side-session global macro var, which will be returned to main session local macro var with the same name;
%ComputeVarDecl(&data, vardecl)
)));
Perhaps a bit of a hack, but it allows you to avoid mucking with the main session global symbol table.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.