How to change a variable's length without change its position? Is there any simple way?
Thanks!
/*Here is just an example I use sashelp.class,in fact my dataset has hundreds of variables.the varialbe which I want to change the length locates in the middle.*/
/* In the example below,I don't want to change SEX's position(second postion)*/
data one;
length SEX $15.;
set sashelp.class;
SEX='Testing 12345';
run;
Editor's Note: This is a popular topic. Thanks also to @PGStats and others for their contributions.
Depends upon what you call simple. How about:
proc sql noprint;
select name
into :names
separated by " "
from dictionary.columns
where libname="SASHELP" and
memname="CLASS"
;
quit;
data one;
retain &names.;
length SEX $15.;
set sashelp.class;
SEX='Testing 12345';
run;
I don't think there is any neat way. Here is a half-baked solution (ab)using proc sql -- I would appreciate any corrections or modifications. Thanks for the interesting question!
/* lengthen sex to $15 without changing the varnum.
NOTE: This will bomb when the var Sex is the first or the last var. */
proc sql noprint;
create view cols as select name, varnum
from dictionary.columns
where libname = "SASHELP" and memname = "CLASS";
select varnum into :sexNum
from cols where name = "Sex";
select name into :before separated by ", "
from cols where varnum < &sexNum;
select name into :after separated by ", "
from cols where varnum > &sexNum;
create table class as
select &before, "Testing 12345" as Sex length=15, &after
from sashelp.class;
quit;
/* check */
proc contents data=class varnum;
run;
/* on lst -- in part
Variables in Creation Order
# Variable Type Len
1 Name Char 8
2 Sex Char 15
3 Age Num 8
4 Height Num 8
5 Weight Num 8
*/
To change the length,we must use the LENGTH statement as the very first statement in the DATA STEP ,
BUT the position will be changed at the same time.
Thank you chang_y_chung.your method works well ,let's wait for somebody have a more simple method.
Editor's Note: This is a popular topic. Thanks also to @PGStats and others for their contributions.
Depends upon what you call simple. How about:
proc sql noprint;
select name
into :names
separated by " "
from dictionary.columns
where libname="SASHELP" and
memname="CLASS"
;
quit;
data one;
retain &names.;
length SEX $15.;
set sashelp.class;
SEX='Testing 12345';
run;
Simpler :
data test;
a = 1; b = "some text"; c = 3;
run;
proc sql;
alter table test modify b character(12);
quit;
PG
@PGStats: Is there a way to do that, as you have shown, without modifying the original table?
Well I guess if you don't want to modify the original table, you must make a copy. I don't see how this could be done dynamically (with a view) without naming all the variables.
data test;
a = 1; b = "some text"; c = 3;
run;
proc sql;
create table myTest as select * from test;
alter table myTest modify b character(12);
quit;
PG
That would definitely work. Now the question is which is more efficient: using dictionary.columns or creating and modifying a table using the alter statement.
Interesting question!
Art
Interesting indeed. It all depends on which efficiency matters most in a given circumstance. For a one shot deal, involving a moderate size table, the ALTER TABLE is quite efficient (maximizing coder efficiency). For greater data volume or frequency, I believe (in principle, not tested) the most efficient approach would be to create a SQL view like this :
data test;
a = 1; b = "some text"; c = 3; d = "To be truncated";
run;
data newLength;
input name :$UPCASE32. newLength;
datalines;
b 15
d 10
;
proc sql noprint;
select case when missing(newLength) then d.name else catt(d.name, " length=", newLength) end
into :names
separated by ","
from dictionary.columns as d left join newLength as n
on upcase(d.name)=n.name
where libname="WORK" and
memname="TEST"
;
create view myTest as
select &names. from test;
drop table newLength;
quit;
You could use that view to subset a large remote table in a very efficient way (maximizing machine efficiency).
PG
proc sql;
create table one as select * from sashelp.class;
alter table one modify sex character(20);
update one set sex='Testing 12345';
quit;
You could query the metadata and generate the code you need.
One method might be to generate a RETAIN statement (to set the order) only replacing the variable name that is changing length with a dummy name.
proc sql noprint ;
select varnum,case when (upcase(name)='SEX') then '_X_' else name end
into :dummy, :varlist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
order by varnum
;
quit;
data want ;
retain &varlist;
set sashelp.class ;
length _x_ $20;
_x_=sex;
drop sex;
rename _x_=sex;
run;
Maybe this is too basic, but here's an arguably simpler method:
(1) note up to 4 variable names: first, prior, next, and last variables,
(2) split the data set at the item to be changed.
(3) restore the full data set by remerging the two halves on a shared unique record ID based on _N_.
DATA oneA (KEEP=uniquerec name/*--priorvariable*/ sexB )
oneB (KEEP=uniquerec age--weight ) ;
/*1st( to n-1th) nth n+1th--last ... where n=position of var. to be changed */
SET sashelp.class;
LENGTH sexB $15.; *sexB: will be longer version of variable sex;
sexB=sex; *populate longer sexb with original sex values;
uniquerec = _N_;
DATA newclass (DROP=uniquerec);
MERGE oneA oneB; BY uniquerec;
RENAME sexB=sex;
RUN; *Viola? ;
More changed variables just require more "splits" of the data to contiguous chunks oneC, oneD, etc., followed by their re-merge on uniquerec.
Actually you have made it much more complex than it needs to be. You just need to know the name of the variable you want to change and the name of the LAST variable in the data set. You can get that by querying the output of PROC CONTENTS or DICTIONARY.COLUMNS (or sashelp.vcolumn). So for example to change the length of the AGE variable in SASHELP.CLASS you could do this.
%let var=AGE ;
%let lastvar = WEIGHT ;
data want ;
set sashelp.class(drop = &var -- &lastvar) ;
length &var 4 ;
set sashelp.class ;
run;
proc compare data=sashelp.class compare=want ;
run;
Note that the same syntax will also work if the variable is the FIRST variable in input data set. In that case the first SET statement will read 0 variables from the source.
%let var=NAME ;
%let lastvar = WEIGHT ;
data want ;
set sashelp.class(drop = &var -- &lastvar) ;
length &var $30 ;
set sashelp.class ;
run;
proc compare data=sashelp.class compare=want ;
run;
There are several statements as starting situation.
1/ it is not possible to change a variable length (char type) without the need to change the physical storage. This is due to the relation dbms row design.
2/ SAS did a redesign changing the physical order segregating char and number types when going from 8 to 9.
3/ SAS did not implement any interface for maintaining a physical order better to say the view to that physical order as of the V9 version is using that. It could be an easy enhancement.
I do not see the need for the question of ordering variables in a physical or derer as that does not exist with the rdbms world. I assume it is some ease of coding. Would be better to go for good coding practices.
1- Yes. That is why it is hard.
2- What are you talking about? I couldn't care a wit whether SAS wants to STORE the data in the physical file in what ever order it wants as long as it understands the logical order for them so that it can present them back to me when I run a PROC PRINT.
3- Again what are you talking about? SAS rules for order of variables in a table is the same as it has ever been. The order is defined by the order that the compiler sees the fields being defined. So if you write DATA NEW; SET OLD: RUN: then all of the variable in NEW will be defined in the same order as they were in OLD. or if you write DATA NEW; format vist_date date9. ; set old; run; now the variable VISIT_DATE is the first variable in the table NEW.
The question of the ORDERING of variables can be very critical to the ability to easily use a database. It is extremely annoying to browse the data and not see the key variables in the front. Or worse have a bunch of totally empty LOOONG character variables at the front so that you have to wade through screens of gibberish to figure out what you are looking at. Or to have the data fields appear in jumbled order. Your doctor will be very confused if place DIASTOLIC before SYSTOLIC instead of the reverse as is normally used when reporting blood pressure readings.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.