Hello everyone, I have this code that works in changing the length of a variable. I am doing this beacuse i want to combine datasets to one, so the variables should have same length.
i am going to stack 9 files (yr1, yr2, ...yr9) and i want them all to have the same character length for 3 types of variables (dupersid, duig, condidx). I can only do for each variable at a time. Is there a way I can circumvent this so that all the length of the 3 variables will adjusted in one data step? Thank you
/***********************************************************************
- change the 8-character dupersid to 10-character dupersid
by adding the panel number to DUPERSIDs of panel 22 year 2017
and panels 1-21 in the PLVE file
- there is no year variable in the PLVE file
- implement this change by using the LENGTH and STRIP functions as folows
*************************************************************************/
data vsfile ;
length dupersid $10;
set new.h36u20 (rename=(dupersid=t_dupersid));
if length(strip(t_dupersid))=8 then
dupersid=cats(put(panel,z2.), t_dupersid);
else dupersid = t_dupersid;
drop t_dupersid;
run;
/***this is what i try to do but does not work**/
data XX;
length dupersid $10 duid $7 condidx $13;
set meps.YRM2011 (rename=(dupersid=t_dupersid));
if length(strip(t_dupersid))=8 then
dupersid=cats(put(panel,z2.), t_dupersid);
else dupersid = t_dupersid;
drop t_dupersid;
(rename=(duid=t_duid));
if length(strip(t_duid))=8 then
duid=cats(put(panel,z2.), t_duid);
else duid = t_duid;
drop t_duid;
(rename=(condidx=t_condidx));
if length(strip(t_condidx))=8 then
condidx=cats(put(panel,z2.), t_condidx);
else condidx = t_condidx;
drop t_condidx;
run;
Log 127 data XX; 128 length dupersid $10 duid $7 condidx $13; 129 set meps.YRM2011 (rename=(dupersid=t_dupersid)); 130 if length(strip(t_dupersid))=8 then 131 dupersid=cats(put(panel,z2.), t_dupersid); 132 else dupersid = t_dupersid; 133 drop t_dupersid; 134 135 (rename=(duid=t_duid)); - 180 ERROR 180-322: Statement is not valid or it is used out of proper order. 136 if length(strip(t_duid))=8 then 137 duid=cats(put(panel,z2.), t_duid); 138 else duid = t_duid; 139 drop t_duid; 140 141 (rename=(condidx=t_condidx)); - 180 ERROR 180-322: Statement is not valid or it is used out of proper order. 142 if length(strip(t_condidx))=8 then 143 condidx=cats(put(panel,z2.), t_condidx); 144 else condidx = t_condidx; 145 drop t_condidx; 146 run;
Before embarking on a lot of work, when you say : "I am doing this beacuse i want to combine datasets to one, so the variables should have same length." the first question to answer is "Why are the variables created with different lengths to begin with?"
If your data sets that you want to combine come from Proc Import or an import wizard step that calls the procedure then that is the culprit.
While Proc Import can be useful a major drawback is each set processed is treated as a unique item. That can be fine for a single set but as I guess you have found out the results can be problematic when combining data. You can also have variables change from character to numeric and changing lengths will not help that at all. You can only "stack" character data with character and numeric with numeric.
Your errors with Rename occur because the data set option uses the () around the rename=(<variable name pairs>) to indicate that it is indeed a data set option. If you need to use a rename otherwise it is just: Rename thisvar=thatvar;.
If you can provide example data for TWO datasets with the variables you need to work with as data step code (about 5 records each to cover the basic values) and what you expect the result to be this should be possible.
I have a strong suspicion that there is no need for any of the renames unless the source variable actually has different names in the sets.
The basic approach would be something like:
data want; length dupersid $10 duid $7 condidx $13; set meps.set1 meps.set2 othrlib.someset indsname=setname ; <manipulation code goes here> run;
The INDSNAME option creates a variable, in this case Setname, that will have the name of the current data set contributing the record. The variable with have the text of the libname and set name. So you can parse that variable if one or more of the sets needs to have something done differently. The name will be in upper case regardless of how it appears on the Set statement.
Some minor coding points: Length (var) will return the same value as length(strip(var)) unless you have leading blanks. Leading blanks for character values is often a sign of reading data in a sub-optimal manner.
If needed you can rename multiple variables at one time either in the data set option (extra space to show the alignment of parentheses)
set meps.YRM2011 (rename=(dupersid=t_dupersid duid=t_duid condidx=t_condidx ) ) ;
Or as a statement
rename dupersid=t_dupersid duid=t_duid condidx=t_condidx ;
Before embarking on a lot of work, when you say : "I am doing this beacuse i want to combine datasets to one, so the variables should have same length." the first question to answer is "Why are the variables created with different lengths to begin with?"
If your data sets that you want to combine come from Proc Import or an import wizard step that calls the procedure then that is the culprit.
While Proc Import can be useful a major drawback is each set processed is treated as a unique item. That can be fine for a single set but as I guess you have found out the results can be problematic when combining data. You can also have variables change from character to numeric and changing lengths will not help that at all. You can only "stack" character data with character and numeric with numeric.
Your errors with Rename occur because the data set option uses the () around the rename=(<variable name pairs>) to indicate that it is indeed a data set option. If you need to use a rename otherwise it is just: Rename thisvar=thatvar;.
If you can provide example data for TWO datasets with the variables you need to work with as data step code (about 5 records each to cover the basic values) and what you expect the result to be this should be possible.
I have a strong suspicion that there is no need for any of the renames unless the source variable actually has different names in the sets.
The basic approach would be something like:
data want; length dupersid $10 duid $7 condidx $13; set meps.set1 meps.set2 othrlib.someset indsname=setname ; <manipulation code goes here> run;
The INDSNAME option creates a variable, in this case Setname, that will have the name of the current data set contributing the record. The variable with have the text of the libname and set name. So you can parse that variable if one or more of the sets needs to have something done differently. The name will be in upper case regardless of how it appears on the Set statement.
Some minor coding points: Length (var) will return the same value as length(strip(var)) unless you have leading blanks. Leading blanks for character values is often a sign of reading data in a sub-optimal manner.
If needed you can rename multiple variables at one time either in the data set option (extra space to show the alignment of parentheses)
set meps.YRM2011 (rename=(dupersid=t_dupersid duid=t_duid condidx=t_condidx ) ) ;
Or as a statement
rename dupersid=t_dupersid duid=t_duid condidx=t_condidx ;
I'd perhaps go one step further than @ballardw and ask why do you need to standardize the lengths of many of your variables in your input dataseets in the first place? You can still stack SAS datasets successfully without standardizing variable lengths and impose standard lengths in the stacked version using LENGTH or ATTRIB statements placed before your SET statement:
data stacked;
attrib dupersid length = $10;
set dataset1 - dataset10;
run;
As long as truncation has not happened in the input datasets I wouldn't be worried about variable lengths in these.
Why do you have different lengths in the first place?
It's better to fix a flawed import process than cleaning up after it.
/*Using PROC SQL*/
data a;
set sashelp.class;
run;
data b;
length sex $ 40;
set sashelp.class;
run;
data c;
length name $ 200;
set sashelp.class;
run;
proc sql;
create table want as
select * from a
union all corr
select * from b
union all corr
select * from c
;
quit;
That SQL trick is great, but watch out for datasets that have formats attached to the character variables (which is a another "bug" in PROC IMPORT).
For example if you attach a format to the SEX in the first data step:
data a;
set sashelp.class;
format sex $1.;
run;
You get this non-sense as a result: A character variable with a length of 40 but a display width of only 1.
Variables in Creation Order # Variable Type Len Format 1 Name Char 200 2 Sex Char 40 $1. 3 Age Num 8 4 Height Num 8 5 Weight Num 8
So add a step to remove those unneeded (and dangerous) formats.
proc datasets nolist lib=work;
modify want;
format _character_;
run;
quit;
Result
Variables in Creation Order # Variable Type Len 1 Name Char 200 2 Sex Char 40 3 Age Num 8 4 Height Num 8 5 Weight Num 8
Wow, thank you all so much. The different lengths in the datasets were not a result of the import process, they were collected in different years and had different formats in the source files. I read it in the codebook. I will work on your recommendations now and get back to you.
Thank you so much
One thing you can do: create a stump (all columns, no observations) dataset with the latest (current) attributes, and use this as the first dataset in a SET statement when you combine the datasets.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.