Solved: Re: changing the length of many variables and for many datasets in a s...

Banke · Posted 09-22-2022 07:13 PM

Hello everyone, I have this code that works in changing the length of a variable. I am doing this beacuse i want to combine datasets to one, so the variables should have same length.

i am going to stack 9 files (yr1, yr2, ...yr9) and i want them all to have the same character length for 3 types of variables (dupersid, duig, condidx). I can only do for each variable at a time. Is there a way I can circumvent this so that all the length of the 3 variables will adjusted in one data step? Thank you

/***********************************************************************
- change the 8-character dupersid to 10-character dupersid 
by adding the panel number to DUPERSIDs of panel 22 year 2017 
and panels 1-21 in the PLVE file 
- there is no year variable in the PLVE file 
- implement this change by using the LENGTH and STRIP functions as folows
*************************************************************************/

  data vsfile ;
    length dupersid $10;
    set new.h36u20 (rename=(dupersid=t_dupersid));
		if length(strip(t_dupersid))=8 then 
          dupersid=cats(put(panel,z2.), t_dupersid);
  	    else dupersid = t_dupersid;   
  drop t_dupersid;
run;

/***this is what i try to do but does not work**/
 data XX;
    length dupersid $10 duid $7 condidx $13;
    set meps.YRM2011 (rename=(dupersid=t_dupersid));
		if length(strip(t_dupersid))=8 then 
          dupersid=cats(put(panel,z2.), t_dupersid);
  	    else dupersid = t_dupersid;   
  drop t_dupersid;

  (rename=(duid=t_duid));
		if length(strip(t_duid))=8 then 
          duid=cats(put(panel,z2.), t_duid);
  	    else duid = t_duid;   
  drop t_duid;

  (rename=(condidx=t_condidx));
		if length(strip(t_condidx))=8 then 
         condidx=cats(put(panel,z2.), t_condidx);
  	    else condidx = t_condidx;   
  drop t_condidx;
run;

Log


127  data XX;
128      length dupersid $10 duid $7 condidx $13;
129      set meps.YRM2011 (rename=(dupersid=t_dupersid));
130          if length(strip(t_dupersid))=8 then
131            dupersid=cats(put(panel,z2.), t_dupersid);
132          else dupersid = t_dupersid;
133    drop t_dupersid;
134
135    (rename=(duid=t_duid));
       -
       180
ERROR 180-322: Statement is not valid or it is used out of proper order.

136          if length(strip(t_duid))=8 then
137            duid=cats(put(panel,z2.), t_duid);
138          else duid = t_duid;
139    drop t_duid;
140
141    (rename=(condidx=t_condidx));
       -
       180
ERROR 180-322: Statement is not valid or it is used out of proper order.

142          if length(strip(t_condidx))=8 then
143            condidx=cats(put(panel,z2.), t_condidx);
144          else condidx = t_condidx;
145    drop t_condidx;
146  run;

ballardw · Posted 09-22-2022 09:20 PM

Before embarking on a lot of work, when you say : "I am doing this beacuse i want to combine datasets to one, so the variables should have same length." the first question to answer is "Why are the variables created with different lengths to begin with?"

If your data sets that you want to combine come from Proc Import or an import wizard step that calls the procedure then that is the culprit.

While Proc Import can be useful a major drawback is each set processed is treated as a unique item. That can be fine for a single set but as I guess you have found out the results can be problematic when combining data. You can also have variables change from character to numeric and changing lengths will not help that at all. You can only "stack" character data with character and numeric with numeric.

Your errors with Rename occur because the data set option uses the () around the rename=(<variable name pairs>) to indicate that it is indeed a data set option. If you need to use a rename otherwise it is just: Rename thisvar=thatvar;.

If you can provide example data for TWO datasets with the variables you need to work with as data step code (about 5 records each to cover the basic values) and what you expect the result to be this should be possible.

I have a strong suspicion that there is no need for any of the renames unless the source variable actually has different names in the sets.

The basic approach would be something like:

 data want;
    length dupersid $10 duid $7 condidx $13;
    set meps.set1
        meps.set2
        othrlib.someset
        indsname=setname
    ;
    <manipulation code goes here>
run;

The INDSNAME option creates a variable, in this case Setname, that will have the name of the current data set contributing the record. The variable with have the text of the libname and set name. So you can parse that variable if one or more of the sets needs to have something done differently. The name will be in upper case regardless of how it appears on the Set statement.

Some minor coding points: Length (var) will return the same value as length(strip(var)) unless you have leading blanks. Leading blanks for character values is often a sign of reading data in a sub-optimal manner.

If needed you can rename multiple variables at one time either in the data set option (extra space to show the alignment of parentheses)

set meps.YRM2011 (rename=(dupersid=t_dupersid duid=t_duid
                          condidx=t_condidx 
                         )
                 )
;

Or as a statement

rename  dupersid=t_dupersid duid=t_duid
        condidx=t_condidx 
;

View solution in original post

ballardw · Posted 09-22-2022 09:20 PM

Before embarking on a lot of work, when you say : "I am doing this beacuse i want to combine datasets to one, so the variables should have same length." the first question to answer is "Why are the variables created with different lengths to begin with?"

If your data sets that you want to combine come from Proc Import or an import wizard step that calls the procedure then that is the culprit.

While Proc Import can be useful a major drawback is each set processed is treated as a unique item. That can be fine for a single set but as I guess you have found out the results can be problematic when combining data. You can also have variables change from character to numeric and changing lengths will not help that at all. You can only "stack" character data with character and numeric with numeric.

Your errors with Rename occur because the data set option uses the () around the rename=(<variable name pairs>) to indicate that it is indeed a data set option. If you need to use a rename otherwise it is just: Rename thisvar=thatvar;.

If you can provide example data for TWO datasets with the variables you need to work with as data step code (about 5 records each to cover the basic values) and what you expect the result to be this should be possible.

I have a strong suspicion that there is no need for any of the renames unless the source variable actually has different names in the sets.

The basic approach would be something like:

 data want;
    length dupersid $10 duid $7 condidx $13;
    set meps.set1
        meps.set2
        othrlib.someset
        indsname=setname
    ;
    <manipulation code goes here>
run;

The INDSNAME option creates a variable, in this case Setname, that will have the name of the current data set contributing the record. The variable with have the text of the libname and set name. So you can parse that variable if one or more of the sets needs to have something done differently. The name will be in upper case regardless of how it appears on the Set statement.

Some minor coding points: Length (var) will return the same value as length(strip(var)) unless you have leading blanks. Leading blanks for character values is often a sign of reading data in a sub-optimal manner.

If needed you can rename multiple variables at one time either in the data set option (extra space to show the alignment of parentheses)

set meps.YRM2011 (rename=(dupersid=t_dupersid duid=t_duid
                          condidx=t_condidx 
                         )
                 )
;

Or as a statement

rename  dupersid=t_dupersid duid=t_duid
        condidx=t_condidx 
;

SASKiwi · Posted 09-22-2022 11:03 PM

I'd perhaps go one step further than @ballardw and ask why do you need to standardize the lengths of many of your variables in your input dataseets in the first place? You can still stack SAS datasets successfully without standardizing variable lengths and impose standard lengths in the stacked version using LENGTH or ATTRIB statements placed before your SET statement:

data stacked;
  attrib dupersid length = $10;
  set dataset1 - dataset10;
run;

As long as truncation has not happened in the input datasets I wouldn't be worried about variable lengths in these.

Kurt_Bremser · Posted 09-22-2022 11:39 PM

Why do you have different lengths in the first place?

It's better to fix a flawed import process than cleaning up after it.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 09-23-2022 07:50 AM

/*Using PROC SQL*/

data a;
 set sashelp.class;
run;
data b;
 length sex $ 40;
 set sashelp.class;
run;
data c;
 length name $ 200;
 set sashelp.class;
run;


proc sql;
create table want as
select * from a
union all corr
select * from b
union all corr
select * from c
;
quit;

Tom · Posted 09-23-2022 10:12 AM

That SQL trick is great, but watch out for datasets that have formats attached to the character variables (which is a another "bug" in PROC IMPORT).

For example if you attach a format to the SEX in the first data step:

data a;
 set sashelp.class;
 format sex $1.;
run;

You get this non-sense as a result: A character variable with a length of 40 but a display width of only 1.

      Variables in Creation Order

#    Variable    Type    Len    Format

1    Name        Char    200
2    Sex         Char     40    $1.
3    Age         Num       8
4    Height      Num       8
5    Weight      Num       8

So add a step to remove those unneeded (and dangerous) formats.

proc datasets nolist lib=work;
  modify want;
    format _character_;
  run;
quit;

Result

 Variables in Creation Order

#    Variable    Type    Len

1    Name        Char    200
2    Sex         Char     40
3    Age         Num       8
4    Height      Num       8
5    Weight      Num       8

Banke · Posted 09-23-2022 10:18 AM

Wow, thank you all so much. The different lengths in the datasets were not a result of the import process, they were collected in different years and had different formats in the source files. I read it in the codebook. I will work on your recommendations now and get back to you.

Thank you so much

Kurt_Bremser · Posted 09-23-2022 12:09 PM

One thing you can do: create a stump (all columns, no observations) dataset with the latest (current) attributes, and use this as the first dataset in a SET statement when you combine the datasets.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 09-24-2022 05:18 AM

Sure. I forgot this . Just add one more PROC DATASETS as Tom did .

changing the length of many variables and for many datasets in a single data step

Re: changing the length of many variables and for many datasets in a single data step

Re: changing the length of many variables and for many datasets in a single data step

Re: changing the length of many variables and for many datasets in a single data step

Re: changing the length of many variables and for many datasets in a single data step

Re: changing the length of many variables and for many datasets in a single data step

Re: changing the length of many variables and for many datasets in a single data step

Re: changing the length of many variables and for many datasets in a single data step

Re: changing the length of many variables and for many datasets in a single data step

Re: changing the length of many variables and for many datasets in a single data step

Catch up on SAS Innovate 2026