Recently I had a project where I had to concatenate values in a column by a unique identifier. I had no idea how many values or the size of the values so I had to pre-build the table with extra padding (like 20000 characters). Well this bloated the file to over 3 GB for a relatively small dataset < 50 variables and < 30,000 records. I didn't have the time to write the data management code in SAS so I googled the topic and found this article: http://support.sas.com/resources/papers/proceedings13/206-2013.pdf
The paper was missing a transpose step which would create the maxx_t variable. I also noticed that the final dataset was not replaced nor was there any clean up done. Also, the columns were ordered alphanumerically by name not in the original order. I corrected all of these and decided to share the code with you all. Enjoy and use at your own risk.
%MACRO ChgLen (inp=);
*STEP 0. Identify the original order of the variables and create the proc sql necessary to rebuild the dataset;
PROC CONTENTS DATA = &inp. NOPRINT
OUT = origorder (KEEP = memname name type length label varnum);
RUN;
proc sort data = origorder; by varnum; run;
data origorder2 ; format samplenumber $25.; length vars $3000.;
do until (last.memname);
set origorder ;
by memname notsorted;
vars=catx(',',vars,name);
end;
vars = "proc sql; create table "||strip(memname)||" as select distinct "||strip(vars)||" from newone; quit;";
CALL SYMPUT('rebuild', vars);
keep vars ;
run;
*STEP 1. SELECT ONLY CHARACTER VARIABLES FROM THE DATASET;
PROC CONTENTS DATA = &inp. NOPRINT
OUT = CharVar (WHERE =(type=2)
KEEP = memname name type length label);
RUN;
*STEP 2. CREATE SAS CODES BY USING LENGTH AND MAX FUNCTIONS;
DATA CharVar2;
LENGTH name1 name2 name3 name4 name5 $100; SET CharVar;
name1 = trim(name)||'1=length('||trim(name)||')'; name2 = 'max('||trim(name)||'1) as '||trim(name); name3 = trim(name)||'x = '||trim(name);
name4 = trim(name)||' = '||trim(name)||'x';
name5 = trim(name)||' $'||strip(put(LENGTH, best.));
RUN;
*STEP 3. CREATE MACRO VARIABLES THAT WILL BE USED TO GENERATE DIFFERENT SAS STATEMENTS;
PROC SQL NOPRINT;
SELECT trim(name1) INTO: newvar SEPARATED BY '; ' FROM CharVar2;
SELECT trim(name2) INTO: maxvar SEPARATED BY ', ' FROM CharVar2;
SELECT trim(name3) INTO: tname SEPARATED BY '; ' FROM CharVar2;
SELECT trim(name4) INTO: tnamex SEPARATED BY '; ' FROM CharVar2;
SELECT trim(name) INTO: CVar SEPARATED BY ' '
FROM CharVar2;
SELECT trim(name)||'x' INTO: CVarx SEPARATED BY ' ' FROM CharVar2;
SELECT trim(name5) INTO: CVarLen SEPARATED BY ' '
FROM CharVar2;
QUIT;
*STEP 4. FIND THE MAXIMUM LENGTH VALUE FOR EACH VARIABLE;
DATA VarLen ; SET &inp.; &newvar;
RUN;
*STEP 4.1. FIND THE MAXIMUM LENGTH VALUE FOR EACH VARIABLE;
PROC SQL NOPRINT ;
CREATE TABLE maxx AS
SELECT &maxvar FROM VarLen ;
QUIT;
*STEP 4.2. Include the missing Transpose step;
proc transpose data=work.maxx
out=work.maxx_t;
run;
*STEP 5. GET MAXIMUM LENGTH VALUES;
DATA maxx_t2;
LENGTH name6 $100; SET maxx_t ;
IF col1<1 THEN col1=1;
name6 = TRIM(_name_)||' $'||STRIP(PUT(col1, best.));
RUN;
*STEP 5.1. Then create a macro variable to store the new length of each character variable.;
PROC SQL NOPRINT ;
SELECT strip(name6) INTO: newlen SEPARATED BY ' ' FROM maxx_t2 ;
QUIT;
*STEP 6. CHANGE VARIABLE NAMES TO TEMPORARY NAMES;
DATA temp (drop = &CVar. ) ; LENGTH &CVarLen ;
SET &inp.; &tname.;
RUN;
*STEP 7. CHANGE TEMPORARY NAMES BACK TO ORIGINAL VARIABLE NAMES WITH NEW LENGTHS;
DATA newONE (drop= &CVarx.) ; LENGTH &newlen. ;
SET temp ;
&tnamex.;
RUN;
*STEP 8. Rebuild the original dataset in the same order as the original (from step 0 above);
&rebuild ;
proc datasets library=work memtype=data nolist;
delete origorder origorder2 newONE temp maxx_t2 maxx_t maxx VarLen CharVar2 ;
run;
quit;
%MEND ChgLen;
Thank you for the macro. It's very useful when working with data sets created by REDCap code where the default character-variable length is $500.
Another modification that might be needed is to increase the length of the variable VARS in the data step OrigOrder2. The variable must be long enough to hold all variable names in the original data set. It could be set to 32767, the maximum length of a character variable or it could be set to 33*Number of Variables (32 for the maximum length of a variable name plus 1 for the spaces between the names.)
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.