<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Revision to SAS Publication: A Simple Macro to Minimize Dataset Size in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Revision-to-SAS-Publication-A-Simple-Macro-to-Minimize-Dataset/m-p/707774#M34273</link>
    <description>&lt;P&gt;Thank you for the macro.&amp;nbsp; It's very useful when working with data sets created by REDCap code where the default character-variable length is $500.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Another modification that might be needed is to increase the length of the variable VARS in the data step OrigOrder2.&amp;nbsp; The variable must be long enough to hold all variable names in the original data set.&amp;nbsp; It could be set to 32767, the maximum length of a character variable or it could be set to 33*Number of Variables (32 for the maximum length of a variable name plus 1 for the spaces between the names.)&lt;/P&gt;</description>
    <pubDate>Tue, 22 Dec 2020 21:39:38 GMT</pubDate>
    <dc:creator>jackiedale</dc:creator>
    <dc:date>2020-12-22T21:39:38Z</dc:date>
    <item>
      <title>Revision to SAS Publication: A Simple Macro to Minimize Dataset Size</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Revision-to-SAS-Publication-A-Simple-Macro-to-Minimize-Dataset/m-p/581381#M28536</link>
      <description>&lt;P&gt;Recently I had a project where I had to concatenate values in a column by a unique identifier.&amp;nbsp; I had no idea how many values or the size of the values so I had to pre-build the table with extra padding (like 20000 characters).&amp;nbsp; Well this bloated the file to over 3 GB for a relatively small dataset &amp;lt; 50 variables and &amp;lt; 30,000 records.&amp;nbsp; I didn't have the time to write the data management code in SAS so I googled the topic and found this article:&amp;nbsp;&amp;nbsp;&lt;A href="http://support.sas.com/resources/papers/proceedings13/206-2013.pdf" target="_blank" rel="noopener"&gt;http://support.sas.com/resources/papers/proceedings13/206-2013.pdf&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The paper was missing a transpose step which would create the maxx_t variable.&amp;nbsp; I also noticed that the final dataset was not replaced nor was there any clean up done.&amp;nbsp; Also, the columns were ordered alphanumerically by name not in the original order.&amp;nbsp; I corrected all of these and decided to share the code with you all.&amp;nbsp; Enjoy and use at your own risk.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%MACRO	ChgLen (inp=);
	*STEP 0. Identify the original order of the variables and create the proc sql necessary to rebuild the dataset;
	PROC CONTENTS DATA = &amp;amp;inp.	NOPRINT
		OUT	= origorder (KEEP = memname name type length label varnum);
	RUN;
	proc sort data = origorder; by varnum; run;
	data origorder2 ; format samplenumber $25.; length vars $3000.;
	   do until (last.memname);
	      set origorder ;
	        by memname notsorted;
			vars=catx(',',vars,name);
	   end;
	   vars = "proc sql; create table "||strip(memname)||" as select distinct "||strip(vars)||" from newone; quit;";
	   CALL SYMPUT('rebuild', vars);
	   keep vars ;
	run;
	*STEP 1. SELECT ONLY CHARACTER VARIABLES FROM THE DATASET;
	PROC CONTENTS DATA = &amp;amp;inp.	NOPRINT
		OUT	= CharVar (WHERE =(type=2)
		KEEP = memname name type length label);
	RUN;
	*STEP 2. CREATE SAS CODES BY USING LENGTH AND MAX FUNCTIONS;
	DATA	CharVar2;
		LENGTH	name1	name2	name3	name4	name5	$100; SET CharVar;
		name1 = trim(name)||'1=length('||trim(name)||')'; name2 = 'max('||trim(name)||'1)	as '||trim(name); name3 = trim(name)||'x = '||trim(name);
		name4 = trim(name)||' = '||trim(name)||'x';
		name5 = trim(name)||'	$'||strip(put(LENGTH, best.));
	RUN;
	*STEP 3. CREATE MACRO VARIABLES THAT WILL BE USED TO GENERATE DIFFERENT SAS STATEMENTS;
	PROC SQL	NOPRINT;
		SELECT	trim(name1) INTO: newvar	SEPARATED BY ';	' FROM CharVar2;
		SELECT trim(name2) INTO: maxvar	SEPARATED BY ',	' FROM CharVar2;
		SELECT trim(name3) INTO: tname	SEPARATED BY ';	' FROM CharVar2;
		SELECT trim(name4) INTO: tnamex	SEPARATED BY ';	' FROM CharVar2;
		SELECT trim(name) INTO: CVar	SEPARATED BY '	'
		FROM CharVar2;
		SELECT trim(name)||'x' INTO: CVarx	SEPARATED BY '	' FROM CharVar2;
		SELECT trim(name5)	INTO: CVarLen	SEPARATED BY '	'
		FROM CharVar2;
	QUIT;
	*STEP 4. FIND THE MAXIMUM LENGTH VALUE FOR EACH VARIABLE;
	DATA VarLen ; SET &amp;amp;inp.; &amp;amp;newvar;
	RUN;
	*STEP 4.1. FIND THE MAXIMUM LENGTH VALUE FOR EACH VARIABLE;
	PROC SQL NOPRINT  ; 
		CREATE TABLE maxx	AS
		SELECT &amp;amp;maxvar FROM VarLen	;
	QUIT;
	*STEP 4.2. Include the missing Transpose step;
	proc transpose data=work.maxx
		out=work.maxx_t;
	run;
	*STEP 5. GET MAXIMUM LENGTH VALUES;
	DATA maxx_t2;
		LENGTH	name6	$100; SET maxx_t ;
		IF col1&amp;lt;1 THEN col1=1;
		name6 = TRIM(_name_)||'	$'||STRIP(PUT(col1, best.));
	RUN;
	*STEP 5.1. Then create a macro variable to store the new length of each character variable.;
	PROC SQL	NOPRINT ;
		SELECT	strip(name6)	INTO: newlen	SEPARATED BY '	' FROM maxx_t2 ;
	QUIT;
	*STEP 6. CHANGE VARIABLE NAMES TO TEMPORARY NAMES;
	DATA	temp (drop = &amp;amp;CVar. ) ; LENGTH	&amp;amp;CVarLen	;
		SET	&amp;amp;inp.; &amp;amp;tname.;
	RUN;
	*STEP 7. CHANGE TEMPORARY NAMES BACK TO ORIGINAL VARIABLE NAMES WITH NEW LENGTHS;
	DATA		newONE	(drop= &amp;amp;CVarx.) ; LENGTH &amp;amp;newlen.	;
		SET	temp ;
		&amp;amp;tnamex.;
	RUN;
	*STEP 8. Rebuild the original dataset in the same order as the original (from step 0 above);
	&amp;amp;rebuild ; 


	proc datasets library=work memtype=data nolist;
		delete origorder origorder2 newONE temp maxx_t2 maxx_t maxx VarLen CharVar2 ;
	run;
	quit;

%MEND ChgLen;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Aug 2019 12:30:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Revision-to-SAS-Publication-A-Simple-Macro-to-Minimize-Dataset/m-p/581381#M28536</guid>
      <dc:creator>dwaynejarman</dc:creator>
      <dc:date>2019-08-15T12:30:09Z</dc:date>
    </item>
    <item>
      <title>Re: Revision to SAS Publication: A Simple Macro to Minimize Dataset Size</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Revision-to-SAS-Publication-A-Simple-Macro-to-Minimize-Dataset/m-p/707774#M34273</link>
      <description>&lt;P&gt;Thank you for the macro.&amp;nbsp; It's very useful when working with data sets created by REDCap code where the default character-variable length is $500.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Another modification that might be needed is to increase the length of the variable VARS in the data step OrigOrder2.&amp;nbsp; The variable must be long enough to hold all variable names in the original data set.&amp;nbsp; It could be set to 32767, the maximum length of a character variable or it could be set to 33*Number of Variables (32 for the maximum length of a variable name plus 1 for the spaces between the names.)&lt;/P&gt;</description>
      <pubDate>Tue, 22 Dec 2020 21:39:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Revision-to-SAS-Publication-A-Simple-Macro-to-Minimize-Dataset/m-p/707774#M34273</guid>
      <dc:creator>jackiedale</dc:creator>
      <dc:date>2020-12-22T21:39:38Z</dc:date>
    </item>
  </channel>
</rss>

