BookmarkSubscribeRSS Feed
Emil_Sarauw
SAS Employee

Do you know the feeling when you want to append a table, and it’s an ad-hoc job which you just don’t want to spend to get done fast, but you end up spending way too much time on it, because you must match the length of the variables?

Let’s say, you have the following two tables:

 

 

data FamilienSand;
	length navn $13 type $8 jobbeskrivelse $14 interesser $35;
	infile datalines dsd;
	input navn$ type$ jobbeskrivelse$ interesser$;
	datalines;
	Oluf Sand, Menneske, Kartoffelavler, Pibe
	Gertrud Sand, Menneske, Kartoffelavler, 'Telefon, Food-processor og Julepynt'
	Emil Sand, Hund, Hund, Radioavisen
;run;

data Nisser;
	length navn $8 type $5 jobbeskrivelse $52 interesser $23;
	infile datalines dsd;
	input navn$ type$ jobbeskrivelse$ interesser$;
	datalines;
	Fritz, Nisse, Uddeler opgaver, Kontrabas
	Hansi, Nisse, The one with the biggest tænder and the grimmest tøj, The kartoffelaver woman
	Günther, Nisse, Snitternisse, Guitar og Mundharmonika
;run;

 

 

Both tables contain the same variables, but the lengths are different. Proc Append gives you an error if you don’t add the “Force” option, and even if you do then your variables in one data set will be truncated according to the base-data set.

 

 

proc append data=FamilienSand base=Nisser;run; *Error;
proc append data=FamilienSand base=Nisser force;run; *Truncates the FamilienSand – data set;

 

 

You will now have to correct the length-statement in both tables to make sure they match. You might think:” Do I really have to do all this work for this little ad-hoc job?”. All you wanted was, after all to just append the data while keeping all the observations whole.

Luckily for you there is an alternative. The macro: %Union comes for the rescue. It does exactly what you want it to do: it merges two data sets while always choosing the longest length-definition for the variables in question. It looks like this:

 

%macro union(dsn1=,     /*Name of the first data set    */
             dsn2=,     /*Name of the second data set   */
             out=       /*Name of combined data set     */);

 

So, your job can be solved as simple as:

 

%Union(dsn1=FamilienSand,dsn2=Nisser,out=Heltene);

 

It doesn’t get much simpler than this. And actually, it can also handle the situation, where one of your tables has a variable which the other one doesn’t. Let’s say you have a third table:

 

 

data KockSokkerOgSko;
	length navn type $13 jobbeskrivelse $38 interesser $11;
	infile datalines dsd;
	input navn $ type $ jobbeskrivelse$ svaghed$ interesser$;
	datalines;
	Benny Jensen, Nåsåer, Rejsende handelsmand i sko og strømper, Snaps, Gamle bøger
;run;

 

 

This one contains the extra variable: ”Svaghed”, which doesn’t occur in the two other tables. Well, that is no problem, you just use your %Union macro as usual:

 

%Union(dsn1=Heltene,dsn2=KockSokkerOgSko,out=SamletJulekalender);

 

The final result is what you can see here, and it is exactly as you want it to be:

 Screenshot.png

 

The code for the macro is shown below, together with the examples in this Juletip.

%macro union(dsn1=,     /*Name of the first data set    */
             dsn2=,     /*Name of the second data set   */
             out=       /*Name of combined data set     */);
  
   proc contents data=&dsn1 noprint
      out=out1(keep=name type length where=(type=2));
   proc contents data=&dsn2 noprint
      out=out2(keep=name type length where=(type=2));
   run;
   data _null_;
      file "C:\temp\combined.sas";
      merge out1 out2(rename=(length=length2)) end=last;
      by name;
      if _n_ = 1 then put "Data &out;";
      l = max(length,length2);
      put "   length " name " $ " l 5. ";";
      if last then do;
         put "   set &dsn1 &dsn2;";
         put "run;";
      end;
   run;
   %include "C:\temp\combined.sas";
%mend union;


data FamilienSand;
	length navn $13 type $8 jobbeskrivelse $14 interesser $35;
	infile datalines dsd;
	input navn$ type$ jobbeskrivelse$ interesser$;
	datalines;
	Oluf Sand, Menneske, Kartoffelavler, Pibe
	Gertrud Sand, Menneske, Kartoffelavler, 'Telefon, Food-processor og Julepynt'
	Emil Sand, Hund, Hund, Radioavisen
;run;

data Nisser;
	length navn $8 type $5 jobbeskrivelse $52 interesser $23;
	infile datalines dsd;
	input navn$ type$ jobbeskrivelse$ interesser$;
	datalines;
	Fritz, Nisse, Uddeler opgaver, Kontrabas
	Hansi, Nisse, The one with the biggest tænder and the grimmest tøj, The kartoffelaver woman
	Günther, Nisse, Snitternisse, Guitar og Mundharmonika
;run;

data KockSokkerOgSko;
	length navn type $13 jobbeskrivelse $38 interesser $11;
	infile datalines dsd;
	input navn $ type $ jobbeskrivelse$ Svaghed$ interesser$;
	datalines;
	Benny Jensen, Nåsåer, Rejsende handelsmand i sko og strømper, Snaps, Gamle bøger
;run;

proc contents data=FamilienSand;run;
proc contents data=Nisser;run;

data test;
set FamilienSand;output;
set nisser;output;
run;

/*proc append data=FamilienSand base=Nisser;run; *Error;*/
/*proc append data=FamilienSand base=Nisser force;run; *Truncates the FamilienSand – data set;*/

%Union(dsn1=FamilienSand,dsn2=Nisser,out=Heltene);
%Union(dsn1=Heltene,dsn2=KockSokkerOgSko,out=SamletJulekalender);
4 REPLIES 4
jmic_nyk
Obsidian | Level 7

It's hard to be a Nissemand

CKjeldsen
Fluorite | Level 6

Hi Emil

 

Great tip - thanks. A small improvement to the macro could be to use the WORK dir to save the intermediate file combined.sas instead of the hardcoded c:\temp. Thus, you would not experience problems with filesystems and access rights (and perhaps multiple users running on the same server). A slightly changed macro becomes:

 

%macro union(dsn1=,     /*Name of the first data set    */
             dsn2=,     /*Name of the second data set   */
out= /*Name of combined data set */); proc contents data=&dsn1 noprint out=out1(keep=name type length where=(type=2)); proc contents data=&dsn2 noprint out=out2(keep=name type length where=(type=2)); run; %let work_path=%sysfunc(pathname(work)); data _null_; file "&work_path\combined.sas"; merge out1 out2(rename=(length=length2)) end=last; by name; if _n_ = 1 then put "Data &out;"; l = max(length,length2); put " length " name " $ " l 5. ";"; if last then do; put " set &dsn1 &dsn2;"; put "run;"; end; run; %include "&work_path\combined.sas"; %mend union;

 

 

Det bår' dæjli'

Emil_Sarauw
SAS Employee

Hej Christian, god pointe! : )

astrid_vest
SAS Employee

Fedt Emil, jeg har allerede brugt det hos en kunde!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Discussion stats
  • 4 replies
  • 1527 views
  • 13 likes
  • 4 in conversation