DATA Step, Macro, Functions and more

how to combine large datasets efficiently?

Reply
Occasional Contributor
Posts: 10

how to combine large datasets efficiently?

Hi there

I have quarterly data on firms in each state in a country. The data consists of individual files for each state and each quarter from 1980 to 2005. There are 6 variables of interest in each file - STATENAME, FIRMID, YR_QTR, EMPL1, EMPL2, EMPL3 and several 100 variables that I am not interested in. I want to sum the employment in each quarter (EMPL1 EMPL2 EMPL3) and find the first date of non-zero employment for each firm. I am using PROC APPEND for now to create a combined dataset for each state and then identifying the first non-zero employment for each firm.

Is there a more efficient way to program this without creating a large dataset? Any help with the code would be much appreciated.c

Thanks

Dana

PROC Star
Posts: 1,760

Re: how to combine large datasets efficiently?

Yes! Summarise each table, and *then* append the summaries where employment is non-zero.

Something like

%macro loop;

  %do year=1980 %to 2005;

    %do qtr=1 %to 4;

      proc summary data=DATA_&year.Q&qtr. nway;

        id YR_QTR;

        class STATENAME FIRMID;

        var EMPL1 EMPL2 EMPL3 ;

        output out=SUM_&year.Q&qtr. sum=;

      run;

    %end;

  %end;

  data WANT;

  set

    %do year=1980 %to 2005;

      %do qtr=1 %to 4;

       &SUM_&year.Q&qtr.

      %end;

    %end;

  ;

  where EMPL1 & EMPL2 & EMPL3;

  by YR_QTR STATENAME FIRMID;

  ... more logic

  run;

%mend;

%loop;

Ask a Question
Discussion stats
  • 1 reply
  • 188 views
  • 0 likes
  • 2 in conversation