Help using Base SAS procedures

I'm trying to convert SAS code into SQL.

Reply
New Contributor
Posts: 2

I'm trying to convert SAS code into SQL.

The SAS was written years ago by a previous developer.

Right now I am getting a file 77 MB in size from my SQL statement, when before the file was only 1.68 MB in size from SAS. 

What I need is an explanation as to what ends up in CLS_CDUP_ENRL.xls

CDUP, CLASS_TBL, and ENRLMNTS are large data extracts from the enterprise system.

After every KEEP statement there is an OUTPUT statement.

Example

KEEP OWNER STRM SESSION_CODE CLASS_SECTION SUBJECT CATALOG_NBR CLASS_NBR EMPLID DUP_CODE;

RUN;

PROC EXPORT DATA=CDUP_NO_ENRLMNT

            OUTFILE= "C:\Processes\C_DUP\for_1058\CDUP_NO_ENRLMNT.xls"

            DBMS=EXCEL2000 REPLACE;

The KEEP statement causes a DATA step to write only the variables that you specify to one or more SAS data sets.

My problem is I don't see a specific KEEP statement before CLS_CDUP_ENRL is exported.  I don't know what is retained in CLS_CDUP_ENRL.  Maybe I'm getting confused where the previous developer who made this SAS program used CLS_CDUP_ENRL in both a DATA statement and a SET statement.

Here is the SAS code.

DATA CLASS_NOT_FOUND;

MERGE CLASS_TBL (IN=IS_A) CDUP (IN=IS_B);

  BY CLS_KEY;

  IF IS_B AND NOT IS_A;

  KEEP OWNER STRM SESSION_CODE CLASS_SECTION SUBJECT CATALOG_NBR EMPLID DUP_CODE;

RUN;

PROC EXPORT DATA=CLASS_NOT_FOUND

            OUTFILE= "C:\Processes\C_DUP\for_1058\CLASS_NOT_FOUND_CDUP_ER1.xls"

            DBMS=EXCEL2000 REPLACE;

RUN;

DATA CLS_CDUP_VLD;

MERGE CLASS_TBL (IN=IS_A) CDUP (IN=IS_B);

  BY CLS_KEY;

  IF IS_B AND IS_A;

  STDNT_KEY = EMPLID || STRM || CLASS_NBR;

  STDNT_KEY2 = EMPLID || CRSE_ID;

RUN;

PROC SORT DATA=CLS_CDUP_VLD;

BY STDNT_KEY;

RUN;

DATA CLS_CDUP_ENRL;

MERGE CLS_CDUP_VLD (IN=IS_A) ENRLMNTS (IN=IS_B);

  BY STDNT_KEY;

  IF IS_B AND IS_A;

RUN;

DATA CDUP_NO_ENRLMNT;

MERGE CLS_CDUP_VLD (IN=IS_A) ENRLMNTS (IN=IS_B);

  BY STDNT_KEY;

  IF IS_A AND NOT IS_B;

  KEEP OWNER STRM SESSION_CODE CLASS_SECTION SUBJECT CATALOG_NBR CLASS_NBR EMPLID DUP_CODE;

RUN;

PROC EXPORT DATA=CDUP_NO_ENRLMNT

            OUTFILE= "C:\Processes\C_DUP\for_1058\CDUP_NO_ENRLMNT.xls"

            DBMS=EXCEL2000 REPLACE;

RUN;

DATA DUPS_CDUP_ENRL;

SET CLS_CDUP_ENRL;

BY STDNT_KEY;

IF (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) OR

   (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 0) OR

   (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1);

   KEEP OWNER EMPLID STRM SESSION_CODE SUBJECT CATALOG_NBR CLASS_SECTION DUP_CODE;

RUN;

PROC EXPORT DATA=DUPS_CDUP_ENRL

            OUTFILE= "C:\Processes\C_DUP\for_1058\DUPLICATE_CDUP_ENTRIES.xls"

            DBMS=EXCEL2000 REPLACE;

RUN;

PROC SORT DATA=CLS_CDUP_ENRL;

BY STDNT_KEY2;

RUN;

DATA UNMATCH_CDUP_ENRL;

SET CLS_CDUP_ENRL;

BY STDNT_KEY2;

IF (FIRST.STDNT_KEY2 = 1 & LAST.STDNT_KEY2 = 1);

   KEEP OWNER EMPLID STRM SESSION_CODE SUBJECT CATALOG_NBR CLASS_SECTION DUP_CODE STATUS;

RUN;

PROC EXPORT DATA=UNMATCH_CDUP_ENRL

            OUTFILE= "C:\Processes\C_DUP\for_1058\UNMATCHED_CDUP_ENTRIES.xls"

            DBMS=EXCEL2000 REPLACE;

RUN;

PROC EXPORT DATA=CLS_CDUP_ENRL

            OUTFILE= "C:\Processes\C_DUP\for_1058\CLS_CDUP_ENRL.xls"

            DBMS=EXCEL2000 REPLACE;

RUN;

Super User
Super User
Posts: 6,502

Re: I'm trying to convert SAS code into SQL.

Code looks pretty simple.  If you are seeing a large increase in size then check the input data sets.

CLASS_TBL 

CDUP

ENRLMNTS

Super User
Posts: 17,959

Re: I'm trying to convert SAS code into SQL.

What are the # of observations in the old and new output? If the size has significantly changed the file sizes will change according. I'm also assuming you're creating XLSX files now rather than XLS.

You also haven't posted your new SQL code so a comparison can't be made - check the length of the variables in your SQL code would be my suggestion.

Super User
Posts: 5,260

Re: I'm trying to convert SAS code into SQL.

Why are you converting to SQL?

Check the SAS log, after each step the number of observations and variables are printed.

Data never sleeps
Regular Contributor
Posts: 184

Re: I'm trying to convert SAS code into SQL.

The lengths of the variables (especially character variables) are also an issue. So run PROC CONTENTS for each data set created.

Super User
Posts: 6,972

Re: I'm trying to convert SAS code into SQL.

It may be that one or more variables in the input datasets has had its attributes changed (eg because of adaption to UTF) and consumes more space.

We had similar things happen when the generic attribute datasets (name/value tables) had their "value" fields changed from char(100) to char(3600) by the DB designers.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 5 replies
  • 343 views
  • 0 likes
  • 6 in conversation