<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Parsing a SAS program file to extract table names and column names in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553827#M154041</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/79805"&gt;@buechler66&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It would certainly be possible to write some code for the sample program you've posted but to come up with something generic feels like a huge endeavor. That's basically about writing a SQL parser.&lt;/P&gt;
&lt;P&gt;If you need that only for a single program then consider a fully manual approach. If the end goal is to write some sort of data lineage reporting tool then I'd spend time searching the Internet for existing tools. Eventually search with words like "SQL Tokenizer" and the like.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 25 Apr 2019 01:28:27 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2019-04-25T01:28:27Z</dc:date>
    <item>
      <title>Parsing a SAS program file to extract table names and column names</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553782#M154017</link>
      <description>&lt;P&gt;Hi all.&amp;nbsp; This is a quite complicated task I'm hoping to get some help with.&amp;nbsp; If you have any suggestions or code that would help me to accomplish this task I'd really appreciated it.&amp;nbsp; If it's too big of an ask I certainly understand that too.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a physical file on the server that I need to read in with the goal of outputting two columns of data to a SAS dataset.&amp;nbsp; Each record output would contain 'Column Name' and 'Table Name' from any Proc SQL steps in a SAS program it reads in:&lt;/P&gt;
&lt;P&gt;1.&amp;nbsp; Each table name in the 'FROM' (the real table name so ignoring aliases)&lt;/P&gt;
&lt;P&gt;2.&amp;nbsp; Each Column name in the 'SELECT' and 'WHERE' conditions (the real column names so again ignoring aliases)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is the example SAS program I've been playing with. Notice it's intentionally messy because I anticipate people writing code in a variety of ways:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* MACROS */
	%LET DATE_START = '01JUN2016'D;
	%LET DATE_END = '31MAY2018'D;

/*	================================================
	SECTION 1:	
	================================================ */
PROC SQL;
CREATE TABLE &amp;amp;USERN._LIB.ORTHOTICS_BASE_PULL_V1 AS
SELECT DISTINCT 
	 CLM.CLM_HIC_NUM AS BENE_HIC_NUM
	,CLM.BENE_SK
	,CLM.CLM_THRU_DT    ,CLM.CLM_PMT_AMT
	,LINE.CLM_LINE_FROM_DT
	,LINE.CLM_LINE_THRU_DT, 
	LINE.CLM_LINE_HCPCS_CD AS HCPCS_CD , BENE.BENE_LINE_2_ADR
	,BENE.BENE_LINE_3_ADR,    	
	BENE.BENE_LINE_4_ADR
FROM OPI_AAL.CLM AS CLM
INNER JOIN OPI_AAL.BENE AS BENE 
	ON (CLM.BENE_SK = BENE.BENE_SK)
INNER JOIN OPI_AAL.CLM_LINE AS LINE 
	ON (CLM.GEO_BENE_SK = LINE.GEO_BENE_SK 
		AND CLM.CLM_DT_SGNTR_SK = LINE.CLM_DT_SGNTR_SK 
		AND CLM.CLM_TYPE_CD = LINE.CLM_TYPE_CD 
		AND CLM.CLM_NUM_SK = LINE.CLM_NUM_SK) 
WHERE CLM.CLM_TYPE_CD = 2800
	AND LINE.CLM_LINE_PRVDR_PMT_AMT &amp;gt; 0 AND 	CLM.CLM_FINL_ACTN_IND = 'Y'
;QUIT;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So the desired output would be a dataset containing:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;TABLE NAME:		COLUMN NAME:

OPI_AAL.CLM		CLM_HIC_NUM
OPI_AAL.CLM		BENE_SK
OPI_AAL.CLM		CLM_THRU_DT
OPI_AAL.CLM		CLM_PMT_AMT
OPI_AAL.CLM		CLM_TYPE_CD
OPI_AAL.CLM		CLM_FINL_ACTN_IND
OPI_AAL.BENE 		BENE_LINE_2_ADR
OPI_AAL.BENE 		BENE_LINE_3_ADR
OPI_AAL.BENE 		BENE_LINE_4_ADR
OPI_AAL.CLM_LINE 	CLM_LINE_FROM_DT
OPI_AAL.CLM_LINE 	CLM_LINE_THRU_DT
OPI_AAL.CLM_LINE 	CLM_LINE_HCPCS_CD 
OPI_AAL.CLM_LINE 	CLM_LINE_PRVDR_PMT_AMT
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So far I've been able to read in the SAS program with the following code, but I'm stuck on how to parse thru the lines of the file to create my desired output based on those requirements:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename file_in "/sasdata/users/steve/DME/DMECode.sas";

data pgm;
infile file_in end=template_eof;
length line $ 2000;
  do while (not template_eof);
    input;
    line = _infile_;
	 output;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 24 Apr 2019 19:57:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553782#M154017</guid>
      <dc:creator>buechler66</dc:creator>
      <dc:date>2019-04-24T19:57:50Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing a SAS program file to extract table names and column names</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553827#M154041</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/79805"&gt;@buechler66&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It would certainly be possible to write some code for the sample program you've posted but to come up with something generic feels like a huge endeavor. That's basically about writing a SQL parser.&lt;/P&gt;
&lt;P&gt;If you need that only for a single program then consider a fully manual approach. If the end goal is to write some sort of data lineage reporting tool then I'd spend time searching the Internet for existing tools. Eventually search with words like "SQL Tokenizer" and the like.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Apr 2019 01:28:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553827#M154041</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-04-25T01:28:27Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing a SAS program file to extract table names and column names</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553828#M154042</link>
      <description>Yah, I wish I could see a clear path, but I know this isn't an easy&lt;BR /&gt;problem.&lt;BR /&gt;&lt;BR /&gt;I wish I could somehow isolate everything between the 'PROC SQL' and the&lt;BR /&gt;'QUIT' as a single giant string that I could parse. That would be a&lt;BR /&gt;starting point.&lt;BR /&gt;&lt;BR /&gt;I know based on SQL rules that all the column names follow the 'Select'&lt;BR /&gt;statement (separated by commas). And I know all table names follow the&lt;BR /&gt;'From' statement (separated by commas).&lt;BR /&gt;&lt;BR /&gt;Ugh. I guess it's not a solvable problem in reality.&lt;BR /&gt;&lt;BR /&gt;Thanks for chiming in tho. I appreciate it.&lt;BR /&gt;</description>
      <pubDate>Thu, 25 Apr 2019 01:30:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553828#M154042</guid>
      <dc:creator>buechler66</dc:creator>
      <dc:date>2019-04-25T01:30:17Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing a SAS program file to extract table names and column names</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553831#M154043</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/79805"&gt;@buechler66&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Below code extracts the SQL from your source code and stores it in a single observation.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data pgm;
  infile datalines truncover;
  input;
  length sql_code $32767;
  retain sql_code sql_flg;

  if sql_flg ne '1' and find(compbl(_infile_),'proc sql','i')&amp;gt;0 then
    sql_flg='1';
  else if sql_flg='1' then
    do;
      if find(compbl(_infile_),'quit','i')&amp;gt;0 then
        do;
          output;
          sql_flg='0';
          call missing(sql_code);
        end;
      else
        do;
          sql_code=catx(' ',sql_code,_infile_);
        end;
    end;

  datalines4;

/* MACROS */
  %LET DATE_START = '01JUN2016'D;
  %LET DATE_END = '31MAY2018'D;

/*  ================================================
  SECTION 1:  
  ================================================ */
PROC SQL;
CREATE TABLE &amp;amp;USERN._LIB.ORTHOTICS_BASE_PULL_V1 AS
SELECT DISTINCT 
   CLM.CLM_HIC_NUM AS BENE_HIC_NUM
  ,CLM.BENE_SK
  ,CLM.CLM_THRU_DT    ,CLM.CLM_PMT_AMT
  ,LINE.CLM_LINE_FROM_DT
  ,LINE.CLM_LINE_THRU_DT, 
  LINE.CLM_LINE_HCPCS_CD AS HCPCS_CD , BENE.BENE_LINE_2_ADR
  ,BENE.BENE_LINE_3_ADR,      
  BENE.BENE_LINE_4_ADR
FROM OPI_AAL.CLM AS CLM
INNER JOIN OPI_AAL.BENE AS BENE 
  ON (CLM.BENE_SK = BENE.BENE_SK)
INNER JOIN OPI_AAL.CLM_LINE AS LINE 
  ON (CLM.GEO_BENE_SK = LINE.GEO_BENE_SK 
    AND CLM.CLM_DT_SGNTR_SK = LINE.CLM_DT_SGNTR_SK 
    AND CLM.CLM_TYPE_CD = LINE.CLM_TYPE_CD 
    AND CLM.CLM_NUM_SK = LINE.CLM_NUM_SK) 
WHERE CLM.CLM_TYPE_CD = 2800
  AND LINE.CLM_LINE_PRVDR_PMT_AMT &amp;gt; 0 AND   CLM.CLM_FINL_ACTN_IND = 'Y'
;QUIT;
;;;;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You will have to extend the code logic to deal with more use cases, i.e. multiple SQL statements within a single Proc SQL/QUIT block or explicit pass-through SQL.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I still believe that implementing something generic will be very hard but.... even if you don't succeed you could actually learn a lot along the way just by trying.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Apr 2019 01:52:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553831#M154043</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-04-25T01:52:48Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing a SAS program file to extract table names and column names</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553832#M154044</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/79805"&gt;@buechler66&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I know based on SQL rules that all the column names follow the 'Select'&lt;BR /&gt;statement&amp;nbsp;&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/79805"&gt;@buechler66&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Below some code samples just to give you a feel of all the challenges you're going to encounter if trying to deal with any SQL (and that's so far only the SAS SQL flavor).&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  select 
    *
  from sashelp.class
  ;
quit;

proc sql;
  select 
    (select count(*) from sashelp.class where sex=o.sex) as group_cnt,
    *
  from sashelp.class o
  ;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Apr 2019 02:03:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553832#M154044</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-04-25T02:03:02Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing a SAS program file to extract table names and column names</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553842#M154047</link>
      <description>Yes, I agree completely after hearing ur comments on the cballenges. &lt;BR /&gt;&lt;BR /&gt;Thanks for this example code. It will not go to waste. After just a brief look I've already learned some things. Thanks so much for your time giving input on this. Much appreciated!</description>
      <pubDate>Thu, 25 Apr 2019 03:08:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553842#M154047</guid>
      <dc:creator>buechler66</dc:creator>
      <dc:date>2019-04-25T03:08:48Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing a SAS program file to extract table names and column names</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553845#M154048</link>
      <description>&lt;P&gt;Just to finish this off: Adding the FEEDBACK option to proc SQL and then parse the SQL code out of the SAS log will make it possible to deal&amp;nbsp;with select *, will always give you column names with an alias or table name added to variables and also to give you access to the actual SQL executed when code is wrapped into SAS macro syntax;&lt;/P&gt;
&lt;PRE&gt;28         proc sql feedback;
29           select
30             *
31           from sashelp.class
32           ;
NOTE: Statement transforms to:

        select CLASS.Name, CLASS.Sex, CLASS.Age, CLASS.Height, CLASS.Weight
          from SASHELP.CLASS;

33         quit;&lt;/PRE&gt;</description>
      <pubDate>Thu, 25 Apr 2019 03:42:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-SAS-program-file-to-extract-table-names-and-column/m-p/553845#M154048</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-04-25T03:42:00Z</dc:date>
    </item>
  </channel>
</rss>

