BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
buechler66
Barite | Level 11

Hi all.  This is a quite complicated task I'm hoping to get some help with.  If you have any suggestions or code that would help me to accomplish this task I'd really appreciated it.  If it's too big of an ask I certainly understand that too.

 

I have a physical file on the server that I need to read in with the goal of outputting two columns of data to a SAS dataset.  Each record output would contain 'Column Name' and 'Table Name' from any Proc SQL steps in a SAS program it reads in:

1.  Each table name in the 'FROM' (the real table name so ignoring aliases)

2.  Each Column name in the 'SELECT' and 'WHERE' conditions (the real column names so again ignoring aliases)

 

This is the example SAS program I've been playing with. Notice it's intentionally messy because I anticipate people writing code in a variety of ways:

/* MACROS */
	%LET DATE_START = '01JUN2016'D;
	%LET DATE_END = '31MAY2018'D;

/*	================================================
	SECTION 1:	
	================================================ */
PROC SQL;
CREATE TABLE &USERN._LIB.ORTHOTICS_BASE_PULL_V1 AS
SELECT DISTINCT 
	 CLM.CLM_HIC_NUM AS BENE_HIC_NUM
	,CLM.BENE_SK
	,CLM.CLM_THRU_DT    ,CLM.CLM_PMT_AMT
	,LINE.CLM_LINE_FROM_DT
	,LINE.CLM_LINE_THRU_DT, 
	LINE.CLM_LINE_HCPCS_CD AS HCPCS_CD , BENE.BENE_LINE_2_ADR
	,BENE.BENE_LINE_3_ADR,    	
	BENE.BENE_LINE_4_ADR
FROM OPI_AAL.CLM AS CLM
INNER JOIN OPI_AAL.BENE AS BENE 
	ON (CLM.BENE_SK = BENE.BENE_SK)
INNER JOIN OPI_AAL.CLM_LINE AS LINE 
	ON (CLM.GEO_BENE_SK = LINE.GEO_BENE_SK 
		AND CLM.CLM_DT_SGNTR_SK = LINE.CLM_DT_SGNTR_SK 
		AND CLM.CLM_TYPE_CD = LINE.CLM_TYPE_CD 
		AND CLM.CLM_NUM_SK = LINE.CLM_NUM_SK) 
WHERE CLM.CLM_TYPE_CD = 2800
	AND LINE.CLM_LINE_PRVDR_PMT_AMT > 0 AND 	CLM.CLM_FINL_ACTN_IND = 'Y'
;QUIT;

So the desired output would be a dataset containing:

TABLE NAME:		COLUMN NAME:

OPI_AAL.CLM		CLM_HIC_NUM
OPI_AAL.CLM		BENE_SK
OPI_AAL.CLM		CLM_THRU_DT
OPI_AAL.CLM		CLM_PMT_AMT
OPI_AAL.CLM		CLM_TYPE_CD
OPI_AAL.CLM		CLM_FINL_ACTN_IND
OPI_AAL.BENE 		BENE_LINE_2_ADR
OPI_AAL.BENE 		BENE_LINE_3_ADR
OPI_AAL.BENE 		BENE_LINE_4_ADR
OPI_AAL.CLM_LINE 	CLM_LINE_FROM_DT
OPI_AAL.CLM_LINE 	CLM_LINE_THRU_DT
OPI_AAL.CLM_LINE 	CLM_LINE_HCPCS_CD 
OPI_AAL.CLM_LINE 	CLM_LINE_PRVDR_PMT_AMT

So far I've been able to read in the SAS program with the following code, but I'm stuck on how to parse thru the lines of the file to create my desired output based on those requirements:

filename file_in "/sasdata/users/steve/DME/DMECode.sas";

data pgm;
infile file_in end=template_eof;
length line $ 2000;
  do while (not template_eof);
    input;
    line = _infile_;
	 output;
  end;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@buechler66 

Below code extracts the SQL from your source code and stores it in a single observation.

data pgm;
  infile datalines truncover;
  input;
  length sql_code $32767;
  retain sql_code sql_flg;

  if sql_flg ne '1' and find(compbl(_infile_),'proc sql','i')>0 then
    sql_flg='1';
  else if sql_flg='1' then
    do;
      if find(compbl(_infile_),'quit','i')>0 then
        do;
          output;
          sql_flg='0';
          call missing(sql_code);
        end;
      else
        do;
          sql_code=catx(' ',sql_code,_infile_);
        end;
    end;

  datalines4;

/* MACROS */
  %LET DATE_START = '01JUN2016'D;
  %LET DATE_END = '31MAY2018'D;

/*  ================================================
  SECTION 1:  
  ================================================ */
PROC SQL;
CREATE TABLE &USERN._LIB.ORTHOTICS_BASE_PULL_V1 AS
SELECT DISTINCT 
   CLM.CLM_HIC_NUM AS BENE_HIC_NUM
  ,CLM.BENE_SK
  ,CLM.CLM_THRU_DT    ,CLM.CLM_PMT_AMT
  ,LINE.CLM_LINE_FROM_DT
  ,LINE.CLM_LINE_THRU_DT, 
  LINE.CLM_LINE_HCPCS_CD AS HCPCS_CD , BENE.BENE_LINE_2_ADR
  ,BENE.BENE_LINE_3_ADR,      
  BENE.BENE_LINE_4_ADR
FROM OPI_AAL.CLM AS CLM
INNER JOIN OPI_AAL.BENE AS BENE 
  ON (CLM.BENE_SK = BENE.BENE_SK)
INNER JOIN OPI_AAL.CLM_LINE AS LINE 
  ON (CLM.GEO_BENE_SK = LINE.GEO_BENE_SK 
    AND CLM.CLM_DT_SGNTR_SK = LINE.CLM_DT_SGNTR_SK 
    AND CLM.CLM_TYPE_CD = LINE.CLM_TYPE_CD 
    AND CLM.CLM_NUM_SK = LINE.CLM_NUM_SK) 
WHERE CLM.CLM_TYPE_CD = 2800
  AND LINE.CLM_LINE_PRVDR_PMT_AMT > 0 AND   CLM.CLM_FINL_ACTN_IND = 'Y'
;QUIT;
;;;;
run;

You will have to extend the code logic to deal with more use cases, i.e. multiple SQL statements within a single Proc SQL/QUIT block or explicit pass-through SQL.

 

I still believe that implementing something generic will be very hard but.... even if you don't succeed you could actually learn a lot along the way just by trying.

View solution in original post

6 REPLIES 6
Patrick
Opal | Level 21

@buechler66 

It would certainly be possible to write some code for the sample program you've posted but to come up with something generic feels like a huge endeavor. That's basically about writing a SQL parser.

If you need that only for a single program then consider a fully manual approach. If the end goal is to write some sort of data lineage reporting tool then I'd spend time searching the Internet for existing tools. Eventually search with words like "SQL Tokenizer" and the like.

 

buechler66
Barite | Level 11
Yah, I wish I could see a clear path, but I know this isn't an easy
problem.

I wish I could somehow isolate everything between the 'PROC SQL' and the
'QUIT' as a single giant string that I could parse. That would be a
starting point.

I know based on SQL rules that all the column names follow the 'Select'
statement (separated by commas). And I know all table names follow the
'From' statement (separated by commas).

Ugh. I guess it's not a solvable problem in reality.

Thanks for chiming in tho. I appreciate it.
Patrick
Opal | Level 21

@buechler66 

Below code extracts the SQL from your source code and stores it in a single observation.

data pgm;
  infile datalines truncover;
  input;
  length sql_code $32767;
  retain sql_code sql_flg;

  if sql_flg ne '1' and find(compbl(_infile_),'proc sql','i')>0 then
    sql_flg='1';
  else if sql_flg='1' then
    do;
      if find(compbl(_infile_),'quit','i')>0 then
        do;
          output;
          sql_flg='0';
          call missing(sql_code);
        end;
      else
        do;
          sql_code=catx(' ',sql_code,_infile_);
        end;
    end;

  datalines4;

/* MACROS */
  %LET DATE_START = '01JUN2016'D;
  %LET DATE_END = '31MAY2018'D;

/*  ================================================
  SECTION 1:  
  ================================================ */
PROC SQL;
CREATE TABLE &USERN._LIB.ORTHOTICS_BASE_PULL_V1 AS
SELECT DISTINCT 
   CLM.CLM_HIC_NUM AS BENE_HIC_NUM
  ,CLM.BENE_SK
  ,CLM.CLM_THRU_DT    ,CLM.CLM_PMT_AMT
  ,LINE.CLM_LINE_FROM_DT
  ,LINE.CLM_LINE_THRU_DT, 
  LINE.CLM_LINE_HCPCS_CD AS HCPCS_CD , BENE.BENE_LINE_2_ADR
  ,BENE.BENE_LINE_3_ADR,      
  BENE.BENE_LINE_4_ADR
FROM OPI_AAL.CLM AS CLM
INNER JOIN OPI_AAL.BENE AS BENE 
  ON (CLM.BENE_SK = BENE.BENE_SK)
INNER JOIN OPI_AAL.CLM_LINE AS LINE 
  ON (CLM.GEO_BENE_SK = LINE.GEO_BENE_SK 
    AND CLM.CLM_DT_SGNTR_SK = LINE.CLM_DT_SGNTR_SK 
    AND CLM.CLM_TYPE_CD = LINE.CLM_TYPE_CD 
    AND CLM.CLM_NUM_SK = LINE.CLM_NUM_SK) 
WHERE CLM.CLM_TYPE_CD = 2800
  AND LINE.CLM_LINE_PRVDR_PMT_AMT > 0 AND   CLM.CLM_FINL_ACTN_IND = 'Y'
;QUIT;
;;;;
run;

You will have to extend the code logic to deal with more use cases, i.e. multiple SQL statements within a single Proc SQL/QUIT block or explicit pass-through SQL.

 

I still believe that implementing something generic will be very hard but.... even if you don't succeed you could actually learn a lot along the way just by trying.

Patrick
Opal | Level 21

@buechler66 wrote:



I know based on SQL rules that all the column names follow the 'Select'
statement 

 

@buechler66 

Below some code samples just to give you a feel of all the challenges you're going to encounter if trying to deal with any SQL (and that's so far only the SAS SQL flavor).

proc sql;
  select 
    *
  from sashelp.class
  ;
quit;

proc sql;
  select 
    (select count(*) from sashelp.class where sex=o.sex) as group_cnt,
    *
  from sashelp.class o
  ;
quit;

 

buechler66
Barite | Level 11
Yes, I agree completely after hearing ur comments on the cballenges.

Thanks for this example code. It will not go to waste. After just a brief look I've already learned some things. Thanks so much for your time giving input on this. Much appreciated!
Patrick
Opal | Level 21

Just to finish this off: Adding the FEEDBACK option to proc SQL and then parse the SQL code out of the SAS log will make it possible to deal with select *, will always give you column names with an alias or table name added to variables and also to give you access to the actual SQL executed when code is wrapped into SAS macro syntax;

28         proc sql feedback;
29           select
30             *
31           from sashelp.class
32           ;
NOTE: Statement transforms to:

        select CLASS.Name, CLASS.Sex, CLASS.Age, CLASS.Height, CLASS.Weight
          from SASHELP.CLASS;

33         quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 6244 views
  • 2 likes
  • 2 in conversation