BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mmohotsi
Obsidian | Level 7

I am trying to extract data from a data source given conditions from a staging table. The staging table has more than 1 record and I therefore have to select the appropriate data source between data sources.

 

filename EXECU1
TEMP;
 
%macro Raw_Data;
data _null_;
 file EXECU1;
  set CUSTOMER_FRAME33; 
IF REPORT_NAME = 'ZUIKERBOSCH_DR' THEN DO;
 
  put "proc sql; " / "create table " Data1 " as   " / "  select SAMPLE.SAMPLE_NUMBER, SAMPLE.PARENT_ALIQUOT, SAMPLE.SAMPLING_POINT, SAMPLE.ORIGINAL_SAMPLE,
SAMPLE.SAMPLED_DATE, RESULT.NAME, RESULT.FORMATTED_ENTRY,RESULT.UNITS,
RESULT.NUMERIC_ENTRY, RESULT.STATUS,RESULT.REPORTABLE, RESULT_NUMBER";
   put " from datasource1.sample, VGSQL1.datasource1" / " where sample.sample_number=result.sample_number and" / " sample.sampled_date between ";
   put "'"  start_date +(-1) "'d and" ;
   put "'"  end_date    "'d";
   put "and sample.sampling_point in "  SAMPLING_POINTS" and " / "  RESULT.name in "  DETERMINANDS" and " ;
   put "result.status = 'A' and " / " result.reportable = 'T' and " / " parent_aliquot ne 0;" ;
 
;END;ELSE
IF REPORT_NAME = 'ZUIKERBOSCH_RD' THEN DO;
 
  put "proc sql; " / "create table " Data1 " as   " / "  select SAMPLE.SAMPLE_NUMBER, SAMPLE.PARENT_ALIQUOT, SAMPLE.SAMPLING_POINT, SAMPLE.ORIGINAL_SAMPLE,
SAMPLE.SAMPLED_DATE, RESULT.NAME, RESULT.FORMATTED_ENTRY,RESULT.UNITS,
RESULT.NUMERIC_ENTRY, RESULT.STATUS,RESULT.REPORTABLE, RESULT_NUMBER";
   put " from datasource2.sample, datasource2.result" / " where sample.sample_number=result.sample_number and" / " sample.sampled_date between ";
   put "'"  start_date +(-1) "'d and" ;
   put "'"  end_date    "'d";
   put "and sample.sampling_point in "  SAMPLING_POINTS" and " / "  RESULT.name in "  DETERMINANDS" and " ;
   put "result.status = 'A' and " / " result.reportable = 'T' and " / " parent_aliquot ne 0;" ;
 
;END; 
 
%include EXECU1;
%mend Raw_Data;
%Raw_Data; 
 
 
I get two errors as follows:

ERROR: Cannot open %INCLUDE file EXECU1.
 

;*';*";*/;quit;run;
____
180

ERROR 180-322: Statement is not valid or it is used out of proper order.

 

Please help

 
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

The DATA _NULL_ step reads all observations from the controlling dataset and will write code for all into the file. When the file is included, the generated steps will run one by one.

Where is the issue?

View solution in original post

10 REPLIES 10
Patrick
Opal | Level 21

Ideally use the running man icon here in the forum for sharing code because this maintains your source formatting.

I find it during development easiest to first PRINT generated code (file print) and then copy/paste the this code into a code window for execution. This then helps me to easier spot syntax errors in the generated code.

I find it also worth to put in the bit of extra effort for creation of a somewhat "clean" code structure with line breaks and indention for both the code generation script as well as for the generated code. You get the payback when it comes to debugging and maintenance.

options ls=max ps=max;

libname src 'c:\temp';
data CUSTOMER_FRAME33;
  set src.CUSTOMER_FRAME33;
run;

options ls=max ps=max;
filename EXECU1 TEMP;

data _null_;
/*  file EXECU1;*/
  file print;
  set CUSTOMER_FRAME33;

  if REPORT_NAME in ('ZUIKERBOSCH_DR','ZUIKERBOSCH_RD') then
    do;
      put
        "proc sql; " / 
        "  create table " Data1 "as" / 
        "  select" /
        "    t1.sample_number," /
        "    t1.parent_aliquot," /
        "    t1.sampling_point," /
        "    t1.original_sample," /
        "    t1.sampled_date," /
        "    t2.name," /
        "    t2.formatted_entry," /
        "    t2.units," /
        "    t2.numeric_entry," /
        "    t2.status," /
        "    t2.reportable," /
        "    result_number" /
        ;

      IF REPORT_NAME = 'ZUIKERBOSCH_DR' THEN
        put "  from datasource1.sample t1, VGSQL1.datasource1 t2";
      ELSE IF REPORT_NAME = 'ZUIKERBOSCH_RD' THEN
        put "  from datasource2.sample t1, datasource2.result t2";

      put 
        "  where t1.sample_number=t2.sample_number" / 
        "   and t1.sampled_date between '" start_date date9. "'d and '" end_date date9. "'d" /
        "   and t1.sampling_point in " SAMPLING_POINTS /
        "   and t2.name in " DETERMINANDS /
        "   and t2.status = 'A' " /
        "   and t2.reportable = 'T' " /
        "   and parent_aliquot ne 0;" /
        " quit;"
        ;
    end;
run;

/*%include EXECU1 /source2;*/

Above generates the following SQL with the sample data you've shared. You now can unit test & debug this sample code and potentially then amend the code generation bit based on your findings.

proc sql;                                                                                                                                                                                                                                                       
  create table DATA_RAW as                                                                                                                                                                                                                                      
  select                                                                                                                                                                                                                                                        
    t1.sample_number,                                                                                                                                                                                                                                           
    t1.parent_aliquot,                                                                                                                                                                                                                                          
    t1.sampling_point,                                                                                                                                                                                                                                          
    t1.original_sample,                                                                                                                                                                                                                                         
    t1.sampled_date,                                                                                                                                                                                                                                            
    t2.name,                                                                                                                                                                                                                                                    
    t2.formatted_entry,                                                                                                                                                                                                                                         
    t2.units,                                                                                                                                                                                                                                                   
    t2.numeric_entry,                                                                                                                                                                                                                                           
    t2.status,                                                                                                                                                                                                                                                  
    t2.reportable,                                                                                                                                                                                                                                              
    result_number                                                                                                                                                                                                                                               
  from datasource2.sample t1, datasource2.result t2                                                                                                                                                                                                             
  where t1.sample_number=t2.sample_number                                                                                                                                                                                                                       
   and t1.sampled_date between '01APR2023'd and '26APR2023'd                                                                                                                                                                                                    
   and t1.sampling_point in                                                                                                                                                                                                                                     
   and t2.name in                                                                                                                                                                                                                                               
   and t2.status = 'A'                                                                                                                                                                                                                                          
   and t2.reportable = 'T'                                                                                                                                                                                                                                      
   and parent_aliquot ne 0;                                                                                                                                                                                                                                     
 quit;                                                                                                                                                                                                                                                          
proc sql;                                                                                                                                                                                                                                                       
  create table DATA_RAW as                                                                                                                                                                                                                                      
  select                                                                                                                                                                                                                                                        
    t1.sample_number,                                                                                                                                                                                                                                           
    t1.parent_aliquot,                                                                                                                                                                                                                                          
    t1.sampling_point,                                                                                                                                                                                                                                          
    t1.original_sample,                                                                                                                                                                                                                                         
    t1.sampled_date,                                                                                                                                                                                                                                            
    t2.name,                                                                                                                                                                                                                                                    
    t2.formatted_entry,                                                                                                                                                                                                                                         
    t2.units,                                                                                                                                                                                                                                                   
    t2.numeric_entry,                                                                                                                                                                                                                                           
    t2.status,                                                                                                                                                                                                                                                  
    t2.reportable,                                                                                                                                                                                                                                              
    result_number                                                                                                                                                                                                                                               
  from datasource1.sample t1, VGSQL1.datasource1 t2                                                                                                                                                                                                             
  where t1.sample_number=t2.sample_number                                                                                                                                                                                                                       
   and t1.sampled_date between '01JAN2024'd and '19JAN2024'd                                                                                                                                                                                                    
   and t1.sampling_point in ("M-BG1","M-B8","M-B7","M-B6","M-B4","M-B2")                                                                                                                                                                                        
   and t2.name in ("Turb","Colour")                                                                                                                                                                                                                             
   and t2.status = 'A'                                                                                                                                                                                                                                          
   and t2.reportable = 'T'                                                                                                                                                                                                                                      
   and parent_aliquot ne 0;                                                                                                                                                                                                                                     
 quit;              

Ideally also define the source table in below two code sections (it wasn't in your code so I can't know):

Patrick_0-1708596533339.png 

Patrick_1-1708596566725.png

 

 

 

 

mmohotsi
Obsidian | Level 7

Result_number is from t2 whilst parent_aliquot is from t1

 

I previously used the code below to extract data when the source table was to have only one record. 

 

filename EXECU1
TEMP;
 
data _null_;
 file EXECU1;
  set CUSTOMER_FRAME33; 
 
  put "proc sql; " / "create table " Data1 " as   " / "  select SAMPLE.SAMPLE_NUMBER, SAMPLE.PARENT_ALIQUOT, SAMPLE.SAMPLING_POINT, SAMPLE.ORIGINAL_SAMPLE,
SAMPLE.SAMPLED_DATE, RESULT.NAME, RESULT.FORMATTED_ENTRY,RESULT.UNITS,
RESULT.NUMERIC_ENTRY, RESULT.STATUS,RESULT.REPORTABLE, RESULT_NUMBER";
  put " from datasource1.sample, datasource1.result" / " where sample.sample_number=result.sample_number and" / " sample.sampled_date between ";
  put "'"  start_date +(-1) "'d and" ;
  put "'"  end_date    "'d";
  put "and sample.sampling_point in "  SAMPLING_POINTS" and " / "  RESULT.name in "  DETERMINANDS" and " ;
  put "result.status = 'A' and " / " result.reportable = 'T' and " / " parent_aliquot ne 0;" / " quit;";
 
 
run;
%include EXECU1; 
 
It now shows that with the staging table that I included in the topic, any of the two records can be found in the staging table. The aim will therefore be to ask SAS to read the staging table and extract data based on the record that in on the staging table. The unique field that identifies the record is REPORT_NAME.
I therefore thought that a macro can be used to make such a request for SAS to execute.

 

Patrick
Opal | Level 21

I've used what you call "staging" table as the driver for code generation. The sample you've shared got two rows. 

For each row in this "staging" table a separate Proc SQL will get created which I shared as well. Is this not what you want? If not then can you please explain what should happen? Which row from the "staging" table should get used for code generation?

mmohotsi
Obsidian | Level 7

Extracting the data for each record (request) on the staging table is doable and then interrogating the data sets individually. In my case, the staging table is not constant.  I have two customers who complete the information on the staging table at any given time.  So, I thought of requesting SAS to automatically read the staging table and if it finds only one record (request) then SAS should extract the data and generate a report. If it finds two records (requests) then SAS should generate two distinct data sets for each row (request) and then generate a report for each record (request).

Patrick
Opal | Level 21

@mmohotsi wrote:

Extracting the data for each record (request) on the staging table is doable and then interrogating the data sets individually. In my case, the staging table is not constant.  I have two customers who complete the information on the staging table at any given time.  So, I thought of requesting SAS to automatically read the staging table and if it finds only one record (request) then SAS should extract the data and generate a report. If it finds two records (requests) then SAS should generate two distinct data sets for each row (request) and then generate a report for each record (request).


You didn't share sample code for report generation but the bit that creates table(s) based on what's in source variable Data1 is what the code I shared is doing - and which is basically the logic of the code you shared with some improvements.

This code is data driven (automatic) so not really sure what's missing for you. Is the code you shared something you developed or was that just a copy/paste and you actually don't really understand what and how it generates code?

Kurt_Bremser
Super User

You would create the file EXECU1 once the macro executes, but you include the file already during the definition of the macro, so this cannot work.

Your macro has no parameter, and nowhere in the code do you make use of a macro variable to make code dynamic. All controlling information is in the dataset, which means you do not need the macro at all. Remove the %MACRO and %MEND statements, and the macro call. Start a new SAS session (the failed run caused an unstable state), and run the reduced code.

mmohotsi
Obsidian | Level 7
Hi Kurt_Bremser
The reduced code works for a single record from the staging table. In this case the table may have any of the two record that have the unique variable REPORT_NAME on the staging table. So I need SAS to either extract data when there is one of the records on the staging table or both(the records).
Kurt_Bremser
Super User

The DATA _NULL_ step reads all observations from the controlling dataset and will write code for all into the file. When the file is included, the generated steps will run one by one.

Where is the issue?

Tom
Super User Tom
Super User

@mmohotsi wrote:
Hi Kurt_Bremser
The reduced code works for a single record from the staging table. In this case the table may have any of the two record that have the unique variable REPORT_NAME on the staging table. So I need SAS to either extract data when there is one of the records on the staging table or both(the records).

Not clear at all what you mean.  You have code being generated from DATA.  The code is using the value of a variable named DATA1 to create the dataset name that is being created. 

"create table " Data1 " as   "

So if the source data has multiple observations for the same value of DATA1 then you will end up writing two steps both create the same dataset.

 

So what do you want to do in that situation? 

Show the code you want generated.

 

Do you just want the selection criteria to include multiple conditions that should be OR'd?

Do you want to generate one step that pulls data using difference selection criteria?

Do you want to generate multiple files and then combine them?  Look into using PROC APPEND.  Or since you seem to be generating SQL code instead of SAS code perhaps only the first step for a value of DATA1 would use CREATE TABLE and the other steps generated should use INSERT INTO instead.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 812 views
  • 0 likes
  • 4 in conversation