I have following set of statements in sql that i would like to run through SAS odbc connection in SQL server and create a report on data completeness of the tables present at our end
/**************************************************************************************/
SET NOCOUNT ON
DECLARE @Schema NVARCHAR(100) = 'EMR'
DECLARE @Table NVARCHAR(100) = 'eMR_Assessment'
DECLARE @SQL NVARCHAR(MAX) =''
IF OBJECT_ID ('EMR.eMR_Assessment_dchek') IS NOT NULL DROP TABLE EMR.eMR_Assessment_dchek
CREATE TABLE EMR.eMR_Assessment_dchek (TableName sysname, ColumnName sysname , ColumnPosition int ,NullCount int , NonNullCount int, TotalCount int, NotNullpercentage decimal(20,2))
SELECT @SQL += 'SELECT '''+TABLE_NAME+''' AS TableName , '''+COLUMN_NAME+''' AS ColumnName, '''+CONVERT(VARCHAR(5),ORDINAL_POSITION)+''' AS ColumnPosition,
SUM(CASE WHEN '+COLUMN_NAME+' IS NULL THEN 1 ELSE 0 END) CountNulls , COUNT(' +COLUMN_NAME+') CountnonNulls, count(*) TotalCount,
(cast(COUNT(' +COLUMN_NAME+') as decimal (20,2))/ cast(count(*) as decimal (20,2)))*100 NotNullpercentage
FROM '+QUOTENAME(TABLE_SCHEMA)+'.'+QUOTENAME(TABLE_NAME)+';'+ CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @Schema
AND TABLE_NAME = @table
INSERT INTO EMR.eMR_Assessment_dchek
EXEC sp_executesql @SQL
SELECT *
FROM EMR.eMR_Assessment_dchek
/***************************************************************************************/
I have incorporated above query in to sas execute statements
/********************************************************************************/
proc sql;
connect to odbc(dsn=ambo);
execute (SET ANSI_NULLS ON)by odbc;
execute (SET QUOTED_IDENTIFIER ON)by odbc;
execute (SET ANSI_PADDING ON)by odbc;
execute (
SET NOCOUNT ON
DECLARE @Schema NVARCHAR(100) = 'EMR'
DECLARE @Table NVARCHAR(100) = 'eMR_Assessment'
DECLARE @SQL NVARCHAR(MAX) =''
IF OBJECT_ID ('EMR.eMR_Assessment_dchek') IS NOT NULL DROP TABLE EMR.eMR_Assessment_dchek
)by odbc;
execute (
CREATE TABLE EMR.eMR_Assessment_dchek (TableName sysname, ColumnName sysname , ColumnPosition int ,NullCount int , NonNullCount int, TotalCount int, NotNullpercentage decimal(20,2))
)by odbc;
execute (
SELECT @SQL += 'SELECT '''+TABLE_NAME+''' AS TableName , '''+COLUMN_NAME+''' AS ColumnName, '''+CONVERT(VARCHAR(5),ORDINAL_POSITION)+''' AS ColumnPosition,
SUM(CASE WHEN '+COLUMN_NAME+' IS NULL THEN 1 ELSE 0 END) CountNulls , COUNT(' +COLUMN_NAME+') CountnonNulls, count(*) TotalCount,
(cast(COUNT(' +COLUMN_NAME+') as decimal (20,2))/ cast(count(*) as decimal (20,2)))*100 NotNullpercentage
FROM '+QUOTENAME(TABLE_SCHEMA)+'.'+QUOTENAME(TABLE_NAME)+';'+ CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @Schema
AND TABLE_NAME = @table
)by odbc;
execute (SET ANSI_PADDING OFF)by odbc; quit;
execute ( INSERT INTO EMR.eMR_Assessment_dchek
EXEC sp_executesql @SQL
)by odbc;
execute ( SELECT *
FROM EMR.eMR_Assessment_dchek
)by odbc;
quit;
/*********************************************************************************************/
But I am getting following error in SAS EG:
ERROR: CLI execute error: [Microsoft][ODBC SQL Server Driver][SQL Server]Must declare the scalar variable "@sql".
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
execute (SET ANSI_PADDING OFF)by odbc;
NOTE: Statement not executed due to NOEXEC option.
quit;
NOTE: The SAS System stopped processing this step because of errors.
Whats possibly going wrong!
It looks like that complex code is just trying to generate code from data.
Do you know what code it is trying to generate?
Do you know what data it is using to generate the code?
Do you have access to the data it is using to generate the code?
If you can answer those then you use SAS to generate the code instead and your process will be much easier.
Let's see if we can translate that SQL server dialect into normal code.
First it appears to be setting two constants and creating a place to store the code:
DECLARE @Schema NVARCHAR(100) = 'EMR'
DECLARE @Table NVARCHAR(100) = 'eMR_Assessment'
DECLARE @SQL NVARCHAR(MAX) =''
So in SAS you might use macro variables for that. So something like:
%let Schema= 'EMR' ;
%let Table= 'eMR_Assessment' ;
%let SQL=;
Then it seems to be querying this table:
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @Schema
AND TABLE_NAME = @table
Now lets see if we can figure out what code it is trying to generate from that data.
SELECT @SQL +=
'SELECT '''
+TABLE_NAME
+''' AS TableName , '''
+COLUMN_NAME
+''' AS ColumnName, '''
+CONVERT(VARCHAR(5),ORDINAL_POSITION)
+''' AS ColumnPosition,SUM(CASE WHEN '
+COLUMN_NAME
+' IS NULL THEN 1 ELSE 0 END) CountNulls , COUNT('
+COLUMN_NAME
+') CountnonNulls, count(*) TotalCount,(cast(COUNT('
+COLUMN_NAME
+') as decimal (20,2))/ cast(count(*) as decimal (20,2)))*100 NotNullpercentage FROM '
+QUOTENAME(TABLE_SCHEMA)
+'.'
+QUOTENAME(TABLE_NAME)
+';'
+ CHAR(10)
Looks like it is trying to generate a series of SELECT statements separated by semi-colon and linefeed.
Looks like it is expecting to find variables named:
TABLE_NAME
COLUMN_NAME
ORDINAL_POSITION
TABLE_SCHEMA
In that INFORMATION_SCHEMA.COLUMNS table.
Based on the names of the resulting columns it appears that it is trying to generate code to process every variable in a table and figure out how often the value is missing.
You could easily do that in SAS. For example here is one way.
Get the list of variables. Use it to generate some SQL code to find the counts you need. Transpose it to get it from one observation for the dataset to one observation per variable. Calculate the percentage.
proc contents data=sashelp.demographics noprint out=contents; run;
proc sort; by varnum; run;
filename code temp;
data _null_ ;
file code ;
if _n_=1 then put 'create table summary as select count(*) as nobs' ;
set contents end=eof;
put ',count(' name ') as ' name ;
if eof then put 'from ' libname +(-1) '.' memname ';' ;
run;
proc sql;
%include code / source2;
quit;
proc transpose data=summary out=summary2;
by nobs;
var _all_;
run;
data want;
if _n_=1 then set contents(keep=libname memname rename=(libname=Schema memname=TableName));
set summary2 (firstobs=2 rename=(col1=NonNullCount nobs=TotalCount ));
rename _name_=ColumnName;
ColumnPosition +1;
NotNullpercentage = 100*NonNullCount/TotalCount;
run;
proc print;
run;
Results:
Non Total Null Column Not Obs Schema TableName Count ColumnName Count Position Nullpercentage 1 SASHELP DEMOGRAPHICS 197 CONT 197 1 100.000 2 SASHELP DEMOGRAPHICS 197 ID 197 2 100.000 3 SASHELP DEMOGRAPHICS 197 ISO 197 3 100.000 4 SASHELP DEMOGRAPHICS 197 NAME 197 4 100.000 5 SASHELP DEMOGRAPHICS 197 ISONAME 197 5 100.000 6 SASHELP DEMOGRAPHICS 197 region 197 6 100.000 7 SASHELP DEMOGRAPHICS 197 pop 197 7 100.000 8 SASHELP DEMOGRAPHICS 197 popAGR 197 8 100.000 9 SASHELP DEMOGRAPHICS 197 popUrban 197 9 100.000 10 SASHELP DEMOGRAPHICS 197 totalFR 197 10 100.000 11 SASHELP DEMOGRAPHICS 197 AdolescentFPpct 123 11 62.437 12 SASHELP DEMOGRAPHICS 197 AdolescentFPyear 123 12 62.437 13 SASHELP DEMOGRAPHICS 197 AdultLiteracypct 134 13 68.020 14 SASHELP DEMOGRAPHICS 197 MaleSchoolpct 166 14 84.264 15 SASHELP DEMOGRAPHICS 197 FemaleSchoolpct 166 15 84.264 16 SASHELP DEMOGRAPHICS 197 GNI 170 16 86.294 17 SASHELP DEMOGRAPHICS 197 PopPovertypct 52 17 26.396 18 SASHELP DEMOGRAPHICS 197 PopPovertyYear 70 18 35.533
Which you could get to work for a table in your remote database by just pointing a libref at the database using a LIBNAME statement. So if you made a libref in SAS named EMR pointing to the EMR schema in your database. Then in my SAS code you just replace SASHELP.DEMOGRAPHICS with EMR.eMR_Assessment.
Each EXECUTE step is by default a separate SQL session. You declare @SQL in one session but refer to it in another where it is not declared. Either put all of your SQL in one EXECUTE statement or create a shared SQL session across all of your EXECUTE statements:
connect to odbc(dsn=ambo) connection = global;
Another way is to develop everything in SQL Developer/Toad and then in the end just "wrap" a SAS Execute block around it.
Here a modified sample from real code.
libname blah oracle .....;
proc sql ;
connect using blah as con1;
execute
(
BEGIN
EXECUTE IMMEDIATE %str(%')truncate table <schema>.<table> drop storage%str(%');
EXCEPTION
WHEN OTHERS THEN
IF SQLCODE != -942 THEN
RAISE;
END IF;
EXECUTE IMMEDIATE
%str(%')
CREATE TABLE <schema>.<table>
(
"PARTY_RK" NUMBER(15,0) NOT NULL ENABLE,
"VALID_FROM_DTTM" TIMESTAMP (6),
"VALID_TO_DTTM" TIMESTAMP (6),
"PARTY_ID" VARCHAR2(64 CHAR) NOT NULL ENABLE,
"SOURCE_SYSTEM_CD" VARCHAR2(10 CHAR) NOT NULL ENABLE,
"PARTY_TYPE_CD" VARCHAR2(32 CHAR),
"PARTY_CATEGORY_CD" VARCHAR2(32 CHAR),
"PARTY_SUBCATEGORY_CD" VARCHAR2(32 CHAR),
"IDENTICAL_PARTY_LINK_SK" NUMBER(10,0),
"INDIVIDUAL_FLG" CHAR(1 CHAR) NOT NULL ENABLE,
"PARTY_FULL_NM" VARCHAR2(200 CHAR),
"NATIONAL_ID" VARCHAR2(32 CHAR),
"NATIONAL_ID_TYPE_CD" VARCHAR2(10 CHAR),
"UI_DEF_FILE_NM" VARCHAR2(100 CHAR),
"CREATE_USER_ID" VARCHAR2(60 CHAR),
"CREATE_DTTM" TIMESTAMP (6) NOT NULL ENABLE,
"UPDATE_USER_ID" VARCHAR2(60 CHAR),
"VERSION_NO" NUMBER(10,0) NOT NULL ENABLE,
"DELETE_FLG" CHAR(1 CHAR) NOT NULL ENABLE,
"PARTY_CONFIG_SEQ_NO" NUMBER(10.0) NOT NULL ENABLE,
"LOAD_TYPE" CHAR(1 CHAR)
)
%str(%');
EXECUTE IMMEDIATE
%str(%')
create unique index <schema>.<table>._idx1
on <schema>.<table>
(party_rk, version_no)
%str(%');
EXECUTE IMMEDIATE
%str(%')
create unique index <schema>.<table>._idx2
on <schema>.<table>
(PARTY_ID, SOURCE_SYSTEM_CD)
%str(%');
END;
)
by con1;
quit;
Above approach should work for what you've posted except for the last SELECT statement where you want the data/report back into the SAS environment. Here the code could look similar to below:
proc sql;
connect using blah as con1;
select * from connection to con1
(
SELECT *
FROM EMR.eMR_Assessment_dchek
);
quit;
Thanks but connection=global isn't working for me though now
I am getting error at quotename function.
ERROR: CLI execute error: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near '<schema_name>'.
Sorry but could you help me understand what this statement does:
SELECT @SQL += 'SELECT '''+TABLE_NAME+''' AS TableName , '''+COLUMN_NAME+''' AS ColumnName, '''+CONVERT(VARCHAR(5),ORDINAL_POSITION)+''' AS ColumnPosition,
SUM(CASE WHEN '+COLUMN_NAME+' IS NULL THEN 1 ELSE 0 END) CountNulls , COUNT(' +COLUMN_NAME+') CountnonNulls, count(*) TotalCount,
(cast(COUNT(' +COLUMN_NAME+') as decimal (20,2))/ cast(count(*) as decimal (20,2)))*100 NotNullpercentage
FROM '+QUOTENAME(TABLE_SCHEMA)+'.'+QUOTENAME(TABLE_NAME)+';'+ CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE Table_Schema = 'EMR'
AND Table_name = 'eMR_Main'
I really want this bit of query to run on SAS as in SQL server.
Thanks,
S
It looks like that complex code is just trying to generate code from data.
Do you know what code it is trying to generate?
Do you know what data it is using to generate the code?
Do you have access to the data it is using to generate the code?
If you can answer those then you use SAS to generate the code instead and your process will be much easier.
Let's see if we can translate that SQL server dialect into normal code.
First it appears to be setting two constants and creating a place to store the code:
DECLARE @Schema NVARCHAR(100) = 'EMR'
DECLARE @Table NVARCHAR(100) = 'eMR_Assessment'
DECLARE @SQL NVARCHAR(MAX) =''
So in SAS you might use macro variables for that. So something like:
%let Schema= 'EMR' ;
%let Table= 'eMR_Assessment' ;
%let SQL=;
Then it seems to be querying this table:
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @Schema
AND TABLE_NAME = @table
Now lets see if we can figure out what code it is trying to generate from that data.
SELECT @SQL +=
'SELECT '''
+TABLE_NAME
+''' AS TableName , '''
+COLUMN_NAME
+''' AS ColumnName, '''
+CONVERT(VARCHAR(5),ORDINAL_POSITION)
+''' AS ColumnPosition,SUM(CASE WHEN '
+COLUMN_NAME
+' IS NULL THEN 1 ELSE 0 END) CountNulls , COUNT('
+COLUMN_NAME
+') CountnonNulls, count(*) TotalCount,(cast(COUNT('
+COLUMN_NAME
+') as decimal (20,2))/ cast(count(*) as decimal (20,2)))*100 NotNullpercentage FROM '
+QUOTENAME(TABLE_SCHEMA)
+'.'
+QUOTENAME(TABLE_NAME)
+';'
+ CHAR(10)
Looks like it is trying to generate a series of SELECT statements separated by semi-colon and linefeed.
Looks like it is expecting to find variables named:
TABLE_NAME
COLUMN_NAME
ORDINAL_POSITION
TABLE_SCHEMA
In that INFORMATION_SCHEMA.COLUMNS table.
Based on the names of the resulting columns it appears that it is trying to generate code to process every variable in a table and figure out how often the value is missing.
You could easily do that in SAS. For example here is one way.
Get the list of variables. Use it to generate some SQL code to find the counts you need. Transpose it to get it from one observation for the dataset to one observation per variable. Calculate the percentage.
proc contents data=sashelp.demographics noprint out=contents; run;
proc sort; by varnum; run;
filename code temp;
data _null_ ;
file code ;
if _n_=1 then put 'create table summary as select count(*) as nobs' ;
set contents end=eof;
put ',count(' name ') as ' name ;
if eof then put 'from ' libname +(-1) '.' memname ';' ;
run;
proc sql;
%include code / source2;
quit;
proc transpose data=summary out=summary2;
by nobs;
var _all_;
run;
data want;
if _n_=1 then set contents(keep=libname memname rename=(libname=Schema memname=TableName));
set summary2 (firstobs=2 rename=(col1=NonNullCount nobs=TotalCount ));
rename _name_=ColumnName;
ColumnPosition +1;
NotNullpercentage = 100*NonNullCount/TotalCount;
run;
proc print;
run;
Results:
Non Total Null Column Not Obs Schema TableName Count ColumnName Count Position Nullpercentage 1 SASHELP DEMOGRAPHICS 197 CONT 197 1 100.000 2 SASHELP DEMOGRAPHICS 197 ID 197 2 100.000 3 SASHELP DEMOGRAPHICS 197 ISO 197 3 100.000 4 SASHELP DEMOGRAPHICS 197 NAME 197 4 100.000 5 SASHELP DEMOGRAPHICS 197 ISONAME 197 5 100.000 6 SASHELP DEMOGRAPHICS 197 region 197 6 100.000 7 SASHELP DEMOGRAPHICS 197 pop 197 7 100.000 8 SASHELP DEMOGRAPHICS 197 popAGR 197 8 100.000 9 SASHELP DEMOGRAPHICS 197 popUrban 197 9 100.000 10 SASHELP DEMOGRAPHICS 197 totalFR 197 10 100.000 11 SASHELP DEMOGRAPHICS 197 AdolescentFPpct 123 11 62.437 12 SASHELP DEMOGRAPHICS 197 AdolescentFPyear 123 12 62.437 13 SASHELP DEMOGRAPHICS 197 AdultLiteracypct 134 13 68.020 14 SASHELP DEMOGRAPHICS 197 MaleSchoolpct 166 14 84.264 15 SASHELP DEMOGRAPHICS 197 FemaleSchoolpct 166 15 84.264 16 SASHELP DEMOGRAPHICS 197 GNI 170 16 86.294 17 SASHELP DEMOGRAPHICS 197 PopPovertypct 52 17 26.396 18 SASHELP DEMOGRAPHICS 197 PopPovertyYear 70 18 35.533
Which you could get to work for a table in your remote database by just pointing a libref at the database using a LIBNAME statement. So if you made a libref in SAS named EMR pointing to the EMR schema in your database. Then in my SAS code you just replace SASHELP.DEMOGRAPHICS with EMR.eMR_Assessment.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.