Hello,
I'm trying to take this sql script string and break it up into column rows to start. We are scripting out the sql from oracle but we want to create scripts that are formatted a specific way.
CREATE TABLE "TEMP" ("TEMPID" NUMBER(7,0), "TESTID" NUMBER(6,0), "TYPE" VARCHAR2(5), "STARTDT" DATE, "SOURCE" VARCHAR2(5), "TITLE" VARCHAR2(6), CONSTRAINT "TEMP_PK" PRIMARY KEY ("TEMPID") ENABLE, CONSTRAINT "TEMP_FK" FOREIGN KEY ("KEYID") REFERENCES "KEY" ("ID") ENABLE) ;
Output:
TABNm COLNm COLType
TEMP TEMPID NUMBER(7,0)
TEMP TESTID NUMBER(6,0)
TEMP TYPE VARCHAR2(5)
TEMP STARTDT DATE
TEMP SOURCE VARCHAR2(5)
TEMP TITLE VARCHAR2(6)
This is what I have so far. I try to isolate just the variables and then take a count of double quotes and divide it by 2 to get the number of variables, but I'm not sure how to move each variable into a separate row.
data x ; set raw ; if f1=1 then do ; tabname=scan(line,2,'"') ; ParenPos=find(line,'(') ; x=substr(line,ParenPos) ; EndPos=find(x,'CONSTRAINT') ; if EndPos = 0 then EndPos=find(x,';') ; y=substr(x,1,EndPos-1) ; VarCnt=count(y,'"')/2 ; end ; run ;
Hello Jerry
The following code splits the create statement and extracts Column names and types. The Table name is extracted as the third word using SCAN, and the Column definitions are extracted using a regular expression, that looks for a word in quotes followed by one space followed by an unquoted word optionally followed by a block containing only digits, parantheses or comma. The expression also finds the keys, and they are omitted because the words PRIMARY and FOREIGN are not valid Column types.
data have;
line = 'CREATE TABLE "TEMP" ("TEMPID" NUMBER(7,0), "TESTID" NUMBER(6,0), "TYPE" VARCHAR2(5), "STARTDT" DATE, "SOURCE" VARCHAR2(5), "TITLE" VARCHAR2(6), CONSTRAINT "TEMP_PK" PRIMARY KEY ("TEMPID") ENABLE, CONSTRAINT "TEMP_FK" FOREIGN KEY ("KEYID") REFERENCES "KEY" ("ID") ENABLE) ;';
run;
data want (keep=TABNm COLNm COLType); set have;
length w $200 TABNm COLNm COLType $60;
TABNm = dequote(scan(line,3,' '));
xpr = prxparse('/"\w+"\s{1}\w+[\d,\(\)]*/');
start = 1;
stop = length(trim(line));
call prxnext(xpr, start, stop, line, pos, len);
do while (pos > 0);
w = substr(line, pos, len-1);
COLNm = dequote(scan(w,1,' '));
COLType = dequote(scan(w,2,' '));
if substr(COLType,1,4) not in ('PRIM','FORE') then output;
call prxnext(xpr, start, stop, line, pos, len);
end;
run;
The result is:
TEMP TEMPID NUMBER(7,0)
TEMP TESTID NUMBER(6,0)
TEMP TYPE VARCHAR2(5)
TEMP STARTDT DATE
TEMP SOURCE VARCHAR2(5)
TEMP TITLE VARCHAR2(6)
Not sure why you would be wanting to parse a string? Why not just extract the metadata from the database? That would be more robust.
E.g.
select * from dictionary.columns where ...;
I would love to do that, but I have to follow what my supervisor is requesting. I'm not sure his thinking on this. He may think that since it is an oracle DB that we should use Oracle export process to include everything. He may not think SAS can return everything we may need.
Thank you
It doesn't matter if you use the database or SAS, any structured language has metadata about its tables. In the DB these are held under dictionary tables, in SAS they are under sashelp.vtable and vcolumn (although you can refer to dictionary also in SQL through SAS). So in either circumstance you should be able to pull metadata without resorting to code parsing. If you have to use the text, and that is a real pain, then a simpler suggestion is to run it in SAS or DB then extract metadata, that takes all the parsing away.
Hello Jerry
The following code splits the create statement and extracts Column names and types. The Table name is extracted as the third word using SCAN, and the Column definitions are extracted using a regular expression, that looks for a word in quotes followed by one space followed by an unquoted word optionally followed by a block containing only digits, parantheses or comma. The expression also finds the keys, and they are omitted because the words PRIMARY and FOREIGN are not valid Column types.
data have;
line = 'CREATE TABLE "TEMP" ("TEMPID" NUMBER(7,0), "TESTID" NUMBER(6,0), "TYPE" VARCHAR2(5), "STARTDT" DATE, "SOURCE" VARCHAR2(5), "TITLE" VARCHAR2(6), CONSTRAINT "TEMP_PK" PRIMARY KEY ("TEMPID") ENABLE, CONSTRAINT "TEMP_FK" FOREIGN KEY ("KEYID") REFERENCES "KEY" ("ID") ENABLE) ;';
run;
data want (keep=TABNm COLNm COLType); set have;
length w $200 TABNm COLNm COLType $60;
TABNm = dequote(scan(line,3,' '));
xpr = prxparse('/"\w+"\s{1}\w+[\d,\(\)]*/');
start = 1;
stop = length(trim(line));
call prxnext(xpr, start, stop, line, pos, len);
do while (pos > 0);
w = substr(line, pos, len-1);
COLNm = dequote(scan(w,1,' '));
COLType = dequote(scan(w,2,' '));
if substr(COLType,1,4) not in ('PRIM','FORE') then output;
call prxnext(xpr, start, stop, line, pos, len);
end;
run;
The result is:
TEMP TEMPID NUMBER(7,0)
TEMP TESTID NUMBER(6,0)
TEMP TYPE VARCHAR2(5)
TEMP STARTDT DATE
TEMP SOURCE VARCHAR2(5)
TEMP TITLE VARCHAR2(6)
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.