Hello,
Rookie question: How do I get SAS to recognize a "." as part of table name instead of the start of a file extension?
Background: I am a SQL Developer/Administrator. My company recently gained access to a large data warehouse that exists in a SAS Computing Environment (version 9.4). As a test before I start looking at sensitive data, I am trying to copy a reference table from their directory to my own, for later transfer off their server.
Problem: My code looks like this:
LIBNAME MyDir '/target/directory/';
LIBNAME RefDir '/reference/directory/';
PROC SQL;
CREATE TABLE MyDir.MyTable AS
SELECT * FROM RefDir.ReferenceTable.fyXX.txt WHERE SpecificField IS NOT NULL;
QUIT;
The XX in "fyXX" corresponds to a 2 digit fiscal year.
I get errors 22-322 and 200-322 at the period before fyXX. I am certain this is the name of the table I want (cannot just copy/paste due to company policy). It is my hope that there is some simple solution to this problem as I will need to copy several of these tables for reference into my SQL database.
First thing is your file is NOT using fixed length records. So remove the RECFM=F.
Second no need to specify the record length, unless your records are longer than 32K bytes.
To read a delimited file make sure to use LIST mode in your INPUT statement.
You shouldn't have gotten that exact error message from your posted code. It should show the actual file name. Perhaps you added quotes around newfile instead of using it as a fileref? INFILE newFile is different than INFILE "newFile".
Here is a simple skeleton of a data step to read a delimited file.
Use INFILE to point to the file. You can use a fileref defined by an earlier FILENAME statement if you want or just include the quoted physical path in the INFILE statement.
I find that it works best to just define the variables BEFORE you reference them in statements like INPUT, FORMAT, INFORMAT or assignment statements. I typically do not include any informats in the INPUT statement, but if you do add them then make sure to include the : modifier in front to make sure you are still using LIST input mode and not FORMATTED input mode. If you define you variables in the dataset in the same order as they appear in the input file then your INPUT statement can be as simple as below with a single position based variable list.
filename newFile '/reference/directory/ReferenceTable.fyXX.txt';
data TestFile;
infile newFile dsd dlm='|' truncover firstobs=2 ;
length field1 $4 field2 $30 ... fieldlast $40 ;
input field1 -- fieldlast ;
run;
Is that a TEXT file or a sas table ?
if it was a sas table, try
SELECT * FROM 'RefDir.ReferenceTable.fyXX.txt'n WHERE SpecificField IS NOT NULL;
I just tried that and it did not work. I got the error message:
ERROR: The value RefDir.ReferenceTable.fyXX.txt is not a valid SAS name.
Also, my understanding is that all of the files in RefDir are sas tables. I think the references just end in .txt because they are so static (updated once annually).
Opps. Try:
SELECT * FROM RefDir.'ReferenceTable.fyXX.txt'n WHERE SpecificField IS NOT NULL;
Are you sure it is not a TEXT file ?
Also try add option before code.
option validmemname=extend;
proc sql;
select ............
I would be very surprised if anyone named a SAS dataset with '.txt' in the member name.
Take a quick look at the top of the file. If it is readable then it is just a text file and not a SAS dataset.
data _null_;
infile '/reference/directory/ReferenceTable.fyXX.txt'
recfm=f lrecl=100 obs=5
;
input;
list;
run;
The reason I believe these to be tables is that the Data Dictionary I was given listed the file path and file names I am using. This table is just a list of codes, definitions of those codes, and legacy names for them.
The data sets I need to pull for my job consist of some PII along with these codes. That set is in .sas7bdat files. They don't give the definitions though.
I ran your code and the following is a sample of the log (each line is doubled):
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 CHAR N|.2011|A001|CHOLERA DUE TO VIBRIO CHOLERAE 01, BIOVAR ELTOR|CHOLERA:V.CHOLERA 01,BIO ELTOR|0011||||
ZONE 4703333743337444445424542542544544244444544233224445452445457444445435244444542332444244545733337777
NUMR ECA2011C1001C38FC521045504F069229F038FC5215001C029F61205C4F2C38FC521A6E38FC521001C29F05C4F2C0011CCCC
While this formatting is difficult to look it, it does seem to contain the information I need. Any recommendations on how I could use this to generate a structured table?
That is just a text file. It is NOT a SAS dataset so you cannot reference it like you would a dataset.
It looks like it is using pipe character, | , as the delimiter between the fields. So you could easily read that using a data step.Or you could let PROC IMPORT take a stab at guessing how to read it. It does look strange that the first line seems to just have 'N|' instead of full set of headers. So that would probably confuse PROC IMPORT.
The report looks like that because the data step read it as FIXED records instead of normal variable length lines. So it read the the end of line characters as part of the data and so the LIST statement added the extra lines to show the actual hexcodes of those non-printable characters. So '4E'x is the hexcode for 'N' and '7C'x is the hexcode for '|' and '0A'x is the hexcode for the linefeed character that is normally used on Unix systems to mark the end of a line in text files.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 1 CHAR N|.2011|A001|CHOLERA DUE TO VIBRIO CHOLERAE 01, BIOVAR ELTOR|CHOLERA:V.CHOLERA 01,BIO ELTOR|0011|||| ZONE 4703333743337444445424542542544544244444544233224445452445457444445435244444542332444244545733337777 NUMR ECA2011C1001C38FC521045504F069229F038FC5215001C029F61205C4F2C38FC521A6E38FC521001C29F05C4F2C0011CCCC
So is it possible to create a table from this txt file using filename and infile, provided i know the data field types/lengths?
It looks like it is using pipe character, | , as the delimiter between the fields. So you could easily read that using a data step.
Ok, so I have modified my code to the following:
LIBNAME MyDir '/target/directory/';
filename newFile '/reference/directory/ReferenceTable.fyXX.txt';
data TestFile;
infile newFile dlm='|' missover
recfm=f lrecl=300 obs = 5;
input
@1 Field1 $char4.
@2 Field2 $char30.
etc...;
following my filename statement, I get
ERROR: Invalid logical name.
ERROR: Error in the FILENAME statement.
At the end of my code, I get the error:
ERROR: Physical file does not exist, /target/directory/newFile.
Am I close?
First thing is your file is NOT using fixed length records. So remove the RECFM=F.
Second no need to specify the record length, unless your records are longer than 32K bytes.
To read a delimited file make sure to use LIST mode in your INPUT statement.
You shouldn't have gotten that exact error message from your posted code. It should show the actual file name. Perhaps you added quotes around newfile instead of using it as a fileref? INFILE newFile is different than INFILE "newFile".
Here is a simple skeleton of a data step to read a delimited file.
Use INFILE to point to the file. You can use a fileref defined by an earlier FILENAME statement if you want or just include the quoted physical path in the INFILE statement.
I find that it works best to just define the variables BEFORE you reference them in statements like INPUT, FORMAT, INFORMAT or assignment statements. I typically do not include any informats in the INPUT statement, but if you do add them then make sure to include the : modifier in front to make sure you are still using LIST input mode and not FORMATTED input mode. If you define you variables in the dataset in the same order as they appear in the input file then your INPUT statement can be as simple as below with a single position based variable list.
filename newFile '/reference/directory/ReferenceTable.fyXX.txt';
data TestFile;
infile newFile dsd dlm='|' truncover firstobs=2 ;
length field1 $4 field2 $30 ... fieldlast $40 ;
input field1 -- fieldlast ;
run;
So close!
The only error I'm getting at the moment is
ERROR: Physical file does not exist, /reference/directory/ReferenceTable.fyXX.txt.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements. This might cause NOTE: No observations in data set.
WARNING: The data set WORK.TESTFILE may be incomplete. When this step was stopped there were 0 observations and 11 variables.
Hi @gillymon
In your original post you wrote: "The XX in "fyXX" corresponds to a 2 digit fiscal year."
You get the error because you can't use the placeholder XX in the filename statement. You must change it to a 2-digit year.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.