- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a quandry. Recently I have been told that using the Import Wizard to bring in external tab delimited text files was creating a bunch of temporary files on my C Drive, which I am not allowed to access, and as such is causing a plethora of other issues with my computer. It was suggested that I begin using Data step to import the external files. When I have run the suggested code, the file imports, but the display is completely in accurate. Which makes it difficult to go back and troubleshoot any output issues. I have tried to copy and paste the Import Wizard code into a program, and it doesn't run at all (because it is still looking for a temporary file).
Sample Data
Import Wizard Code - This is the code that doesn't work outside of using the Import Wizard
Code generated by a SAS task
Generated on Thursday, July 06, 2017 at 4:37:59 PM
By task: Import Data Wizard
Source file: G:\Financial Services\Financial Reporting\Monthly
Production Reports\Wealth_Core
Reports\Retail_Accounts_for_Dashboards 062017.txt
Server: Local File System
Output data: WORK.Retail_Accounts_for_Dashboards_0
Server: Analytics
Note: In preparation for running the following code, the Import
Data wizard has used internal routines to transfer the source data
file from the local file system to Analytics. There is no SAS code
available to represent this action.
-------------------------------------------------------------------- */
DATA WORK.Retail_Accounts_for_Dashboards_0;
LENGTH
'Account Number'n $ 16
Product $ 3
'Sub Product'n $ 3
'User Field39'n $ 2
'Account Number Dim Key'n $ 11 ;
FORMAT
'Account Number'n $CHAR16.
Product $CHAR3.
'Sub Product'n $CHAR3.
'User Field39'n $CHAR2.
'Account Number Dim Key'n $CHAR11. ;
INFORMAT
'Account Number'n $CHAR16.
Product $CHAR3.
'Sub Product'n $CHAR3.
'User Field39'n $CHAR2.
'Account Number Dim Key'n $CHAR11. ;
INFILE 'E:\saswork\_TD13452_WIL-ENTSASPRD06_\#LN00033'
LRECL=39
ENCODING="WLATIN1"
TERMSTR=CRLF
DLM='7F'x
MISSOVER
DSD ;
INPUT
'Account Number'n : $CHAR16.
Product : $CHAR3.
'Sub Product'n : $CHAR3.
'User Field39'n : $CHAR2.
'Account Number Dim Key'n : $CHAR11. ;
RUN;
Suggested Code - the one with the 3-Card monty for column values:
DATA Ret_Accts_for_Dashboards;
LENGTH
'Account Number'n $ 16
Product $ 3
'Sub Product'n $ 3
'User Field39'n $ 2
'Account Number Dim Key'n $ 11 ;
FORMAT
'Account Number'n $CHAR16.
Product $CHAR3.
'Sub Product'n $CHAR3.
'User Field39'n $CHAR2.
'Account Number Dim Key'n $CHAR11. ;
INFORMAT
'Account Number'n $CHAR16.
Product $CHAR3.
'Sub Product'n $CHAR3.
'User Field39'n $CHAR2.
'Account Number Dim Key'n $CHAR11. ;
INFILE '\\wil-netapp02\rev-cred\Financial Services\Financial Reporting\Monthly Production Reports\Wealth_Core Reports\06 2017 Retail Interchange.txt'
DLM=' '
DSD MISSOVER
FIRSTOBS=2;
INPUT
'Account Number'n : $CHAR16.
Product : $CHAR3.
'Sub Product'n : $CHAR4.
'User Field39'n : $CHAR2.
'Account Number Dim Key'n : $CHAR11.;
RUN;
And the resulting outputs with the display issues - all of the data is to come in as character, but using Proc Import chooses the 'best' data type for this, which doesn't necessarily work out great.
Any thoughts on how to correct the display issues?
Thanks
El
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You just need to update that code to reflect the data which you know best, for instance:
data ret_accts_for_dashboards; length account_number $16 Product $3 sub_product 8 user_field39 8 account_number_dim_key 8; format account_number $16 Product $3 sub_product 5 user_field39 5 account_number_dim_key 5; informat account_number $16 Product $3 sub_product 5 user_field39 5 account_number_dim_key 5; infile '\\wil-netapp02\rev-cred\Financial Services\Financial Reporting\Monthly Production Reports\Wealth_Core Reports\06 2017 Retail Interchange.txt' dlm='09'x dsd missover firstobs=2; input account_number $ Product $ sub_product user_field39 account_number_dim_key; run;
Reads sub_product and user_field in as numeric.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The EG guide wizard will upload text files to the SAS server and generate a data step to read the file. For Excel files it generates a text file on your PC and uploads that.
The code the EG writes is much more complicated than what you would write yourself. You do not need to attach formats or informats to 90% of variables. You also do not need to include $ in the INPUT statemen if you have already defined the variable as character. Just define the type and length using the LENGTH statement and SAS will know how to read and display them. The main exception is date and time fields. Also for fields with embedded $ and/or commas you should use the COMMA (or DOLLAR) informat instead of default informat (the default informat is just nn. which can also be called Fnn. Some users like to call it a BEST informat, but BEST is really the name for FORMAT and not an INFORMAT).
So if you wanted to re-write the code the EG generated you just need the DATA, INFILE, LENGTH and INPUT statement. You can change the variables from character to numeric by changing the length assigned to them. Numeric variables should use length 8 as SAS stores numbers using 64bit floating point.
If you want to change a variable from character to numeric then change the LENGTH value from something like '$12' to '8'.
If your file is really a tab delimited file then you would want to use '09'x as the delimiter on the INFILE statement and not '7F'x. That might be the delimiter that EG uses when it creates a file from an Excel spreadsheet. Your account number DIM Key field values look like numbers with commas in them. But a KEY should not be a number since no one wants to take the mean of it. If you do convert it to a number then you will need to add an INFORMAT statement or include a informat for it in the INPUT statement. If you do include the informat in the INPUT statement it is critical to prefix it with ':' or it could confuse how SAS parses the lines at the delimiters.
You also probably do NOT want the LRECL= option on the INFILE statement. Unless your lines are longer than the default (which is currently 32767).
So here is an updated data step that will from a tab delimited file and read 4th column as a number.
DATA WORK.Retail_Accounts_for_Dashboards_0;
LENGTH
'Account Number'n $ 16
Product $ 3
'Sub Product'n $ 3
'User Field39'n 8
'Account Number Dim Key'n $ 11
;
INFILE 'E:\saswork\_TD13452_WIL-ENTSASPRD06_\#LN00033'
ENCODING="WLATIN1"
TERMSTR=CRLF
DLM='09'x
truncover
DSD
;
INPUT 'Account Number'n -- 'Account Number Dim Key'n ;
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can get EG to generate a slightly more "reusable" file for you by clicking the Performance button, then "Bypass the data cleansing process" on the Import Data task.
This will tell EG to not "fix" the file for you by changing the delimiter and quoting in the fields. It will instead copy/import the file as-is, which might include a few data processing warts -- but hey, that's life!