Not 100% on this, I need to list the column names and the types for each column, having a problem with joins so I want to see how SAS imported the datatypes.
Thanks in advance
Use proc contents on a dataset:
proc contents data=sashelp.class;
run;
and look at the ouput.
Or extract the information from dictionary.columns (SQL):
proc sql;
select * from dictionary.columns
where libname = 'SASHELP' and memname = 'CLASS'
;
quit;
or sashelp.vcolumn, which is a view on dictionary.columns for data steps:
data columns;
set sashelp.vcolumn;
where libname = 'SASHELP' and memname = 'CLASS';
run;
Use proc contents on a dataset:
proc contents data=sashelp.class;
run;
and look at the ouput.
Or extract the information from dictionary.columns (SQL):
proc sql;
select * from dictionary.columns
where libname = 'SASHELP' and memname = 'CLASS'
;
quit;
or sashelp.vcolumn, which is a view on dictionary.columns for data steps:
data columns;
set sashelp.vcolumn;
where libname = 'SASHELP' and memname = 'CLASS';
run;
Since you specified a one-level name in the out= option, SAS will put the resulting dataset BIDDATA into library WORK.
Therefore memname = 'BIDDATA' and libname = 'WORK'.
Ok, I am good here, now I have found my issue, so in the PROC IMPORT Statement I need to specify for a field that it needs to be a CHAR instead of a NUM. I am pretty sure I can do this in the Import Statement, just not sure how to write it and also do I need to do this for every field or just the ones that I need to change?
The problem with proc import is that it is a guessing procedure. A further problem comes from the fact that you use xlsx as file format (not good).
With a text-based file format, proc import would create a data step, run that, and write it to the log, from where you can take it for modification.
With Excel files, SAS uses a different engine (no data step created), and you only can use the guessingrows= statement to force SAS to use more rows for guessing.
Best bet, IMO: save from Excel to csv, run proc import once, get the data step, and modify to your needs.
The data step can be found in the log. Copy/paste, remove the line numbers, and then start tweaking.
If information is not sensitive, you can post the data step part of the log here.
This may be more than you asked for here, but I copied it all over:
31 PROC IMPORT DATAFILE="\\americredit.com\db_files\AOC\SA\AR\Analytics\Reports\Sources\Bid Data.csv" OUT=BidData
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.15 seconds
cpu time 0.09 seconds
NOTE: WORK.ACTIVE_DEALERS data set was successfully created.
NOTE: The data set WORK.ACTIVE_DEALERS has 4228 observations and 17 variables.
32 REPLACE;
33 GUESSINGROWS=32767;
34 GETNAMES=yes;
35
Number of names found is less than number of variables found.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Problems were detected with provided names. See LOG.
36 PROC SQL;
37 /**********************************************************************
38 * PRODUCT: SAS
39 * VERSION: 9.3
40 * CREATOR: External File Interface
41 * DATE: 21NOV16
42 * DESC: Generated SAS Datastep Code
43 * TEMPLATE SOURCE: (None Specified.)
44 ***********************************************************************/
2 The SAS System 08:14 Monday, November 21, 2016
45 data WORK.BIDDATA ;
46 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
47 infile '\\americredit.com\db_files\AOC\SA\AR\Analytics\Reports\Sources\Bid Data.csv' delimiter = ',' MISSOVER DSD
47 ! lrecl=32767 firstobs=2 ;
48 informat VIN $17. ;
49 informat YEAR best32. ;
50 informat MAKE $9. ;
51 informat MODEL $32. ;
52 informat SERIES $1. ;
53 informat MILEAGE $8. ;
54 informat BUY_PRICE $8. ;
55 informat BID_DATE mmddyy10. ;
56 informat YEAR_MONTH $1. ;
57 informat BID_ITERATION $1. ;
58 informat NUMBER_OF_BIDS $1. ;
59 informat BID_AMOUNT $8. ;
60 informat BID_SITE $23. ;
61 informat BIDDER_NAME $1. ;
62 informat BIDDER_EMAIL $1. ;
63 informat BIDDER_ORGANIZATION_NAME $53. ;
64 informat BIDDER_DEALER_CODE best32. ;
65 informat BAC $1. ;
66 informat BIDDER_STATE $2. ;
67 informat BIDDER_CITY $14. ;
68 informat BIDDER_ZIP best32. ;
69 informat SOLD $1. ;
70 informat ITERATION_SOLD best32. ;
71 informat SOLD_WEBSITE $23. ;
72 informat SALE_DATE mmddyy10. ;
73 informat FPOC_ELIGIBLE $1. ;
74 informat FPOC_SALE $1. ;
75 informat _ $1. ;
76 informat VAR29 $1. ;
77 informat VAR30 $1. ;
78 informat VAR31 $1. ;
79 informat VAR32 $1. ;
80 informat VAR33 $1. ;
81 informat VAR34 $1. ;
82 informat VAR35 $1. ;
83 informat VAR36 $1. ;
84 informat VAR37 $1. ;
85 informat VAR38 $1. ;
86 informat VAR39 $1. ;
87 informat VAR40 $1. ;
88 informat VAR41 $1. ;
89 format VIN $17. ;
90 format YEAR best12. ;
91 format MAKE $9. ;
92 format MODEL $32. ;
93 format SERIES $1. ;
94 format MILEAGE $8. ;
95 format BUY_PRICE $8. ;
96 format BID_DATE mmddyy10. ;
97 format YEAR_MONTH $1. ;
98 format BID_ITERATION $1. ;
99 format NUMBER_OF_BIDS $1. ;
100 format BID_AMOUNT $8. ;
101 format BID_SITE $23. ;
3 The SAS System 08:14 Monday, November 21, 2016
102 format BIDDER_NAME $1. ;
103 format BIDDER_EMAIL $1. ;
104 format BIDDER_ORGANIZATION_NAME $53. ;
105 format BIDDER_DEALER_CODE best12. ;
106 format BAC $1. ;
107 format BIDDER_STATE $2. ;
108 format BIDDER_CITY $14. ;
109 format BIDDER_ZIP best12. ;
110 format SOLD $1. ;
111 format ITERATION_SOLD best12. ;
112 format SOLD_WEBSITE $23. ;
113 format SALE_DATE mmddyy10. ;
114 format FPOC_ELIGIBLE $1. ;
115 format FPOC_SALE $1. ;
116 format _ $1. ;
117 format VAR29 $1. ;
118 format VAR30 $1. ;
119 format VAR31 $1. ;
120 format VAR32 $1. ;
121 format VAR33 $1. ;
122 format VAR34 $1. ;
123 format VAR35 $1. ;
124 format VAR36 $1. ;
125 format VAR37 $1. ;
126 format VAR38 $1. ;
127 format VAR39 $1. ;
128 format VAR40 $1. ;
129 format VAR41 $1. ;
130 input
131 VIN $
132 YEAR
133 MAKE $
134 MODEL $
135 SERIES $
136 MILEAGE $
137 BUY_PRICE $
138 BID_DATE
139 YEAR_MONTH $
140 BID_ITERATION $
141 NUMBER_OF_BIDS $
142 BID_AMOUNT $
143 BID_SITE $
144 BIDDER_NAME $
145 BIDDER_EMAIL $
146 BIDDER_ORGANIZATION_NAME $
147 BIDDER_DEALER_CODE
148 BAC $
149 BIDDER_STATE $
150 BIDDER_CITY $
151 BIDDER_ZIP
152 SOLD $
153 ITERATION_SOLD
154 SOLD_WEBSITE $
155 SALE_DATE
156 FPOC_ELIGIBLE $
157 FPOC_SALE $
158 _ $
159 VAR29 $
4 The SAS System 08:14 Monday, November 21, 2016
160 VAR30 $
161 VAR31 $
162 VAR32 $
163 VAR33 $
164 VAR34 $
165 VAR35 $
166 VAR36 $
167 VAR37 $
168 VAR38 $
169 VAR39 $
170 VAR40 $
171 VAR41 $
172 ;
173 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
174 run;
NOTE: The infile '\\americredit.com\db_files\AOC\SA\AR\Analytics\Reports\Sources\Bid Data.csv' is:
Filename=\\americredit.com\db_files\AOC\SA\AR\Analytics\Reports\Sources\Bid Data.csv,
RECFM=V,LRECL=32767,File Size (bytes)=450976,
Last Modified=21Nov2016:09:32:47,
Create Time=21Nov2016:09:32:36
NOTE: 2187 records were read from the infile '\\americredit.com\db_files\AOC\SA\AR\Analytics\Reports\Sources\Bid Data.csv'.
The minimum record length was 178.
The maximum record length was 234.
NOTE: The data set WORK.BIDDATA has 2187 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
2187 rows created in WORK.BIDDATA from \\americredit.com\db_files\AOC\SA\AR\Analytics\Reports\Sources\Bid Data.csv.
NOTE: WORK.BIDDATA data set was successfully created.
NOTE: The data set WORK.BIDDATA has 2187 observations and 41 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 1.87 seconds
cpu time 1.81 seconds
175 SELECT BD.BIDDER_ORGANIZATION_NAME,
176 BD.VIN,
177 BD.YEAR,
178 BD.MAKE,
179 BD.MODEL,
180 BD.MILEAGE,
181 BD.BUY_PRICE,
182 BD.BID_AMOUNT,
183 S.Sale_Price AS Purchase_Price_Cognos,
184 BD.BID_DATE,
185 BD.BID_SITE,
186 BD.BIDDER_DEALER_CODE,
187 BD.BIDDER_STATE,
188 BD.BIDDER_ZIP,
189 BD.ITERATION_SOLD,
190 BD.SOLD_WEBSITE,
@TMiller16 wrote:
This may be more than you asked for here, but I copied it all over:
31 PROC IMPORT DATAFILE="\\americredit.com\db_files\AOC\SA\AR\Analytics\Reports\Sources\Bid Data.csv" OUT=BidData
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.15 seconds
cpu time 0.09 seconds
NOTE: WORK.ACTIVE_DEALERS data set was successfully created.
NOTE: The data set WORK.ACTIVE_DEALERS has 4228 observations and 17 variables.
32 REPLACE;
33 GUESSINGROWS=32767;
34 GETNAMES=yes;
35
Number of names found is less than number of variables found.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
Name _ is a duplicate.
The "Number of names found is less than number of variables found" is most common when you have missing column headings but indications of data. In the case of CSV this often happens when editing an EXCEL file and "deleted" columns are exported with every row ending in ,,,,,,,, or similar. If you do not actually have any values in the columns indicated by the names VARx then you may simply remove them from the Informat, format and input statements. If there actual data an the column heading is missing then you name them to something meaningful. When a bunch of these appear as the last items in the informat with $1. then it is almost assured that this is the case. Note: your original Proc Import when using this approach should use a large value for the guessingrows parameter such as 5000 (it is okay to try to use more than there are lines in the data).
Price with an informat of $8 means that you likely have $ signs in the value. If the Excel sheet is using a currency display then you might try setting that column to simple numeric before exporting it to get that as a numeric.
The log indicates that the excel sheet is not a data sheet, but a report sheet with all kinds of fancy formatting. Therefore the columns that have no headers, the empty columns, numerics as character, and so on.
Either you get the provider to send you data with correctly ordered, named and formatted columns, or you best start from scratch, look at the file as text, and start decoding it. Something we have done when importing IBM mainframe usage statistics reports into SAS.
Very tedious, but as you can clearly see, proc import can't really deal with this unformatted mess.
Generally with EXCEL formats such as xls or xlsx you cannot specify the resulting data type. Search the forum for Excel and data types and you will find this is a very common issue and why you see repeated references not to use Excel files for data interchange. If you are receiving files in the same format over time it is almost certain that at some time in the process if you use Proc Import that the data will have changes in data types and the length of character variables will change.
Personally, since I have no control over the garbage my clients throw at me, I convert everything from Excel file formats to CSV and then use a Datastep to read them so I have consistent variable types and lengths of character variables and variable names that make sense and short enough to code with (also so they don't change when someone is 'helpful' and changes column headings from file to file)
Try This:
PROC SQL print NUMBER;
CREATE TABLE ORA_TBL_List as
SELECT *
FROM DICTIONARY.TABLES
where libname = "SASHELP" and
( (memname like "CARS%") OR
(memname like "CLASS%") ) ;
QUIT;
proc print data=ora_tbl_list;
run;
Look for the last two columns Num_Characters and Num_Numeric. It lists wealth of information.
Change your Library and MemName in the above code.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.