Hi,
Hoping someone can help and correct the logic I am using. THank you in advance 😉
Trying to import a csv file that is separated by semicolon and the field names start on row 2 and data starts row 3
csv file shoes_startrow.csv (attached):
syntax:
proc import datafile="C:\Users\Other Request\shoes_startrow.csv"
out=Shoe_File
dbms=csv replace;
GETNAMES=NO;
DELIMITER=';';
DATAROW=2;
run;
result:
want:
Region | Product | Subsidiary | Stores | Sales | Inventory | Returns |
Africa | Boot | Addis Ababa | 12 | $29,761 | $191,821 | $769 |
Africa | Men's Casual | Addis Ababa | 4 | $67,242 | $118,036 | $2,284 |
Africa | Men's Dress | Addis Ababa | 7 | $76,793 | $136,273 | $2,433 |
Africa | Sandal | Addis Ababa | 10 | $62,819 | $204,284 | $1,861 |
Africa | Slipper | Addis Ababa | 14 | $68,641 | $279,795 | $1,771 |
Africa | Sport Shoe | Addis Ababa | 4 | $1,690 | $16,634 | $79 |
Africa | Women's Casual | Addis Ababa | 2 | $51,541 | $98,641 | $940 |
Africa | Women's Dress | Addis Ababa | 12 | $108,942 | $311,017 | $3,233 |
Try this:
proc import datafile="C:\Users\Other Request\shoes_startrow.csv"
out=Shoe_File
dbms=csv replace;
GETNAMES=YES;
DELIMITER=';';
DATAROW=3;
run;
Don't let PROC IMPORT mess up your data when reading text files. Write the data step yourself with FIRSTOBS=3 in the INFILE statement.
This would be the code to read your csv file:
data want;
infile "C:\Users\Other Request\shoes_startrow.csv" dlm=";" dsd firstobs=3 truncover;
input
region :$25.
product :$14.
subsidiary :$12.
stores
sales :dollar12.
inventory :dollar12.
returns :dollar12.
;
format
sales :dollar12.
inventory :dollar12.
returns :dollar12.
;
run;
PS in order to supply an example of your data which we can use for testing, either attach the csv to your post, or open it with a text editor and copy/paste a sufficient number of lines into a window opened with this button:
Do not post data as pictures, especially not as screenshots from Excel.
Hi @Kurt_Bremser - sorry I thought I did attached the csv file.
That file is even more messed up that you originally suggested.
11 data _null_; 12 infile "c:\downloads\shoes_startrow.csv"; 13 input; 14 list; 15 run; NOTE: The infile "c:\downloads\shoes_startrow.csv" is: Filename=c:\downloads\shoes_startrow.csv, RECFM=V,LRECL=32767,File Size (bytes)=30195, Last Modified=06Jul2022:09:28:09, Create Time=06Jul2022:09:28:09 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 1 xxxxx;;;;;;,,,,,, 17 2 Region;Product;Subsidiary;Stores;Sales;Inventory;Returns,,,,,, 62 3 "Africa;Boot;Addis Ababa;12;""$29;761"";""$191;821"";$769",,,,,, 64 4 "Africa;Men's Casual;Addis Ababa;4;""$67;242"";""$118;036"";""$2;284""",,,,,, 77 5 "Africa;Men's Dress;Addis Ababa;7;""$76;793"";""$136;273"";""$2;433""",,,,,, 76 6 "Africa;Sandal;Addis Ababa;10;""$62;819"";""$204;284"";""$1;861""",,,,,, 72
In addition the the extra line at the top and the extra line at the bottom every line is corrupted. Instead of having 7 values separated by semicolons most of the lines have one value followed by bunch of commas.
Are you sure this is your ORIGINAL file? Or did you accidentally let EXCEL automatically open the CSV file and then save that modified version of the file back as a CSV file? In general you should never let EXCEL automatically open a CSV file (other than for simple exploring) as it will make decisions about how the convert the text into a spreadsheet including changing some of the "cell" values.
If the file is like this then first convert it to a real CSV file and then read that. You could remove the header and trailer lines at the same time. Since the resulting file is not using commas as the delimiter let's not use CSV as the extension on the file.
filename have 'myfile.csv';
filename want 'myfile_fixed.txt';
data _null_;
infile have dsd truncover end=eof;
file want;
input line :$1000. ;
if _n_=1 or eof then putlog _n_= _infile_;
else put line;
run;
So your file is not a semicolon-separated pseudo-csv (as created by Excel), but a real comma separated volume with all the information in the first column, which is enclosed in double quotes, is internally separated by semicolons and also contains double quotes. And some of the numbers have semicolons instead of commas. In short, an unbelievable mess. Who threw this piece of computing crap at you?
Anyway, this code reads your file successfully, but I include it only to show what SAS is capable of. You should not try to make sense out of this file, return it to sender, with a solid nastygram attached to it.
data want;
infile "~/shoes_startrow.csv" dlm="," dsd firstobs=3 truncover;
length
region $25
product $14
subsidiary $12
stores 8
sales 8
inventory 8
returns 8
;
format
sales
inventory
returns dollar12.
;
input line :$100.;
if not index(line,"row count");
region = scan(line,1,";");
product = scan(line,2,";");
subsidiary = scan(line,3,";");
stores = input(scan(line,4,";"),best.);
_sales = compress(scan(line,5,";") !! scan(line,6,";"),'"');
sales = input(_sales,dollar20.);
_inventory = compress(scan(line,7,";") !! scan(line,8,";"),'"');
inventory = input(_inventory,dollar20.);
_returns = compress(scan(line,9,";") !! scan(line,10,";"),'"');
returns = input(_returns,dollar20.);
drop _: line;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.