- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 06-04-2008 09:38 AM
(5525 views)
Hi
I'm importing a .csv data file of around 200 entries, with a lot of variables and many missing values. When I use the code below, 2 strange things happen: One, only some of my lines of data are imported, approximately every other one. Two, the last column in the spreadsheet is entered as if all the data for that variable were missing, when in fact there are few missing data for that variable. No matter which variable I put in that last column, it always does that.
Help!
data soldiers;
infile 'Z:\Desktop\Zootermopsis\Caste morph measurements\caste morphological measurements.csv'
dlm=',' dsd firstobs=2;
length fixative $ 20 colony $ 50 caste $ 20 eyetype $ 25;
input fixative $ cllctnd $ colony $ indiv $ sex $ caste $ whead
hdratio lmand llabrum wlabrum lbrmrtio lovary heye midwovry weye nardtest
eyetype $ testrat wingbud wwovary narwpost wdtestes wwpost eggs lpost lhead ;
run;
I'm importing a .csv data file of around 200 entries, with a lot of variables and many missing values. When I use the code below, 2 strange things happen: One, only some of my lines of data are imported, approximately every other one. Two, the last column in the spreadsheet is entered as if all the data for that variable were missing, when in fact there are few missing data for that variable. No matter which variable I put in that last column, it always does that.
Help!
data soldiers;
infile 'Z:\Desktop\Zootermopsis\Caste morph measurements\caste morphological measurements.csv'
dlm=',' dsd firstobs=2;
length fixative $ 20 colony $ 50 caste $ 20 eyetype $ 25;
input fixative $ cllctnd $ colony $ indiv $ sex $ caste $ whead
hdratio lmand llabrum wlabrum lbrmrtio lovary heye midwovry weye nardtest
eyetype $ testrat wingbud wwovary narwpost wdtestes wwpost eggs lpost lhead ;
run;
14 REPLIES 14
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi gloopy. 🙂
Messages from the log may be helpful. Check to see if you are getting something like: "SAS went to a new line when input reached past end of line". If so I am guessing the default lrecl is 256. And it looks like the line you are reading may be longer than that. On your infile statement add lrecl=512 before the semicolon.
Messages from the log may be helpful. Check to see if you are getting something like: "SAS went to a new line when input reached past end of line". If so I am guessing the default lrecl is 256. And it looks like the line you are reading may be longer than that. On your infile statement add lrecl=512 before the semicolon.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In the log, the error that it gives is copied below. All of that first stuff is some of my data, mixed in with the "ZONE" and "NUMR" things...
This error is repeated many times, each with different locations listed for the lhead variable 'invalid data'.
It doesn't have the error message you mentioned though. I added in the lrecl statement you suggested and it didn't seem to help--the only change i saw was that SAS only read 55 lines of data instead of 103.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2 pampel's,4/20/00,Zn2-9aC,,m,reproductive soldier,0.336,1.131,0.284,0.0775,0.09,0.8611,,0
89 .0425,,0.025,0.0688,unsclerotized,1.3455,0,,0.0675,0.0925,0.1063,,0.255,0.38.pampel's,4/
ZONE 2333322323332323333276766676767662323333232232333323233332323333223233323233076676627232
NUMR E0425CC0E025C0E0688C5E33C52F49A54C1E3455C0CC0E0675C0E0925C0E1063CC0E255C0E38D01D05C73C4F
177 18/00,Zn2-9,,m,reproductive soldier,0.3063,1.1939,0.2656,0.0738,0.0875,0.8429,,0
fixative=pampel's colony=Zn2-9aC caste=reproductive soldier eyetype=unsclerotized cllctnd=4/20/00
indiv= sex=m whead=0.336 hdratio=1.131 lmand=0.284 llabrum=0.0775 wlabrum=0.09 lbrmrtio=0.8611
lovary=. heye=0.0425 midwovry=. weye=0.025 nardtest=0.0688 testrat=1.3455 wingbud=0 wwovary=.
narwpost=0.0675 wdtestes=0.0925 wwpost=0.1063 eggs=. lpost=0.255 lhead=. _ERROR_=1 _N_=1
NOTE: Invalid data for lhead in line 3 178-192.
This error is repeated many times, each with different locations listed for the lhead variable 'invalid data'.
It doesn't have the error message you mentioned though. I added in the lrecl statement you suggested and it didn't seem to help--the only change i saw was that SAS only read 55 lines of data instead of 103.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2 pampel's,4/20/00,Zn2-9aC,,m,reproductive soldier,0.336,1.131,0.284,0.0775,0.09,0.8611,,0
89 .0425,,0.025,0.0688,unsclerotized,1.3455,0,,0.0675,0.0925,0.1063,,0.255,0.38.pampel's,4/
ZONE 2333322323332323333276766676767662323333232232333323233332323333223233323233076676627232
NUMR E0425CC0E025C0E0688C5E33C52F49A54C1E3455C0CC0E0675C0E0925C0E1063CC0E255C0E38D01D05C73C4F
177 18/00,Zn2-9,,m,reproductive soldier,0.3063,1.1939,0.2656,0.0738,0.0875,0.8429,,0
fixative=pampel's colony=Zn2-9aC caste=reproductive soldier eyetype=unsclerotized cllctnd=4/20/00
indiv= sex=m whead=0.336 hdratio=1.131 lmand=0.284 llabrum=0.0775 wlabrum=0.09 lbrmrtio=0.8611
lovary=. heye=0.0425 midwovry=. weye=0.025 nardtest=0.0688 testrat=1.3455 wingbud=0 wwovary=.
narwpost=0.0675 wdtestes=0.0925 wwpost=0.1063 eggs=. lpost=0.255 lhead=. _ERROR_=1 _N_=1
NOTE: Invalid data for lhead in line 3 178-192.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try the MISSOVER option in the INFILE statement. Missover helps if there is missing values.
If that doesn't work try the TRUNCOVER option in the INFILE statement. I believe that TRUNCOVER works similar to MISSOVER but is a bit different in the details. And of course you can't use both.
-Darryl
If that doesn't work try the TRUNCOVER option in the INFILE statement. I believe that TRUNCOVER works similar to MISSOVER but is a bit different in the details. And of course you can't use both.
-Darryl
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hmm, neither missover nor truncover seem to change anything either.... 😕
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you may have a data problem or two.
Unless I am reading your log wrong, i see the lhead=0.38. which is not a "numeric" the 2 periods will not work.
Also, It looks like there is no end of line marker since 0.38.pampel's run into each other
-Darryl
Unless I am reading your log wrong, i see the lhead=0.38. which is not a "numeric" the 2 periods will not work.
Also, It looks like there is no end of line marker since 0.38.pampel's run into each other
-Darryl
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The log looks like you don't have any end-of line marker in the .csv file to separate the different records. Look at the "pampel's" at the end of the second data line; that looks like the same fixative as in the beginning of the first line.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Besides missover, I would also try the pad options on the infile statement
infile 'Z:\Desktop\Zootermopsis\Caste morph measurements\caste morphological measurements.csv'
dlm=',' dsd firstobs=2 pad;
infile 'Z:\Desktop\Zootermopsis\Caste morph measurements\caste morphological measurements.csv'
dlm=',' dsd firstobs=2 pad;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry - hit the post button too soon.
You need to use pad in conjuction with the lrecl= on the infile statement.
One other thought, you may want to inspect the csv file in notepad for any escape characters. I seem to remember vaguely a similiar issue with a csv file. I believe I fixed the problem by globally searching (and replacing) the identified character in the csv file
You need to use pad in conjuction with the lrecl= on the infile statement.
One other thought, you may want to inspect the csv file in notepad for any escape characters. I seem to remember vaguely a similiar issue with a csv file. I believe I fixed the problem by globally searching (and replacing) the identified character in the csv file
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are you getting the data from Business Objects by chance?
When I used to get BO files, I had to put a dummy variable at the end to control the loss as I could never find the troublesome characters with notepad.
When I used to get BO files, I had to put a dummy variable at the end to control the loss as I could never find the troublesome characters with notepad.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have a look at the termstr= option for the infile statement. If the csv-file has been created by a Windows application the termstr should be set to 'CRLF', because Unix only expects the LF as line-separator the CR becomes part of the content.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can anyone help with the following error? The same error repeats for every other line of data. For the most part it says the invalid data is the length of the character values up to the space. The first variable is $30.
NOTE: Invalid data for run in line 3 1-3.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
2 ABDR 0.5_12.5_0.5_20 ABDR 0.5/12.5/0.5/20
87 00000000000000 N 77777726724 19970101
173 1010314AWP
259 19990801 20391231 1.89000PLACEBOS
345 ** 1.000
431 0 0.000DummyVar
3 ADV BALANCED NUTR SUPP ADV BALANCED NUTR SUPP
87 00000000000000 21130078170 20000901
173 O 1010314AWP
259 20000901 20391231 0.01000PLACEBOS
345 ** 0.000
431 0 0.000DummyVar
Any ideas?
NOTE: Invalid data for run in line 3 1-3.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
2 ABDR 0.5_12.5_0.5_20 ABDR 0.5/12.5/0.5/20
87 00000000000000 N 77777726724 19970101
173 1010314AWP
259 19990801 20391231 1.89000PLACEBOS
345 ** 1.000
431 0 0.000DummyVar
3 ADV BALANCED NUTR SUPP ADV BALANCED NUTR SUPP
87 00000000000000 21130078170 20000901
173 O 1010314AWP
259 20000901 20391231 0.01000PLACEBOS
345 ** 0.000
431 0 0.000DummyVar
Any ideas?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your forgot a semi-colon at the end of your run statement see the error message "invalide data for run..."
-Darryl
-Darryl
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Darryl.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Okay, I solved my problem. I didn't have a ' ; ' after the list of variables in the input statement. SAS thought my next line "run;" was another variable without a length and type of variable associated with it.