BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi

I'm importing a .csv data file of around 200 entries, with a lot of variables and many missing values. When I use the code below, 2 strange things happen: One, only some of my lines of data are imported, approximately every other one. Two, the last column in the spreadsheet is entered as if all the data for that variable were missing, when in fact there are few missing data for that variable. No matter which variable I put in that last column, it always does that.

Help!

data soldiers;
infile 'Z:\Desktop\Zootermopsis\Caste morph measurements\caste morphological measurements.csv'
dlm=',' dsd firstobs=2;
length fixative $ 20 colony $ 50 caste $ 20 eyetype $ 25;
input fixative $ cllctnd $ colony $ indiv $ sex $ caste $ whead
hdratio lmand llabrum wlabrum lbrmrtio lovary heye midwovry weye nardtest
eyetype $ testrat wingbud wwovary narwpost wdtestes wwpost eggs lpost lhead ;
run;
14 REPLIES 14
JerryV
SAS Employee
Hi gloopy. 🙂

Messages from the log may be helpful. Check to see if you are getting something like: "SAS went to a new line when input reached past end of line". If so I am guessing the default lrecl is 256. And it looks like the line you are reading may be longer than that. On your infile statement add lrecl=512 before the semicolon.
deleted_user
Not applicable
In the log, the error that it gives is copied below. All of that first stuff is some of my data, mixed in with the "ZONE" and "NUMR" things...
This error is repeated many times, each with different locations listed for the lhead variable 'invalid data'.
It doesn't have the error message you mentioned though. I added in the lrecl statement you suggested and it didn't seem to help--the only change i saw was that SAS only read 55 lines of data instead of 103.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2 pampel's,4/20/00,Zn2-9aC,,m,reproductive soldier,0.336,1.131,0.284,0.0775,0.09,0.8611,,0

89 .0425,,0.025,0.0688,unsclerotized,1.3455,0,,0.0675,0.0925,0.1063,,0.255,0.38.pampel's,4/
ZONE 2333322323332323333276766676767662323333232232333323233332323333223233323233076676627232
NUMR E0425CC0E025C0E0688C5E33C52F49A54C1E3455C0CC0E0675C0E0925C0E1063CC0E255C0E38D01D05C73C4F
177 18/00,Zn2-9,,m,reproductive soldier,0.3063,1.1939,0.2656,0.0738,0.0875,0.8429,,0
fixative=pampel's colony=Zn2-9aC caste=reproductive soldier eyetype=unsclerotized cllctnd=4/20/00
indiv= sex=m whead=0.336 hdratio=1.131 lmand=0.284 llabrum=0.0775 wlabrum=0.09 lbrmrtio=0.8611
lovary=. heye=0.0425 midwovry=. weye=0.025 nardtest=0.0688 testrat=1.3455 wingbud=0 wwovary=.
narwpost=0.0675 wdtestes=0.0925 wwpost=0.1063 eggs=. lpost=0.255 lhead=. _ERROR_=1 _N_=1
NOTE: Invalid data for lhead in line 3 178-192.
darrylovia
Quartz | Level 8
Try the MISSOVER option in the INFILE statement. Missover helps if there is missing values.

If that doesn't work try the TRUNCOVER option in the INFILE statement. I believe that TRUNCOVER works similar to MISSOVER but is a bit different in the details. And of course you can't use both.



-Darryl
deleted_user
Not applicable
hmm, neither missover nor truncover seem to change anything either.... 😕
darrylovia
Quartz | Level 8
I think you may have a data problem or two.

Unless I am reading your log wrong, i see the lhead=0.38. which is not a "numeric" the 2 periods will not work.

Also, It looks like there is no end of line marker since 0.38.pampel's run into each other

-Darryl
Doc_Duke
Rhodochrosite | Level 12
The log looks like you don't have any end-of line marker in the .csv file to separate the different records. Look at the "pampel's" at the end of the second data line; that looks like the same fixative as in the beginning of the first line.
LAP
Quartz | Level 8 LAP
Quartz | Level 8
Besides missover, I would also try the pad options on the infile statement

infile 'Z:\Desktop\Zootermopsis\Caste morph measurements\caste morphological measurements.csv'
dlm=',' dsd firstobs=2 pad;
LAP
Quartz | Level 8 LAP
Quartz | Level 8
Sorry - hit the post button too soon.

You need to use pad in conjuction with the lrecl= on the infile statement.

One other thought, you may want to inspect the csv file in notepad for any escape characters. I seem to remember vaguely a similiar issue with a csv file. I believe I fixed the problem by globally searching (and replacing) the identified character in the csv file
rab24
Calcite | Level 5
Are you getting the data from Business Objects by chance?

When I used to get BO files, I had to put a dummy variable at the end to control the loss as I could never find the troublesome characters with notepad.
andreas_lds
Jade | Level 19
Have a look at the termstr= option for the infile statement. If the csv-file has been created by a Windows application the termstr should be set to 'CRLF', because Unix only expects the LF as line-separator the CR becomes part of the content.
slb
Calcite | Level 5 slb
Calcite | Level 5
Can anyone help with the following error? The same error repeats for every other line of data. For the most part it says the invalid data is the length of the character values up to the space. The first variable is $30.

NOTE: Invalid data for run in line 3 1-3.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
2 ABDR 0.5_12.5_0.5_20 ABDR 0.5/12.5/0.5/20
87 00000000000000 N 77777726724 19970101
173 1010314AWP
259 19990801 20391231 1.89000PLACEBOS
345 ** 1.000
431 0 0.000DummyVar
3 ADV BALANCED NUTR SUPP ADV BALANCED NUTR SUPP
87 00000000000000 21130078170 20000901
173 O 1010314AWP
259 20000901 20391231 0.01000PLACEBOS
345 ** 0.000
431 0 0.000DummyVar

Any ideas?
darrylovia
Quartz | Level 8
Your forgot a semi-colon at the end of your run statement see the error message "invalide data for run..."

-Darryl
slb
Calcite | Level 5 slb
Calcite | Level 5
Thank you Darryl.
slb
Calcite | Level 5 slb
Calcite | Level 5
Okay, I solved my problem. I didn't have a ' ; ' after the list of variables in the input statement. SAS thought my next line "run;" was another variable without a length and type of variable associated with it.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 3498 views
  • 0 likes
  • 8 in conversation