BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9
I wanted to read it an external dataset (>100,000 records). It is tab delimited, with missing variables entered as “.”. I could read in fine if I use dataline. (The data was aligned fine in SAS, but appear misaligned here.
data want;
input Year 1-4 date 6-14 Type $ 16-20 ID1 $ 22-30 ID2 $32-41 Status1 $ 43 status2 $ 45-46 amount 47-51;
datalines;
2008 . ACUT1 580035688 0093509933 A EP 1
2009 01-Jun-08 ACUT2 506920875 7294393694 B FP 7
2009 13-Apr-09 ACUT3 506925992 7863803113 C AC 10
;
run;

But if I read in from an external file, it is giving me error messages. The first row has variable names. I tried using firstobs=2 or removing the first line, but the problem still didn’t go away. I also tried missover or not specifying column numbers, etc., and still didn't help.

DATA want;
INFILE "D:\data.txt" ;
input Year 1-4 date date9. 6-14 Type $ 16-20 ID1 $ 22-30 ID2 $32-41 Status1 $ 43 status2 $ 45-46 amount 47-51;
run;
**************************************************************;
Data in the external file "Data.txt"
Year date type ID1 ID2 Status1 Status2 amount
2008 . ACUT1 580035688 0093509933 A EP 1
2009 01-Jun-08 ACUT2 506920875 7294393694 B FP 7
2009 13-Apr-09 ACUT3 506925992 7863803113 C AC 10


Thanks very much in advance.
9 REPLIES 9
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Please clarify "the problem still didn’t go away" - share your actual SAS log output for a run where you had firstobs=2. Also, consider adding either or both of these SAS debugging statements for some self-initiated desk-checking:

LIST;


PUTLOG _ALL_;



Scott Barry
SBBWorks, Inc.
Patrick
Opal | Level 21
Hi

You're saying your data is tab delimited but the code you've posed uses fixed width.

May be you have to amend your infile statement.

With SAS EG you could use the import wizard (file/import data) which should help you to create valid code. Just make sure that in the first screen of the import wizard (EG 4.3, not sure about other versions) you click button "performance..." and then check "limit amount of source data examined..." and choose something like a 100 lines. Else the whole file will be analysed which could take a while.

You can then in later screens changes the length, format and informat of your variables according to what you expect them to be.

HTH
Patrick
Patrick
Opal | Level 21
Duplicate post removed
Ksharp
Super User
How about:


INFILE "D:\data.txt" expandtabs firstobs=2;


Ksharp
Solph
Pyrite | Level 9
Just had time to thanks you all for the replies.

Tried Infile ... EXPANDTABS FIRSTOBS=2 and it didn't work.
Tried importing using SAS EG (I think mine is 4.3), but I didn't see the option of "performance..." and "limit amount of source data examined...".

I found a way to read in data from tab delimited files using DELIMITER='09'x though.

DATA want;
LENGTH type $5. ID1 $9. ID2 $10. date $9.;
INFILE "D:\data.txt" DELIMITER='09'x FIRSTOBS=2 MISSOVER;
INPUT Year date $ Type $ ID1 $ ID2 $ Status1 $ status2 $ amount;
RUN;

There are a few issues though.
1. In SAS Base, defining DATE as a string variable is the only way I could read in the data without any error and MISSOVER in or out doesn't matter. I tried in the INPUT statement DATE date9., but it gave me errors instead.
2. In SAS EG (available at work), if I recall correctly it I could define DATE as a date format fine, but it gave me warning or error messages on records with missing date variable. Fortunately there are only 17 cases and I just gave them up.

If anyone has quick answers, let me know. Otherwise I'll just live with the errors. It's quite exhausting reading in data. Thanks.
Patrick
Opal | Level 21
Hi

You probabely should use an INFORMAT to read your date strings.

input date:date9. .....;

or:
data ....
attrib date format=date9. informat=date9.;

input date .....


For cases where there is no data in a cell and you might have two consecutive tabs. In order to avoid issues you might want to add 'DSD' to your infile statement:

INFILE "D:\data.txt" DSD DELIMITER='09'x FIRSTOBS=2 TRUNCOVER;


HTH
Patrick
Solph
Pyrite | Level 9
Thanks Both worked and there is no need to specify variable length or to use DSD TRUNCOVER.
1)
DATA want;
format date date9.;
INFILE "D:\data.txt" DELIMITER='09'x FIRSTOBS=2 MISSOVER;
INPUT Year date:date9. Type $ ID1 $ ID2 $ Status1 $ status2 $ amount;
RUN;
2)
DATA want;
ATTRIB date format=date9. informat=date9.;
INFILE "D:\data.txt" DELIMITER='09'x FIRSTOBS=2 MISSOVER;
INPUT Year date $ Type $ ID1 $ ID2 $ Status1 $ status2 $ amount;
RUN;

I have another question though. For some reasons if the value for AMOUNT exceeds 999, the data becomes a comma style with quotes wrapped around say "13,099" instead of just 13099. Is there is a quick way to fix it? Or I'll have to open the text file to replace " with Null and use the comma format(?)? Or read in as a text variable and get rid of the quotes and commas before converting it into a numerical variable?
Ksharp
Super User

DSD option will take care of  this situation.

Ksharp

Tom
Super User Tom
Super User

Add the DSD option on the INFILE statement to handle the quotes.

Use the COMMA informat for the AMOUNT variable to handle the embedded commas.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1275 views
  • 0 likes
  • 5 in conversation