Solved: read in txt file in sas meet errors

kyleiii · Posted 08-04-2018 06:27 AM

Hi,

I hope to read in a txt file into SAS and some missing values where the original dataset doesn't have appear in my output dataset.

And the value which should be in the variable Open moves to the next variable.

My code are as below

data delivery_stores_2001;
	length Market_Name $20. MskdName $8.;
	infile "/home/stat440/Delivery_Stores_2001.txt" dlm=' ' firstobs=2;
	input IRI_KEY OU $ EST_ACV Market_Name $ Open Clsd MskdName $;
run;

I have also paste part of the dataset

IRI_KEY OU EST_ACV Market_Name Open Clsd MskdName
200039 GR 9.709999 BUFFALO/ROCHESTER 539 1219 Chain87
200171 GR 27.69099 MILWAUKEE 522 9998 Chain97
200197 GR 11.14 PEORIA/SPRINGFLD. 903 9998 Chain59
200233 GR 7.514999 OKLAHOMA CITY 1122 1150 Chain102
200272 GR 13.173 LOS ANGELES 873 9998 Chain124
200287 GR 7.481998 SAN FRANCISCO 795 9998 Chain94
200297 GR 23.159 PORTLAND,OR 999 9998 Chain79
200334 GR 19.21899 PORTLAND,OR 922 1329 Chain137
200411 GR 18.94199 NEW YORK 1014 9998 Chain66
200439 GR 14.098 WEST TEX/NEW MEX 920 9998 Chain110
200479 GR 18.545 SAN FRANCISCO 559 1362 Chain7
200485 GR 5.781998 OKLAHOMA CITY 490 9998 Chain102
200488 GR 26.06699 BOSTON 519 1325 Chain107
200489 GR 16.02299 HOUSTON 701 9998 Chain34
200492 GR 11.876 LOS ANGELES 473 9998 Chain124
200494 GR 20.028 GREEN BAY 486 1435 Chain115
200499 GR 9.598999 OKLAHOMA CITY 490 9998 Chain102
200501 GR 29.267 SAN FRANCISCO 554 9998 Chain7
200510 GR 31.10899 HARTFORD 473 1210 Chain114
200511 GR 19.21899 WEST TEX/NEW MEX 456 1145 Chain39

Am I using the wrong delimiter? Is there anything I could do to fix the problem? Thanks!

FreelanceReinh · Posted 08-04-2018 08:20 AM

Assuming that the delimiters are blanks, you could try this INPUT statement:

input IRI_KEY OU $ EST_ACV Market_Name & Open Clsd MskdName;

As there are plenty of blanks (?) after the market name, it could be sufficient to take care of the embedded blanks by using the '&' modifier. (The $ signs after variable names Market_Name and MskdName are optional because of their declaration in the LENGTH statement.)

View solution in original post

Kurt_Bremser · Posted 08-04-2018 07:14 AM

Open the file with a good text editor (eg notepad++) and view it in hex mode to see which delimiters are used.

If the delimiter character also appears in data, and the data is not quoted, you will face some programming effort.

Ideally, such a file would use tabs ('09'x) or some other rare character as delimiter and blanks in the market_name.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

kyleiii · Posted 08-04-2018 07:25 AM

Thanks for your reply!

I have open the txt in notepad++, and I could not recognize whether it uses the blank or tab as delimiter, can you help me for check?

FreelanceReinh · Posted 08-04-2018 08:20 AM

Assuming that the delimiters are blanks, you could try this INPUT statement:

input IRI_KEY OU $ EST_ACV Market_Name & Open Clsd MskdName;

As there are plenty of blanks (?) after the market name, it could be sufficient to take care of the embedded blanks by using the '&' modifier. (The $ signs after variable names Market_Name and MskdName are optional because of their declaration in the LENGTH statement.)

kyleiii · Posted 08-04-2018 08:27 AM

Thank you so much! Really appreciate for that!

I have google online and try many ways, but it doesn't work.

The dataset now looks great!

FreelanceReinh · Posted 08-04-2018 08:39 AM

Wonderful! Actually, these raw data (with fixed column widths) are well suited for formatted INPUT or column INPUT, but if list INPUT was already successful, it's fine.

Kurt_Bremser · Posted 08-04-2018 08:38 AM

You have fixed columns, which allows to use column input:

data want;
infile ....;
input
  iri_key $ 2-7
  ou $ 9-10
  est_acv 12-19
  market_name $ 21-44
  open 46-49
  clsd 51-54
  mskdname $ 56-62
;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 08-04-2018 07:42 AM

Why not use PROC IMPORT ?

data delivery_stores_2001;
	length OU Market_Name  MskdName $ 40 ;
	infile "/home/stat440/Delivery_Stores_2001.txt" dlm=' ' firstobs=2 termstr=CRLF;
	input IRI_KEY OU $ EST_ACV Market_Name $ Open Clsd MskdName $;
run;

kyleiii · Posted 08-04-2018 07:56 AM

I have only learned about how to read data using data step.

But I still stuck on import the data correctly although I figure out the delimiter.

I just google and try for proc import

proc import datafile="/home/stat440/Delivery_Stores_2001.txt" out=mydata  dbms=dlm;
delimiter=' ';
run;

The log said import cancelled, I am confused and do you have any suggestions?

data_null__ · Posted 08-05-2018 12:10 PM

You don't know the length of market_name and it is followed by only one space (list input needs 2 spaces to read a field with space). You do know that there are three fields after market_name and you can use that info to calculate the length of market_name;

filename FT15F001 temp;
data delivery_stores_2001;
	length Market_Name $20. MskdName $8.;
	infile FT15F001 firstobs=2 col=c;
	input IRI_KEY OU $ EST_ACV @;
	call scan(_infile_,-3,p,l,' ');
	* CALL SCAN(<string>, count, position, length <, <character-list> <, <modifier(s)>>>);

        l=p-c-1;
	input Market_Name $varying. l  Open Clsd MskdName $;
   parmcards;
IRI_KEY OU EST_ACV Market_Name Open Clsd MskdName
200039 GR 9.709999 BUFFALO/ROCHESTER 539 1219 Chain87
200171 GR 27.69099 MILWAUKEE 522 9998 Chain97
200197 GR 11.14 PEORIA/SPRINGFLD. 903 9998 Chain59
200233 GR 7.514999 OKLAHOMA CITY 1122 1150 Chain102
200272 GR 13.173 LOS ANGELES 873 9998 Chain124
200287 GR 7.481998 SAN FRANCISCO 795 9998 Chain94
200297 GR 23.159 PORTLAND,OR 999 9998 Chain79
200334 GR 19.21899 PORTLAND,OR 922 1329 Chain137
200411 GR 18.94199 NEW YORK 1014 9998 Chain66
200439 GR 14.098 WEST TEX/NEW MEX 920 9998 Chain110
200479 GR 18.545 SAN FRANCISCO 559 1362 Chain7
200485 GR 5.781998 OKLAHOMA CITY 490 9998 Chain102
200488 GR 26.06699 BOSTON 519 1325 Chain107
200489 GR 16.02299 HOUSTON 701 9998 Chain34
200492 GR 11.876 LOS ANGELES 473 9998 Chain124
200494 GR 20.028 GREEN BAY 486 1435 Chain115
200499 GR 9.598999 OKLAHOMA CITY 490 9998 Chain102
200501 GR 29.267 SAN FRANCISCO 554 9998 Chain7
200510 GR 31.10899 HARTFORD 473 1210 Chain114
200511 GR 19.21899 WEST TEX/NEW MEX 456 1145 Chain39
;;;;
   run;
proc print;
   run;

FreelanceReinh · Posted 08-05-2018 12:28 PM

@data_null__: I think you were misled by the OP's first post where it seemed that Market_Name was followed by only one space. Initially I had been misled the same way and had developed the below workaround before I saw the Notepad++ screenshot in the OP's second post.

input @;
_infile_=prxchange('s/(\S+ \S+ \S+ \D+)/$1 /',1,left(compbl(_infile_)));
input IRI_KEY OU $ EST_ACV Market_Name & Open Clsd MskdName;

data_null__ · Posted 08-05-2018 12:32 PM

@FreelanceReinh I saw all the posts. I was not misled. My program works for both scenarios.

FreelanceReinh · Posted 08-05-2018 12:39 PM

I see. Always good to code defensively. Indeed, even in a scenario with fixed column widths the double blank would not necessarily be guaranteed.

read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

Re: read in txt file in sas meet errors

SAS Innovate 2025: Call for Content

Classroom Training Available!