BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kyleiii
Calcite | Level 5

Hi,

I hope to read in a txt file into SAS and some missing values where the original dataset doesn't have appear in my output dataset.

And the value which should be in the variable Open moves to the next variable.

My code are as below

data delivery_stores_2001;
	length Market_Name $20. MskdName $8.;
	infile "/home/stat440/Delivery_Stores_2001.txt" dlm=' ' firstobs=2;
	input IRI_KEY OU $ EST_ACV Market_Name $ Open Clsd MskdName $;
run;

I have also paste part of the dataset

 

IRI_KEY OU EST_ACV Market_Name Open Clsd MskdName
200039 GR 9.709999 BUFFALO/ROCHESTER 539 1219 Chain87
200171 GR 27.69099 MILWAUKEE 522 9998 Chain97
200197 GR 11.14 PEORIA/SPRINGFLD. 903 9998 Chain59
200233 GR 7.514999 OKLAHOMA CITY 1122 1150 Chain102
200272 GR 13.173 LOS ANGELES 873 9998 Chain124
200287 GR 7.481998 SAN FRANCISCO 795 9998 Chain94
200297 GR 23.159 PORTLAND,OR 999 9998 Chain79
200334 GR 19.21899 PORTLAND,OR 922 1329 Chain137
200411 GR 18.94199 NEW YORK 1014 9998 Chain66
200439 GR 14.098 WEST TEX/NEW MEX 920 9998 Chain110
200479 GR 18.545 SAN FRANCISCO 559 1362 Chain7
200485 GR 5.781998 OKLAHOMA CITY 490 9998 Chain102
200488 GR 26.06699 BOSTON 519 1325 Chain107
200489 GR 16.02299 HOUSTON 701 9998 Chain34
200492 GR 11.876 LOS ANGELES 473 9998 Chain124
200494 GR 20.028 GREEN BAY 486 1435 Chain115
200499 GR 9.598999 OKLAHOMA CITY 490 9998 Chain102
200501 GR 29.267 SAN FRANCISCO 554 9998 Chain7
200510 GR 31.10899 HARTFORD 473 1210 Chain114
200511 GR 19.21899 WEST TEX/NEW MEX 456 1145 Chain39

 

Am I using the wrong delimiter? Is there anything I could do to fix the problem? Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Assuming that the delimiters are blanks, you could try this INPUT statement:

input IRI_KEY OU $ EST_ACV Market_Name & Open Clsd MskdName;

As there are plenty of blanks (?) after the market name, it could be sufficient to take care of the embedded blanks by using the '&' modifier. (The $ signs after variable names Market_Name and MskdName are optional because of their declaration in the LENGTH statement.)

View solution in original post

12 REPLIES 12
Kurt_Bremser
Super User

Open the file with a good text editor (eg notepad++) and view it in hex mode to see which delimiters are used.

If the delimiter character also appears in data, and the data is not quoted, you will face some programming effort.

Ideally, such a file would use tabs ('09'x) or some other rare character as delimiter and blanks in the market_name.

kyleiii
Calcite | Level 5

Thanks for your reply!

I have open the txt in notepad++, and I could not recognize whether it uses the blank or tab as delimiter, can you help me for check?

 

iooi.png

FreelanceReinh
Jade | Level 19

Assuming that the delimiters are blanks, you could try this INPUT statement:

input IRI_KEY OU $ EST_ACV Market_Name & Open Clsd MskdName;

As there are plenty of blanks (?) after the market name, it could be sufficient to take care of the embedded blanks by using the '&' modifier. (The $ signs after variable names Market_Name and MskdName are optional because of their declaration in the LENGTH statement.)

kyleiii
Calcite | Level 5

Thank you so much! Really appreciate for that!

I have google online and try many ways, but it doesn't work.

The dataset now looks great!

FreelanceReinh
Jade | Level 19

Wonderful! Actually, these raw data (with fixed column widths) are well suited for formatted INPUT or column INPUT, but if list INPUT was already successful, it's fine.

Kurt_Bremser
Super User

You have fixed columns, which allows to use column input:

data want;
infile ....;
input
  iri_key $ 2-7
  ou $ 9-10
  est_acv 12-19
  market_name $ 21-44
  open 46-49
  clsd 51-54
  mskdname $ 56-62
;
run;
Ksharp
Super User

Why not use PROC IMPORT ?

 

data delivery_stores_2001;
	length OU Market_Name  MskdName $ 40 ;
	infile "/home/stat440/Delivery_Stores_2001.txt" dlm=' ' firstobs=2 termstr=CRLF;
	input IRI_KEY OU $ EST_ACV Market_Name $ Open Clsd MskdName $;
run;
kyleiii
Calcite | Level 5

I have only learned about how to read data using data step.

But I still stuck on import the data correctly although I figure out the delimiter.

I just google and try for proc import

proc import datafile="/home/stat440/Delivery_Stores_2001.txt" out=mydata  dbms=dlm;
delimiter=' ';
run;

The log said import cancelled, I am confused and do you have any suggestions? 

data_null__
Jade | Level 19

You don't know the length of market_name  and it is followed by only one space (list input needs 2 spaces to read a field with space).  You do know that there are three fields after market_name and you can use that info to calculate the length of market_name;

 

filename FT15F001 temp;
data delivery_stores_2001;
	length Market_Name $20. MskdName $8.;
	infile FT15F001 firstobs=2 col=c;
	input IRI_KEY OU $ EST_ACV @;
	call scan(_infile_,-3,p,l,' ');
	* CALL SCAN(<string>, count, position, length <, <character-list> <, <modifier(s)>>>);

        l=p-c-1;
	input Market_Name $varying. l  Open Clsd MskdName $;
   parmcards;
IRI_KEY OU EST_ACV Market_Name Open Clsd MskdName
200039 GR 9.709999 BUFFALO/ROCHESTER 539 1219 Chain87
200171 GR 27.69099 MILWAUKEE 522 9998 Chain97
200197 GR 11.14 PEORIA/SPRINGFLD. 903 9998 Chain59
200233 GR 7.514999 OKLAHOMA CITY 1122 1150 Chain102
200272 GR 13.173 LOS ANGELES 873 9998 Chain124
200287 GR 7.481998 SAN FRANCISCO 795 9998 Chain94
200297 GR 23.159 PORTLAND,OR 999 9998 Chain79
200334 GR 19.21899 PORTLAND,OR 922 1329 Chain137
200411 GR 18.94199 NEW YORK 1014 9998 Chain66
200439 GR 14.098 WEST TEX/NEW MEX 920 9998 Chain110
200479 GR 18.545 SAN FRANCISCO 559 1362 Chain7
200485 GR 5.781998 OKLAHOMA CITY 490 9998 Chain102
200488 GR 26.06699 BOSTON 519 1325 Chain107
200489 GR 16.02299 HOUSTON 701 9998 Chain34
200492 GR 11.876 LOS ANGELES 473 9998 Chain124
200494 GR 20.028 GREEN BAY 486 1435 Chain115
200499 GR 9.598999 OKLAHOMA CITY 490 9998 Chain102
200501 GR 29.267 SAN FRANCISCO 554 9998 Chain7
200510 GR 31.10899 HARTFORD 473 1210 Chain114
200511 GR 19.21899 WEST TEX/NEW MEX 456 1145 Chain39
;;;;
   run;
proc print;
   run;

 

 

FreelanceReinh
Jade | Level 19

@data_null__: I think you were misled by the OP's first post where it seemed that Market_Name was followed by only one space. Initially I had been misled the same way and had developed the below workaround before I saw the Notepad++ screenshot in the OP's second post.

input @;
_infile_=prxchange('s/(\S+ \S+ \S+ \D+)/$1 /',1,left(compbl(_infile_)));
input IRI_KEY OU $ EST_ACV Market_Name & Open Clsd MskdName;

 

data_null__
Jade | Level 19

@FreelanceReinh I saw all the posts.   I was not misled.  My program works for both scenarios.

 

FreelanceReinh
Jade | Level 19

I see. Always good to code defensively. Indeed, even in a scenario with fixed column widths the double blank would not necessarily be guaranteed.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1337 views
  • 1 like
  • 5 in conversation