Hi,
I hope to read in a txt file into SAS and some missing values where the original dataset doesn't have appear in my output dataset.
And the value which should be in the variable Open moves to the next variable.
My code are as below
data delivery_stores_2001;
length Market_Name $20. MskdName $8.;
infile "/home/stat440/Delivery_Stores_2001.txt" dlm=' ' firstobs=2;
input IRI_KEY OU $ EST_ACV Market_Name $ Open Clsd MskdName $;
run;
I have also paste part of the dataset
IRI_KEY OU EST_ACV Market_Name Open Clsd MskdName
200039 GR 9.709999 BUFFALO/ROCHESTER 539 1219 Chain87
200171 GR 27.69099 MILWAUKEE 522 9998 Chain97
200197 GR 11.14 PEORIA/SPRINGFLD. 903 9998 Chain59
200233 GR 7.514999 OKLAHOMA CITY 1122 1150 Chain102
200272 GR 13.173 LOS ANGELES 873 9998 Chain124
200287 GR 7.481998 SAN FRANCISCO 795 9998 Chain94
200297 GR 23.159 PORTLAND,OR 999 9998 Chain79
200334 GR 19.21899 PORTLAND,OR 922 1329 Chain137
200411 GR 18.94199 NEW YORK 1014 9998 Chain66
200439 GR 14.098 WEST TEX/NEW MEX 920 9998 Chain110
200479 GR 18.545 SAN FRANCISCO 559 1362 Chain7
200485 GR 5.781998 OKLAHOMA CITY 490 9998 Chain102
200488 GR 26.06699 BOSTON 519 1325 Chain107
200489 GR 16.02299 HOUSTON 701 9998 Chain34
200492 GR 11.876 LOS ANGELES 473 9998 Chain124
200494 GR 20.028 GREEN BAY 486 1435 Chain115
200499 GR 9.598999 OKLAHOMA CITY 490 9998 Chain102
200501 GR 29.267 SAN FRANCISCO 554 9998 Chain7
200510 GR 31.10899 HARTFORD 473 1210 Chain114
200511 GR 19.21899 WEST TEX/NEW MEX 456 1145 Chain39
Am I using the wrong delimiter? Is there anything I could do to fix the problem? Thanks!
Assuming that the delimiters are blanks, you could try this INPUT statement:
input IRI_KEY OU $ EST_ACV Market_Name & Open Clsd MskdName;
As there are plenty of blanks (?) after the market name, it could be sufficient to take care of the embedded blanks by using the '&' modifier. (The $ signs after variable names Market_Name and MskdName are optional because of their declaration in the LENGTH statement.)
Open the file with a good text editor (eg notepad++) and view it in hex mode to see which delimiters are used.
If the delimiter character also appears in data, and the data is not quoted, you will face some programming effort.
Ideally, such a file would use tabs ('09'x) or some other rare character as delimiter and blanks in the market_name.
Thanks for your reply!
I have open the txt in notepad++, and I could not recognize whether it uses the blank or tab as delimiter, can you help me for check?
Assuming that the delimiters are blanks, you could try this INPUT statement:
input IRI_KEY OU $ EST_ACV Market_Name & Open Clsd MskdName;
As there are plenty of blanks (?) after the market name, it could be sufficient to take care of the embedded blanks by using the '&' modifier. (The $ signs after variable names Market_Name and MskdName are optional because of their declaration in the LENGTH statement.)
Thank you so much! Really appreciate for that!
I have google online and try many ways, but it doesn't work.
The dataset now looks great!
Wonderful! Actually, these raw data (with fixed column widths) are well suited for formatted INPUT or column INPUT, but if list INPUT was already successful, it's fine.
You have fixed columns, which allows to use column input:
data want;
infile ....;
input
iri_key $ 2-7
ou $ 9-10
est_acv 12-19
market_name $ 21-44
open 46-49
clsd 51-54
mskdname $ 56-62
;
run;
Why not use PROC IMPORT ?
data delivery_stores_2001;
length OU Market_Name MskdName $ 40 ;
infile "/home/stat440/Delivery_Stores_2001.txt" dlm=' ' firstobs=2 termstr=CRLF;
input IRI_KEY OU $ EST_ACV Market_Name $ Open Clsd MskdName $;
run;
I have only learned about how to read data using data step.
But I still stuck on import the data correctly although I figure out the delimiter.
I just google and try for proc import
proc import datafile="/home/stat440/Delivery_Stores_2001.txt" out=mydata dbms=dlm;
delimiter=' ';
run;
The log said import cancelled, I am confused and do you have any suggestions?
You don't know the length of market_name and it is followed by only one space (list input needs 2 spaces to read a field with space). You do know that there are three fields after market_name and you can use that info to calculate the length of market_name;
filename FT15F001 temp;
data delivery_stores_2001;
length Market_Name $20. MskdName $8.;
infile FT15F001 firstobs=2 col=c;
input IRI_KEY OU $ EST_ACV @;
call scan(_infile_,-3,p,l,' ');
* CALL SCAN(<string>, count, position, length <, <character-list> <, <modifier(s)>>>);
l=p-c-1;
input Market_Name $varying. l Open Clsd MskdName $;
parmcards;
IRI_KEY OU EST_ACV Market_Name Open Clsd MskdName
200039 GR 9.709999 BUFFALO/ROCHESTER 539 1219 Chain87
200171 GR 27.69099 MILWAUKEE 522 9998 Chain97
200197 GR 11.14 PEORIA/SPRINGFLD. 903 9998 Chain59
200233 GR 7.514999 OKLAHOMA CITY 1122 1150 Chain102
200272 GR 13.173 LOS ANGELES 873 9998 Chain124
200287 GR 7.481998 SAN FRANCISCO 795 9998 Chain94
200297 GR 23.159 PORTLAND,OR 999 9998 Chain79
200334 GR 19.21899 PORTLAND,OR 922 1329 Chain137
200411 GR 18.94199 NEW YORK 1014 9998 Chain66
200439 GR 14.098 WEST TEX/NEW MEX 920 9998 Chain110
200479 GR 18.545 SAN FRANCISCO 559 1362 Chain7
200485 GR 5.781998 OKLAHOMA CITY 490 9998 Chain102
200488 GR 26.06699 BOSTON 519 1325 Chain107
200489 GR 16.02299 HOUSTON 701 9998 Chain34
200492 GR 11.876 LOS ANGELES 473 9998 Chain124
200494 GR 20.028 GREEN BAY 486 1435 Chain115
200499 GR 9.598999 OKLAHOMA CITY 490 9998 Chain102
200501 GR 29.267 SAN FRANCISCO 554 9998 Chain7
200510 GR 31.10899 HARTFORD 473 1210 Chain114
200511 GR 19.21899 WEST TEX/NEW MEX 456 1145 Chain39
;;;;
run;
proc print;
run;
@data_null__: I think you were misled by the OP's first post where it seemed that Market_Name was followed by only one space. Initially I had been misled the same way and had developed the below workaround before I saw the Notepad++ screenshot in the OP's second post.
input @;
_infile_=prxchange('s/(\S+ \S+ \S+ \D+)/$1 /',1,left(compbl(_infile_)));
input IRI_KEY OU $ EST_ACV Market_Name & Open Clsd MskdName;
@FreelanceReinh I saw all the posts. I was not misled. My program works for both scenarios.
I see. Always good to code defensively. Indeed, even in a scenario with fixed column widths the double blank would not necessarily be guaranteed.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.