BookmarkSubscribeRSS Feed
TMorville
Calcite | Level 5
Hi. I have a problem that i can't seem to google myself out of. I've searched the forums, but without luck.

I have a folder with a bunch of txt files called:

wea-abo_02-03-2011.txt
wea-abo_05-02-2011.txt
.
.
.

and so on.

The variables are seperated by a single tab. They are always the same, and have the same names. One would think that i could use a infile statement, but i cant make it work!

What i would like to do, is to import all of the .txt files into one big SAS file. IF possible, i would like SAS only to include, the first row only of the first workbook. This is the variable names, and its only nessesary for me to have them once.

There are 58 variables, and each textfile is approx 50000 rows long. There are some text variables, and some numerical. I also have variables with 8 empty rows in the beginning.

Any help or ideas would be appriciated.

Thanks! Message was edited by: TMorville
15 REPLIES 15
Oleg_L
Obsidian | Level 7
Hello.
Try this macro. It works on Windows. Edit some marked lines of code.

[pre]
%macro cfin10(input=d:\dep\input\2009, out=obordop2009);

/* read files in directory INPUT */

%let dir=%str(dir %")&input.%str(\%" /A-D/B/ON);

data &out ;
length filevar $256. command $256. ;
command = symget('dir');
infile dummy0 pipe filevar=command truncover;
input myfiles $100.;
fname=quote(upcase(cats("&input",'\',myfiles)));

/* Edit below line of code - delimiter and firstobs */


infile dummy1 firstobs=2 dlm=';' ls=32000 truncover dsd filevar=fname end=eof;

/* edit below line of code according to your data structure (fields) */

informat account $20. ss_naim $100. var1-var5 $1. ss_ostd 17. ss_ostc 17. ost_ccy 17. country $50. iso $3. client $100.
k_client $12. client_t $1. inn $char12. reg $char30. isin $char20. cb $char30.;
do while(not eof);

/* edit below line of code according to your data structure (fields) */

input account ss_naim var1-var5 ss_ostd ss_ostc ost_ccy country iso client
k_client client_t inn reg isin cb ;
end;
run;

%mend cfin10;
[/pre] Message was edited by: Oleg_L
TMorville
Calcite | Level 5
Hi Oleg and thanks for the input. It dosen't work tough.

Im working on getting this code to work:

*******************************************************

filename indata pipe 'dir I:\WEA\WEA-Marketing\Analyse\Data\P10\Dump\Txt /b';
data file_list;
length fname $25;
infile indata truncover; /* infile statement for file names */
input fname $25.; /* read the file names from the directory */
call symput ('num_files',_n_); /* store the record number in a macro variable */
run;


%macro fileread;
%do j=1 %to &num_files;
data _null_;
set file_list;
if _n_=&j;
call symput ('filein',fname);
run;
data var_names;
length x1-x60 $50;
infile "I:\WEA\WEA-Marketing\Analyse\Data\P10\Dump\Txt\&filein" obs=1 missover;
input (x1-x16) ($) ;
run;


%macro varnames;
%do i=1 %to 60;
%global v&i;
data _null_;
set var_names;
call symput("v&i",trim(x&i));
run;
%end;
%mend varnames;
%varnames;


data temp;
infile "I:\WEA\WEA-Marketing\Analyse\Data\P10\Dump\Txt\&filein" firstobs=2 missover;
input (&v1 &v2 &v3 &v4 &v5 &v6 &v7 &v8 &v9 &v10 &v11 &v12 &v13 &v14 &v15 &v16 &v17 &v18
&v19 &v20 &v21 &v22 &v23 &v24 &v25 &v26 &v27 &v28 &v29 &v30 &v31 &v32 &v33 &v34 &v35 &v36
&v37 &v38 &v39 &v40 &v41 &v42 &v43 &v44 &v45 &v46 &v47 &v48 &v49 &v50 &v51 &v52 &v53 &v54
&v55 &v56 &v57 &v58 &v59 &v60)($);
run;

/* assemble the individual files */
%if &j=1 %then %do;
data data_all;
set temp;
run;
%end;
%else %do;
data data_all;
set data_all
temp;
run;
%end;
%end; /* end of do-loop with index j */
%mend fileread;
%fileread;

**************************************************

It gives me some output, but it screws up in the second row, because its text and not numeric.

Any ideas?
Oleg_L
Obsidian | Level 7
I can suggest you another way. See code below.
I don't know your data structure. You did not provide the data here.
You should edit "dlm=", "firstobs=", path to your files and "informat" and "input" according to the information in your text files.

[pre]
data test ;
length filename $256.;
infile 'd:\mbm\2009\D20091106\input\*.txt' firstobs=3 dlm=';' ls=32000 truncover dsd eov=eov filename=filename;
informat account $20. ss_naim $100. var1-var5 $1. ss_ostd 17. ss_ostc 17. ost_ccy 17. country $50. iso $3. client $100.
k_client $12. client_t $1. inn $char12. reg $char30. isin $char20. cb $char30.;

input @;
if eov then do ;
input /;
eov = 0 ;
end;

input account ss_naim var1-var5 ss_ostd ss_ostc ost_ccy country iso client
k_client client_t inn reg isin cb ;
file=filename;
run;
[/pre]
TMorville
Calcite | Level 5
Hi again Oleg.

The problem with your code is the non-flexible $variable lenghts.

B/c i have names, adresses and many different categories and lenghts for my variables, it wont work.

Here is a modified sample of my data with 20 rows:

http://dl.dropbox.com/u/1321324/Work/sample.txt

Thanks! Message was edited by: TMorville
Oleg_L
Obsidian | Level 7
Try this code. You may want to edit variables informat.

[pre]
%macro cfin10(input=d:\dep\input\2009, out=obordop2009);
/* read files in directory INPUT */
%let dir=%str(dir %")&input.%str(\%" /A-D/B/ON);
data &out (compress=yes drop=filevar);
length filevar $256. command $256. fname $256.;

command = symget('dir');
infile dummy0 pipe filevar=command truncover;
input myfiles $100.;
fname=quote(upcase(cats("&input",'\',myfiles)));
infile dummy1 firstobs=2 dlm='09'x ls=32000 truncover dsd filevar=fname end=eof ;

informat

ship_to_party $10.
publication $5.
edition $2.
oplagskat $10.
subscr_valid_from ddmmyy10.
bill_freq $10.
pmethod $10.
r_code $10.
ordernr $9.
fremtidig_aktiv $10.
fornavn $50.
efternavn $70.
vej $70.
husnr $10.
husnr_suppl $5.
etage $1.
side $10.
postnr 17.
by $50.
land $2.
tlf 17.
bp_kat $1.
orgnavn1 $50.
orgnavn2 $50.
orgnavn3 $50.
orgnavn4 $10.
EMail $50.
Marketing_campaign $10.
ship_robinson $10.
sold_to_party $10.
sold_fornavn $50.
sold_efternavn $50.
sold_vej $50.
sold_husnr 17.
sold_husnr_suppl $1.
sold_etage $1.
sold_side $1.
sold_postnr 17.
sold_by $50.
sold_land $2.
sold_tlf 17.
sold_bp_kat $1.
sold_orgnavn1 $50.
sold_orgnavn2 $50.
sold_orgnavn3 $50.
sold_postbox $50.
sold_email $50.
sold_robinson $50.
samme $2.
tom_variabel2 $10.
del_end $10.
endreason $10.
matriculationdate $10.
suspension_from $10.
suspension_to $10.
Suspension_reason $10.
Delivery_type $1.
pricegroup $10.
;


do while(not eof);
input
ship_to_party
publication
edition
oplagskat
subscr_valid_from
bill_freq
pmethod
r_code
ordernr
fremtidig_aktiv
fornavn
efternavn
vej
husnr
husnr_suppl
etage
side
postnr
by
land
tlf
bp_kat
orgnavn1
orgnavn2
orgnavn3
orgnavn4
EMail
Marketing_campaign
ship_robinson
sold_to_party
sold_fornavn
sold_efternavn
sold_vej
sold_husnr
sold_husnr_suppl
sold_etage
sold_side
sold_postnr
sold_by
sold_land
sold_tlf
sold_bp_kat
sold_orgnavn1
sold_orgnavn2
sold_orgnavn3
sold_postbox
sold_email
sold_robinson
samme
tom_variabel2
del_end
endreason
matriculationdate
suspension_from
suspension_to
Suspension_reason
Delivery_type
pricegroup
;
if not missing(ship_to_party) then output;
end;
run;

%mend cfin10;





%cfin10(input=d:\work\morv, out=tmorville);

[/pre]

Message was edited by: Oleg_L

Message was edited by: Oleg_L Message was edited by: Oleg_L
Ksharp
Super User
Hi. Maybe you need another option.
[pre]
infile "I:\WEA\WEA-Marketing\Analyse\Data\P10\Dump\Txt\&filein" firstobs=2 missover expandtabs ;
[/pre]


Or Alternative way is to use SAS default proc import which will avoid to make some mistake.


Ksharp
Peter_C
Rhodochrosite | Level 12
when you know the column structure, IMPORT is seldom the best alternative
Here, it can be very simple
I created 5 copys of the text named DK1 to DK5.txt which are read with the 6 lines
data ;
infile 'C:\Users\Peter Crawford\Documents\My SAS Files\dk*.txt' dsd dlm='09'x lrecl=10000 firstobs=2 eov=eov truncover ;
input @ ;
if eov then do; eov=0 ; delete ; end ;
else input (col1 - col33 )(:$30);
run;
The EOV flag is set to 1 when the input rolls over from one file to the next (and needs to be reset)./
You adapt the data step for column names, lengths and date or other special (in)formats and if you keep the column definitions in order, the main input statement becomes even simpler[pre] else input ship_to_party -- pricegroup ;[/pre]That structure is
input first_column -- last_column ;
which will include all intermediate columns too. The will be read with their default informats.
Ksharp
Super User
Hi.
Peter. I know data step have more control to input data and more flexible.
I think OP maybe miss the option dlm='09'x or expandtabs to generate the unwanted
result.

For you wildchar about infile statement, there is one disadvantage that you are hard to control which txt file to input firstly,that mean the order of data maybe you do not want.
I think OP use infile dummy filevar= is good enough. 🙂


Ksharp
Peter_C
Rhodochrosite | Level 12
K
expandtabs might help less (as it removes what appear to be reliable delimiters.);
I saw no request to control input file order (When that is important, I normally seek to add the name of each file that is being loaded, with the data using FILENAME=);
I have found more overheads in the FILEVAR= method than in the * wildchar method.

Hence my recommendation

P
Ksharp
Super User
Dear Peter:
I do not know what to say. 🙂
I think OP 's problem maybe miss dlm='09'x option.

Ksharp
TMorville
Calcite | Level 5
Hi guys and thanks for the replys! It warms my SAS-padawan heart do learn stuff like this.

I like Peter.C solution, as it seems most flexible. But i can't really get it to work. My code looks like this now:

This is my errorlog:


347 data ;
348 infile 'I:\WEA\WEA-Marketing\Analyse\Data\P10\Dump\Txt\we*.txt' dsd dlm='09'x lrecl=10000
348! firstobs=2 eov=eov truncover ;
349 input ship_to_party -- pricegroup; ;
ERROR: Variable ship_to_party cannot be found on the list of previously defined variables.
350 if eov then do; eov=0 ; delete ; end ;
351 run;

Theres a chance i've misunderstood this sentance about inputting (in)formats?

PS: While im at it, i've gotten Oleg method to work (Thanks!) - and that causes me a new problem. If i have a variable thats a string of text like "wea-abo_29-01-2011.txt" can SAS recognize the date in the text, and translate it into a sas date like 29JAN2011?

Thanks! Message was edited by: TMorville
Oleg_L
Obsidian | Level 7
-> PS: While im at it, i've gotten Oleg method to work (Thanks!) - and that causes me a new problem. If i have a variable thats a string of text like "wea-abo_29-01-2011.txt" can SAS recognize the date in the text, and translate it into a sas date like 29JAN2011?

Add before the line " if not missing(ship_to_party) then output;"
next code:
date=input(substr(left(myfiles),9,10),ddmmyy10.);

Oleg.
TMorville
Calcite | Level 5
Hi again Oleg.

I have another problem with your code. No matter how many lines there are in the .txt document. Your code only imports 37508? Any ideas?


But thanks 🙂

Message was edited by: TMorville Message was edited by: TMorville
Oleg_L
Obsidian | Level 7
Try to replace " if not missing(ship_to_party) then output;" with "output;"
or another guess is to add option "ignoredoseof" to the infile statement.
But I'm not sure.

infile dummy1 firstobs=2 dlm='09'x ls=32000 truncover dsd filevar=fname end=eof ignoredoseof ;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1569 views
  • 0 likes
  • 4 in conversation