BookmarkSubscribeRSS Feed
dom_oh
Calcite | Level 5

Hi there

I need to import multiple (20+) comma delimited text files (.txt files not .csv) of varying lengths and number of variables. A "memberkey" column links all these files together. However, this variable has leading zeros, what's worse is that the member keys vary in length. For example there are two ids 000123 and 00123 which when imported are both 123 when that is clearly not the case.

I have written/modified a macro I found on the sas website to import all these files as they are in a common directory. This works great apart from the leading zero issue. I can't use an infile data step as I don't know all the field names and don't want to have to open up these files (too many of them).

My question is can I force proc import to somehow make all fields called "memberkey" text instead of numeric?

Here is my macro:

filename DIRLIST pipe 'dir "\data\client\folder\Raw Data\*.txt" ';

data dirlist ;

infile dirlist lrecl=200 truncover;

input line $200.;

if find(line,".txt") = 0 then delete;

length file_name $ 200;

file_name="\data\client\folder\Raw Data\"||scan(line,-1," ");

keep file_name;

run;

data _null_;

set dirlist end=end;

count+1;

call symput('read'||left(count),left(trim(file_name)));

call symput('dset'||left(count),substr(file_name, 76, length(file_name) - 79));

if end then call symput('max',count);

run;

options mprint symbolgen;

%macro readin;

%do i=1 %to &max;

PROC IMPORT DATAFILE="&&read&i"

OUT=data.&&dset&iv

DBMS=DLM

REPLACE;

GETNAMES=YES;

DELIMITER=",";

GUESSINGROWS=10000;

RUN;

%end;

%mend readin;

%readin;

Thank you all for your help

9 REPLIES 9
SASKiwi
PROC Star

My guess is the PROC IMPORT is treating the the offending column as numeric and is dropping the leading zeroes as a consequence.
I did a test myself with this program:

proc import out= test

  datafile= "c:\test_data.txt"

  REPLACE;

  GETNAMES=YES;

   DELIMITER=",";

RUN;

And then I looked in the log and found this:

/**********************************************************************

*   PRODUCT:   SAS

*   VERSION:   9.3

*   CREATOR:   External File Interface

*   DATE:      23JUL12

*   DESC:      Generated SAS Datastep Code

*   TEMPLATE SOURCE:  (None Specified.)

***********************************************************************/

    data WORK.TEST    ;

    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

    infile 'c:\test_data.txt' delimiter = ',' MISSOVER DSD

lrecl=32767 firstobs=2 ;

       informat ID best32. ;

       informat _Text_var $14. ;

       format ID best12. ;

       format _Text_var $14. ;

    input

                ID

                _Text_var $

    ;

    if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable

*/

    run;

Note the highlighted INFORMAT and FORMAT statements showing the ID is being read as a number.

I suggest you look at your LOG as it will have similar code. You can copy this code back into your program by using your mouse with the ALT key pressed to avoid the LOG row numbers to highlight the required statements then do a CTRL C and CTRL V to get it into your program. You can then edit the highlighted statements to be read and displayed as character (don't forget to put a $ sign after the ID column).

I realise this doesn't solve the problem of multiple files with different columns but it will confirm this is your problem.

dom_oh
Calcite | Level 5

@saskiwi Thanks for your reply I apprectiate it. I was aware of that as a solution. However, that wont suit my purposes as I don't want to have to do that for all the files I have, its too manual. I am trying to create an efficient automated process.

Ksharp
Super User

You can use a very convenient way(variables list ) to make them all be character.

filename DIRLIST "c:\data\client\folder\Raw Data\*.txt" ;

data dirlist ;

infile dirlist lrecl=200 truncover;

input var1-var2000 : $200.;

...........

Ksharp

dom_oh
Calcite | Level 5

@ksharp thanks for replying, That doesn't work for me. All the vars (1:1999) are numeric only var2000 is character. Furthermore, if I did get it working I wouldn't have meaningful variable names and I would have a large number of empty columns. Do you know a way around that?

Ksharp
Super User

All the vars (1:1999) are numeric only var2000 is character.

First of all you need to estimate the maximum number of all the files. and you can change it to be numberic by using array.

Furthermore, if I did get it working I wouldn't have meaningful variable names.

That is not a problem. I used to do it usually. You can always use proc datasets to modify them.Such as:

data varname;

input varname : $40. @@ ;

cards;

sds sdsa kjll  lkj  lkjls sds

;

run;

data _null_;

set varname end=last ;

length rename $ 32767;

retain rename;

rename=catx(' ',  rename ,'var',_n_,'=',varname);

if last then call symputx('rename',rename);

run;

%put &rename ;

proc datasets library=work nolist;

modify have;

rename &rename  ;

quit;

I would have a large number of empty columns.

That is easy. You can use SQL do it. reference my previous example.

KSharp

andreas_lds
Jade | Level 19

Untested and not efficient:

%macro readin;

%do i=1 %to &max;

PROC IMPORT DATAFILE="&&read&i"

OUT=data.&&dset&iv

DBMS=DLM

REPLACE;

GETNAMES=YES;

DELIMITER=",";

GUESSINGROWS=10000;

RUN;

/* new code: */

data data.&&dset&iv;

     length MemberKey $ 12; /* insert maximum length */

     set data.&&dset&iv(rename=(memberkey = MemberKeyNum));

     MemberKey = put(MemberKeyNum, z12.); /* fix format length */

     drop MemberKeyNum;

run;

%end;

%mend readin;

%readin;

I recommend moving the import proc to the data step.

dom_oh
Calcite | Level 5

Thanks for replying, unfortunately that solution will not work for me because the member keys are of varying lengths to begin with. So if a key is 000123 and another 00123 when padded out to the maximum length again both keys will become 000123.  It's pretty much the most frustrating data I have had to deal with given there are so many tables to import! Thanks for the suggestion

Patrick
Opal | Level 21

One work around I've seen (I believe in a SAS note) is to add an additional line of data to your .txt file. That could be a string of commas and then you add some text string on the position where you need Proc Import to create a character variable. I believe you need to add "mode=mixed" or some similar syntax to the Proc Import to make this approach work.

Would that be a possible solution for you? It's just about figuring out on which position (after how many commas) you need to add the text string. If I understand right then you have a first header row and you know the column name - so you could just count after how many commas this column name exists in the header row and that gives you the line of text you have to add to your text file (make sure it's withing the guessingrows area).

Tom
Super User Tom
Super User

Why not just re-read the text files using a data step to get the memberkey variable correct?

You can use PROC CONTENTS or dictionary.columns or SASHELP.VCOLUMN to get the VARNUM for the variable MEMBERKEY in each of you new SAS datasets.  Once you have that then you can use code like this to re-read the text file and merge it with the SAS dataset.

%let textfile=file1.txt;

%let ds=file1;

%let varnum=3;

data memberkey ;

  infile "&textfile"  dlm=',' dsd truncover firstobs=2 ;

  length memberkey $12;

  do _n_ = 1 to &varnum;

     input memberkey @;

  end;

run;

data &ds ;

  merge memberkey &ds (drop=memberkey);

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 9069 views
  • 0 likes
  • 6 in conversation