ArtemisFowl
Calcite | Level 5

Hello,

I have a flat file consisting of a whole lot of names. It is of the form:

"Adam","Robin","Jimmy","Aragon",...

and so forth,

so the file only contains one line of names.

I'd like to read this file into a data set such that each name gets its own entry (ideally without the quotation marks). Apparently the string of names is longer than 32767 characters, which appears to be the maximum for PROC IMPORT. Can I somehow circumvent this?

Kind regards.

ACCEPTED SOLUTION
MikeZdeb
Rhodochrosite | Level 12

Hi ... this worked for me with your data ...

    data new;
       infile 'z:\names.txt' dsd lrecl=50000 pad;
       input name : $15. @@;
    run;

    NOTE: 1 record was read from the infile 'z:\names.txt'.
          The minimum record length was 46447.
          The maximum record length was 46447.
    NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
    NOTE: The data set WORK.NEW has 5163 observations and 1 variables.

first 5 observations ...

    MARY
    PATRICIA
    LINDA
    BARBARA
    ELIZABETH

ArtemisFowl
Calcite | Level 5

Hello, Mike.

That works perfectly, thank you. It seems the DSD option does the trick; I will look into that one.

MikeZdeb
Rhodochrosite | Level 12

hi ... that DSD option means a few things:

  • the file is comma-delimited (comma becomes the default delimiter)
  • two consecutive commas are interpreted as a missing value
  • quotes around values are stripped

ps not "THE" Artemis Fowl I presume
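
For anyone who wants to see those DSD behaviors in isolation, here is a minimal sketch. The data set name dsd_demo, the variables a, b, c and d, and the sample line are all made up for illustration; they are not from the original file.

```sas
/* Sketch: how DSD treats quotes, empty fields, and embedded commas. */
/* dsd_demo, a-d, and the sample line are hypothetical.              */
data dsd_demo;
   infile datalines dsd truncover;
   length a b c d $15;
   input a b c d;
   put a= b= c= d=;   /* write the four values to the log */
datalines;
"Adam",,"Robin","Smith, Jr."
;
run;
```

The log line should show a=Adam, a blank b (the two consecutive commas), c=Robin, and d=Smith, Jr. with the embedded comma kept as data and the quotes removed.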

ArtemisFowl
Calcite | Level 5

Thanks. :)

THE Artemis Fowl would not be asking for help on an online forum. Or would he...?

chang_y_chung_hotmail_com
Obsidian | Level 7

Here is an alternative approach -- reading one character at a time.

    %let pwd = z:\; 
    data names;
      infile "&pwd\names.txt" recfm=n unbuffered eof=output;
      length name $20;
      do while (1);
        input c $1. @@;
        if c = "," then link output;
        else name = catt(name,c);
      end;
      stop;
    
      output:
        name = dequote(name);
        output;
        name = "";
        keep name;
      return;
    run;
    
    /* check. first and last three names */
    ods _all_ close;
    ods listing;
       title "first three names";
       proc print data=names(obs=3);
       run;
    
       title "last three names";
       proc print data=names(firstobs=5161 obs=5163);
       run;
    
       title;
    ods listing close;
    /* on Results
    first three names
    Obs    name
       1    MARY   
       2    PATRICIA
       3    LINDA  
    
    last three names
    Obs    name
    5161    DARELL  
    5162    BRODERICK
    5163    ALONSO
    */

MikeZdeb
Rhodochrosite | Level 12

hi ... another way to look at those first and last three names ...

    data _null_;
       do obs=1 to 3, lastrec-2 to lastrec;
          set new point=obs nobs=lastrec;
          put obs= name=;
       end;
       stop;
    run;

    obs=1 name=MARY
    obs=2 name=PATRICIA
    obs=3 name=LINDA
    obs=5161 name=DARELL
    obs=5162 name=BRODERICK
    obs=5163 name=ALONSO

Ksharp
Super User

Mike,

I didn't realize you can use an LRECL value greater than 32767. Maybe it is a new feature?


Thanks.

Ksharp

MikeZdeb
Rhodochrosite | Level 12

Hi Ksharp ... on an INFILE statement in Windows ....

LRECL=record-length

specifies the record length (in bytes). Under Windows, the default is 256. The value of record-length can range from 1 to 1,073,741,823 (1 gigabyte).

This was true even in V9.1 (on page 449) and I'm not sure how far back that's been the case ...

http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_91/base_hostwin_6974.pdf
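
If you want to check how long the record really is before picking an LRECL value, one option is the LENGTH= option on INFILE. This is only a sketch: reclen is a hypothetical variable name, and the path is the one used earlier in this thread.

```sas
/* Sketch: report the length of each record in the file.  */
/* reclen is a hypothetical name; adjust the path to suit. */
data _null_;
   infile 'z:\names.txt' lrecl=50000 length=reclen;
   input;          /* load the record into the input buffer */
   put reclen=;    /* write the record length to the log    */
run;
```

The NOTE lines that SAS writes after the step (minimum and maximum record length) carry the same information, so this is mainly useful when you want the value in a variable.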


Pritish
Quartz | Level 8

You can also use this code to extract the data:

    filename names 'H:\Personal Folder\names.txt';

    data names;
       infile names dlm = ',''"''' lrecl = 50000;
       input names : $15. @@;
    run;

Geraldo
Fluorite | Level 6

My Example:

    data teste;
       infile '/names.txt';
       input @;
       /* remove every double quote from the record */
       _infile_    = prxchange('s/\"//',-1,_infile_);
       /* number of commas = number of names - 1 */
       Contador    = countc(_infile_,',');
       Text_Buffer = _infile_;
       Pos_Inic    = 0;
       Pos_Prox    = 0;
       do I=0 to Contador;
          Pos_Prox = findc(Text_Buffer,',',Pos_Inic + 1);
          if Pos_Prox > 0 then do;
             NAME        = substr(Text_Buffer, Pos_Inic + 1, Pos_Prox - 1);
             Text_Buffer = substr(Text_Buffer, Pos_Prox + 1);
          end;
          else NAME = Text_Buffer;   /* last name: no comma follows it */
          output;
       end;
       drop Contador Text_Buffer Pos_Inic Pos_Prox I;
    run;

Geraldo
Fluorite | Level 6

Geraldo,

add  RECFM=F and LRECL=32767 ;

thanks
