ArtemisFowl
Calcite | Level 5

Hello,

I have a flat file consisting of a whole lot of names. It is of the form:

"Adam","Robin","Jimmy","Aragon",...

and so forth, so the file contains only a single line of names.

I'd like to read this file into a data set so that each name gets its own observation (ideally without the quotation marks). Apparently the string of names is longer than 32767 characters, which appears to be the maximum PROC IMPORT can handle. Can I somehow get around this?

Kind regards.

Accepted Solutions
MikeZdeb
Rhodochrosite | Level 12

Hi ... this worked for me with your data ...

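data new;
   /* DSD: comma-delimited, quotes stripped; LRECL= above 32767 so the */
   /* whole line fits; PAD pads the record with blanks up to LRECL     */
   infile 'z:\names.txt' dsd lrecl=50000 pad;
   /* colon modifier: read up to the next delimiter with $15.;        */
   /* @@ holds the line so the next INPUT continues on the same record */
   input name : $15. @@;
run;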


NOTE: 1 record was read from the infile 'z:\names.txt'.
      The minimum record length was 46447.
      The maximum record length was 46447.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.NEW has 5163 observations and 1 variables.

first 5 observations ...

MARY
PATRICIA
LINDA
BARBARA
ELIZABETH

ArtemisFowl
Calcite | Level 5

Hello, Mike.

That works perfectly, thank you. It seems the DSD option does the trick; I will look into it.

MikeZdeb
Rhodochrosite | Level 12

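hi ... the DSD option on the INFILE statement means a few things:

the default delimiter is a comma

two consecutive commas are interpreted as a missing value

quotation marks are stripped from quoted values (and a comma inside quotes is kept as part of the value)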
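The DSD behaviors above can be seen on one made-up record. This is a minimal sketch: the data set DSD_DEMO, the variables FIRST, SECOND and THIRD, and the sample record are hypothetical, with DATALINES standing in for names.txt. SECOND should come out missing (two commas in a row), the quotes should disappear, and the comma inside "Smith, Jr." should stay part of THIRD.

```sas
/* Hypothetical example; DATALINES stands in for names.txt */
data dsd_demo;
   infile datalines dsd;
   /* colon modifier: list input, read up to the next comma */
   input first :$10. second :$10. third :$10.;
datalines;
"Adam",,"Smith, Jr."
;
run;
```

Without DSD the same INPUT would treat the two commas as one delimiter and split "Smith, Jr." at its comma, which is why the option matters for quoted, comma-separated files.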

ps ... not "THE" Artemis Fowl, I presume

ArtemisFowl
Calcite | Level 5

Thanks. :)

THE Artemis Fowl would not be asking for help on an online forum. Or would he...?

chang_y_chung_hotmail_com
Obsidian | Level 7

Here is an alternative approach -- reading one character at a time.

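    %let pwd = z:;   /* folder holding names.txt; the backslash is added below */
    data names;
      /* RECFM=N reads the file as a stream of bytes with no record boundaries; */
      /* EOF= jumps to the OUTPUT label once the last byte has been read        */
      infile "&pwd\names.txt" recfm=n unbuffered eof=output;
      length name $20;
      do while (1);
        input c $1. @@;                /* read one character */
        if c = "," then link output;   /* a comma ends the current name */
        else name = catt(name,c);      /* otherwise append the character */
      end;
      stop;
    
      output:
        name = dequote(name);          /* remove the surrounding double quotes */
        output;
        name = "";
        keep name;
      return;
    run;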
    
    /* check. first and last three names */
    ods _all_ close;
    ods listing;
       title "first three names";
       proc print data=names(obs=3);
       run;
    
       title "last three names";
       proc print data=names(firstobs=5161 obs=5163);
       run;
    
       title;
    ods listing close;
    /* on Results
    first three names
    Obs    name
       1    MARY   
       2    PATRICIA
       3    LINDA  
    
    last three names
    Obs    name
    5161    DARELL  
    5162    BRODERICK
    5163    ALONSO
    */

MikeZdeb
Rhodochrosite | Level 12

hi ... another way to look at those first and last three names ...

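data _null_;
   /* NOBS= sets LASTREC before the step executes, so it can drive the loop */
   do obs=1 to 3, lastrec-2 to lastrec;
      /* POINT= reads observation number OBS directly */
      set new point=obs nobs=lastrec;
      put obs= name=;
   end;
   /* with POINT= the step never reaches end of file, so STOP is required */
   stop;
run;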


obs=1 name=MARY
obs=2 name=PATRICIA
obs=3 name=LINDA
obs=5161 name=DARELL
obs=5162 name=BRODERICK
obs=5163 name=ALONSO

Ksharp
Super User

Mike,

I didn't realize you could use an LRECL value greater than 32767. Is that a new feature?


Thanks.

Ksharp

MikeZdeb
Rhodochrosite | Level 12

Hi Ksharp ... here is what the documentation says about LRECL= on the INFILE statement under Windows ...

LRECL=record-length

specifies the record length (in bytes). Under Windows, the default is 256. The value of record-length can range from 1 to 1,073,741,823 (1 gigabyte)

This was already true in V9.1 (page 449 of the PDF below), and I'm not sure how far back it goes ...

http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_91/base_hostwin_6974.pdf
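One way to check this without the original names.txt is to write a single record longer than 32767 bytes to a temporary file and read it back with the options from the accepted solution. This is a sketch assuming SAS 9 as in this thread; the fileref LONGF, the data set LONG_NAMES and the generated names NAME0001 to NAME4000 are all hypothetical. 4000 quoted names of 11 bytes each, minus the final comma, give one record of 43999 bytes.

```sas
/* Hypothetical scratch file standing in for names.txt */
filename longf temp;

/* Write ONE record: "NAME0001","NAME0002",...,"NAME4000" (43999 bytes) */
data _null_;
   file longf lrecl=50000;
   length tok $11;
   do i = 1 to 4000;
      tok = cats('"NAME', put(i, z4.), '"');
      if i < 4000 then tok = cats(tok, ',');
      len = lengthn(tok);
      /* formatted output writes the tokens back to back, no blanks */
      put tok $varying11. len @;
   end;
   put;   /* release the held line, ending the single record */
run;

/* Read it back with the same INFILE options as the accepted solution */
data long_names;
   infile longf dsd lrecl=50000 pad;
   input name :$8. @@;
run;
```

If LRECL= were capped at 32767, the names past that byte position could not be read from the single record and the 4000 observations would not all appear.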


Pritish
Quartz | Level 8

You can also use this code to extract the data:

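filename names 'H:\Personal Folder\names.txt';
data names;
   /* DLM= lists three delimiters: comma, single quote and double quote.    */
   /* Without DSD, consecutive delimiters count as one, so quotes are skipped */
   infile names dlm = ',''"''' lrecl = 50000;
   input names : $15. @@;
run;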

Geraldo
Fluorite | Level 6

My Example:

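data teste;
   infile '/names.txt';
   length NAME $15;
   input @;
   /* remove every double quote from the input buffer */
   _infile_    = prxchange('s/\"//',-1,_infile_);
   /* n commas separate n+1 names */
   Contador    = countc(_infile_,',');
   Text_Buffer = _infile_;
   Pos_Inic    = 0;
   Pos_Prox    = 0;
   do I=0 to Contador;
      Pos_Prox = findc(Text_Buffer,',',Pos_Inic + 1);
      /* the last name has no comma after it */
      if Pos_Prox = 0 then Pos_Prox = length(Text_Buffer) + 1;
      NAME     = substr(Text_Buffer, Pos_Inic + 1, Pos_Prox - 1);
      /* drop the name just read, plus its comma, from the buffer */
      if I < Contador then Text_Buffer = substr(Text_Buffer, Pos_Prox + 1);
      output;
   end;
   drop Contador Text_Buffer Pos_Inic Pos_Prox I;
run;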

Geraldo
Fluorite | Level 6

Geraldo,

add RECFM=F and LRECL=32767 to the INFILE statement.

thanks


Discussion stats
  • 11 replies
  • 5630 views
  • 4 likes
  • 6 in conversation