BookmarkSubscribeRSS Feed
purfield
Calcite | Level 5

Hi Im using the following code to import a large data set, it imports and the header etc is correct but I just have a problem with my second column width- it comes in truncated so Im missing half of it.So the second part of the code-importing the imp part.

I cant seem to find anything to format it to  $32. I would prefer to not list all the column headings as the file is quite big so would be scrolling for far to long in the sas code. I would  even change all columns to format $32.- is there some alteratation with * that allows the format of all columns to be changed? I have tried a simple proc import, its just the first 20 lines wouldnt be as long as the end lines so they become truncated and would preferably not want to increase this using guessing rows as it seems to be very slow and its a long dataset so guessing rows needs to be incredibly large.

Thanks in advance!

%let location=U:\abc\efg\hij\;

%macro imp;

%do i =29 %to 29;

data _NULL_;

   call symput('imp', compress("'"||"&location."||"firsthalf"||'.txt'||" ' "));

   call symput('header', compress("'"||"&location."||'header.txt'||" ' "));

run;

quit;

data _null_;

  infile &header. delimiter = ' ' pad

TRUNCOVER DSD lrecl=900000 obs=1;

  file 'invar.sas';

  length thisvar $32797;

  do i=1 to 900000;

    input thisvar @;

    if not missing(thisvar) then put thisvar "$";

  end;

run;

data pairs&i;

%let _EFIERR_ = 0;

infile &imp. delimiter = ' '

MISSOVER DSD lrecl=900000 firstobs=2 ;

input

  %inc 'invar.sas';

  ;

if _ERROR_ then call symputx('_EFIERR_',1); 

run;

%end;

%mend imp;

%imp;

7 REPLIES 7
ballardw
Super User

Instead of data _null_ for the first step actually save the data such as data work.temp.

You would then be able to get a list of the values of thisvar to use in the second step.

data work.temp;

  infile &header. delimiter = ' ' pad

TRUNCOVER DSD lrecl=900000 obs=1;

  file 'invar.sas';

<etc>

run;

proc sql noprint;

     select thisvar into :varlist separated by ' ' /* there is a space within the quotes to generate a space delimited list Format and informat statements like*/

     from work.temp;

quit;

data pairs&i;

%let _EFIERR_ = 0;

infile &imp. delimiter = ' '

MISSOVER DSD lrecl=900000 firstobs=2 ;

informat &varlist $32.;

format &varlist $32.;/* this is actually redundant as informat will set this when reading data*/

input

  %inc 'invar.sas';

  ;

if _ERROR_ then call symputx('_EFIERR_',1); 

run;

Tom
Super User Tom
Super User

You should use LENGTH or ATTRIB with LENGTH= option to define the LENGTH of a variable.

Assigning a format or an informat to it would only define the length as a side effect.

Tom
Super User Tom
Super User

What is the question exactly? The code does not match the description.

The code appears to be reading variable names from 'header.txt' and using those to read 'firstpart.txt'.

Are you trying to define a variable with a name longer than 32 characters?  That is not possible. You could assign labels that are up to 256 characters long.

The description seems to imply that you want to pre-read all of the variable values and find the maximum length for each variable so that you can then define the lengths of the variables.

purfield
Calcite | Level 5

Hi,

Apologies I wasnt clear enough. So my question is, how do I get my second column to be $32 characters in length. The first part of the code is bringing in the titles- these all come in in the correct format (majority  19 characters in length) and the second column comes in with a heading $14 characters long. The second part of the code brings in the data which is placed under the titles, this is primarily numeric and short and all comes in correctly apart from the second column. This column is character based-  and seems to stop after 8 characters in length so it becomes truncated 'abcdefgh' instead of 'abcdefghijklmno'. This second column also increases in character length further down the file so say from abc at the start to abcdefghijkmnop at the end of the file.There are over a million rows of data.

I had tried proc import but I found it far to slow as I need to change guessing rows to go as far as the end of the file so guessingrows=1000000.

So I was wondering is there a way to define the columns to be so long in length.  I do think its something with length, informat etc but I cant seem to get it right.

Thanks for your help so far.

Tom
Super User Tom
Super User

Sounds like you are saying that you want the second column to be defined as character and all of the others as numeric.

In that case it would probably be better to have your first data step generate LENGTH statement to define the type for each variable.

Then the input statement can be simplified to just use a name range.

%let location=U:\abc\efg\hij\;

%macro imp;

%local imp header firstvar lastvar ;

%let imp=%sysfunc(quote(&locatiion.firsthalf.txt));

%let header=%sysfunc(quote(&location.header.txt));

filename code temp ;

data _null_;

  infile &header DSD DLM=' ' TRUNCOVER LRECL=900000 OBS=1;

  length thisvar $32797;

  file code lrecl=80 ;

  put 'length ' ;

  do i=1 to 900000 until (thisvar=' ');

    input thisvar @;

    if i=1 then call symputx('firstvar',thisvar);

    if thisvar ne ' ' then do;

       call symputx('lastvar',thisvar);

       put thisvar @;

       if i=2 then put '$32 ' @ ;

       else put '8 ' @;

    end;

  end;

  put ';' ;

  stop;

run;

%let _EFIERR_ = 0;

data pairs&i;

  infile &imp DSD DLM=' ' TRUNCOVER LRECL=900000 FIRSTOBS=2 ;

%inc code / source2 ;

  input &firstvar -- &lastvar ;

  if _ERROR_ then call symputx('_EFIERR_',1);

run;

%mend imp;

%imp;

ballardw
Super User

If I understood your code you are not reading any numeric values. They may look like numbers but I'm betting that they are character as your INVAR.SAS file should look like:

var1 $

Var2 $

Var3 $

...

Telling SAS to read them as character.

Your column heading lengths have nothing to do with the length that will be assigned to the values of variables. SAS will tend to default undefined character variable lengths to 8.

Here's an idea:

options obs=3;

proc import (the data file);

don't bother about the guessing rows.

Recall the generated code (F4 key) or copy from the log

modify the code. You can then set lengths, informats, formats for the data.

If your files are of the same structure then you can reuse the code pointing to the new input and output sets.

reset obs back to max when done reading the first file.

And HOW long is too long? I processed about 52million records in proc import this morning and it took a little over one minute. I can certainly wait a minute or two instead of spending hours to get odd code to work.

jakarman
Barite | Level 11

Tom we agree this time.  The question is a strange one, not getting the reason to do it that way.

Issue:

Either purfield has a dataset with unpredictable content coming in.

(s)he could use proc import as it designed for that. It does reading the header line guessing variable types by reading a limited number of lines. Base SAS(R) 9.4 Procedures Guide, Third Edition There could be some limitation (s)he has hit, would nice to hear that.

Or (s)he has a big dataset with a predictable content coming in. Forget the first line with processing. Use that one to code a one time generated fixed way of SAS input processing. 

Additional

That statement of the second line is too short.  IT could be

-  some encoding issue and typical control character as a hard newline coming form Excel

- A dataset with multiple types of inputrecords. In this case you will need a more sophisticated input statement 

When op's will share something it is the best tot do that as attachements here. 

Remarks:

- using the _infile_ automatic variable gives the option to access the input-line.  Using the list statement some help for debugging

- missover dropover lrelc and other infile options could be helpful for the original question.     

- The use of writing out some sas-code and reading that back in is an old nice way to solve dynamic coding. A more elegant approach would be SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition using call execute.

---->-- ja karman --<-----

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1675 views
  • 0 likes
  • 4 in conversation