Help using Base SAS procedures

Part of the variable names changed when I tried to load a txt file

Accepted Solution Solved
Reply
Contributor
Posts: 26
Accepted Solution

Part of the variable names changed when I tried to load a txt file

Hello all,

I am totally a new user. I tried to load a .txt file into SAS 9.2. It only has 2 lines and 3321 columns. The first line is the variable names (the format is DXXXX_PXX_PXX), and the second line is the variable value.

After I loaded it, it showed that part of the varabile names changed. From No. 2342, the variable name became VARXXXX. SAS log said "Number of names found is less than number of variables found."  Is it because the length of the line is too much for SAS? What should I do to solve it? I don't want to cut the file into several ones and merge them together.

I attached the .txt file. Thank you so much for your kind help!

Elaine

Attachment

Accepted Solutions
Solution
‎10-29-2014 05:08 PM
Respected Advisor
Posts: 3,777

Re: Part of the variable names changed when I tried to load a txt file

This is the program I used to read the data you posted here:

http://dropbox.unl.edu/uploads/20141112/2cb2cc0a5d5e994b/18Fratio.txt

I just ignored the field named "name" it's value is 18F.  I had to increase memory with SAS invovation option "-memsize '12g'  I don't know if that is too much but it worked, my system is UNIX.  I ran the program in batch.  You have to be careful that you don't make SAS "dump" the input record to the LOG that's what all those ?? marks are about.  If SAS dumps those giant records the log is unusable.  I don't know if this is the format you want but it is the only one that makes sense to me.

data names;
   infile FT33F001 dsd dlm='09'x obs=1;
  
length name $32;
  
input name @@;
   if first(name) eq 'R' then do;
     
/*R00000001_D0002/D0001*/
      i = input(scan(name,
1,'R_'),??8.);
      r = input(scan(name,-2,'D/'),??8.);
      c = input(scan(name,-1,'D'),??8.);
      end;
  
run;

==============
NOTE: 1 record was read from the infile FT33F001.
      The minimum record length was
242565850.
      The maximum record length was
242565850.
NOTE: SAS went to a new line when INPUT statement reached past the end of a
line.
NOTE: The data set WORK.NAMES has
11025722 observations and 4 variables.

data _18Fratio;
   infile FT33F001 dsd dlm='09'x firstobs=2;
  
set names;
   input x :?? @@;
   format x 10.6;
  
run;


===============
NOTE: 1 record was read from the infile FT33F001.
      The minimum record length was
99347717.
      The maximum record length was
99347717.
NOTE: There were
11025722 observations read from the data set WORK.NAMES.
NOTE: The data set WORK._18FRATIO has
11025722 observations and 5 variables.

View solution in original post


All Replies
Respected Advisor
Posts: 3,777

Re: Part of the variable names changed when I tried to load a txt file

The LRECL for the name row is 49590.  I think if you are using PROC IMPORT the LRECL defaults to 32767.  The data row is less that that limit at 29904.

Assuming PROC IMPORT use a FILEREF to pass the file info.

FILENAME MYFILE 'path' lrecl=49590.

proc import datafile=MYFILE;

 

I don't know if that will work or not.  Show your SAS LOG.

Contributor
Posts: 26

Re: Part of the variable names changed when I tried to load a txt file

Thank you for your reply. Yes, I used PROC IMPORT to load the file. And you are right, the maximum  length is 32767 characters. From 32768, the varible name changed to VARXXXX.

Sorry I don't know any code in SAS right now. All I want to do now is trying to load my file into SAS and make sure it shows correctly. What should I do?

Here is the log.

1    PROC IMPORT OUT= SASUSER.DISTANCE

2                DATAFILE= "C:\Users\Desktop\sas\18F.txt"

4                DBMS=TAB REPLACE;

5         GETNAMES=YES;

6         DATAROW=2;

7    RUN;

Number of names found is less than number of variables found.

Number of names found is less than number of variables found.

8     /**********************************************************************

9     *   PRODUCT:   SAS

10    *   VERSION:   9.2

11    *   CREATOR:   External File Interface

12    *   DATE:      27OCT14

13    *   DESC:      Generated SAS Datastep Code

14    *   TEMPLATE SOURCE:  (None Specified.)

15    ***********************************************************************/

16       data SASUSER.DISTANCE                             ;

17       %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

18       infile 'C:\Users\CRYSTAL.Crystal-PC\Desktop\meeting today\sas\18F.txt' delimiter='09'x MISSOVER DSD

18 ! lrecl=32767 firstobs=2 ;

19          informat D0001_P02_P01 $9. ;

20          informat D0002_P03_P01 $13. ;

21          informat D0003_P04_P01 $13. ;

22          informat D0004_P05_P01 $13. ;

23          informat D0005_P06_P01 $13. ;

24          informat D0006_P07_P01 $13. ;

25          informat D0007_P08_P01 $13. ;

26          informat D0008_P09_P01 $13. ;

27          informat D0009_P10_P01 $13. ;

28          informat D0010_P11_P01 $13. ;

29          informat D0011_P12_P01 $13. ;

30          informat D0012_P13_P01 $13. ;

31          informat D0013_P14_P01 $13. ;

     ......

1000         informat D0982_P21_P14 best32. ;

1001         informat D0983_P22_P14 best32. ;

1002         informat D0984_P23_P14 best32. ;

1003         informat D0985_P24_P14 best32. ;

1004         informat D0986_P25_P14 best32. ;

1005         informat D0987_P26_P14 best32. ;

1006         informat D0988_P27_P14 best32. ;

1007         informat D0989_P28_P14 best32. ;

             ......

2356         informat D2338_P45_P38 best32. ;

2357         informat D2339_P46_P38 best32. ;

2358         informat D2340_P47_P38 best32. ;

2359         informat D2341_P best32. ;

2360         informat VAR2342 best32. ;

2361         informat VAR2343 best32. ;

2362         informat VAR2344 best32. ;

2363         informat VAR2345 best32. ;

2364         informat VAR2346 best32. ;

2365         informat VAR2347 best32. ;

2366         informat VAR2348 best32. ;

2367         informat VAR2349 best32. ;

             ......

3340         format D0001_P02_P01 $9. ;

3341         format D0002_P03_P01 $13. ;

3342         format D0003_P04_P01 $13. ;

3343         format D0004_P05_P01 $13. ;

3344         format D0005_P06_P01 $13. ;

3345         format D0006_P07_P01 $13. ;

3346         format D0007_P08_P01 $13. ;

3347         format D0008_P09_P01 $13. ;

3348         format D0009_P10_P01 $13. ;

3349         format D0010_P11_P01 $13. ;

3350         format D0011_P12_P01 $13. ;

3351         format D0012_P13_P01 $13. ;

3352         format D0013_P14_P01 $13. ;

3353         format D0014_P15_P01 $13. ;

3354         format D0015_P16_P01 $13. ;

3355         format D0016_P17_P01 $13. ;

             ......

4321         format D0982_P21_P14 best12. ;

4322         format D0983_P22_P14 best12. ;

4323         format D0984_P23_P14 best12. ;

4324         format D0985_P24_P14 best12. ;

4325         format D0986_P25_P14 best12. ;

4326         format D0987_P26_P14 best12. ;

4327         format D0988_P27_P14 best12. ;

4328         format D0989_P28_P14 best12. ;

4329         format D0990_P29_P14 best12. ;

4330         format D0991_P30_P14 best12. ;

             ......

5677         format D2338_P45_P38 best12. ;

5678         format D2339_P46_P38 best12. ;

5679         format D2340_P47_P38 best12. ;

5680         format D2341_P best12. ;

5681         format VAR2342 best12. ;

5682         format VAR2343 best12. ;

5683         format VAR2344 best12. ;

5684         format VAR2345 best12. ;

5685         format VAR2346 best12. ;

5686         format VAR2347 best12. ;

5687         format VAR2348 best12. ;

5688         format VAR2349 best12. ;

             ......

6661      input

6662                  D0001_P02_P01 $

6663                  D0002_P03_P01 $

6664                  D0003_P04_P01 $

6665                  D0004_P05_P01 $

6666                  D0005_P06_P01 $

6667                  D0006_P07_P01 $

6668                  D0007_P08_P01 $

6669                  D0008_P09_P01 $

6670                  D0009_P10_P01 $

6671                  D0010_P11_P01 $

                      ......

8996                  D2335_P42_P38

8997                  D2336_P43_P38

8998                  D2337_P44_P38

8999                  D2338_P45_P38

9000                  D2339_P46_P38

9001                  D2340_P47_P38

9002                  D2341_P

9003                  VAR2342

9004                  VAR2343

9005                  VAR2344

9006                  VAR2345

9007                  VAR2346

9008                  VAR2347

9009                  VAR2348

9010                  VAR2349

                      ......

9983      ;

9984      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

9985      run;

NOTE: The infile 'C:\Users\Desktop\sas\18F.txt' is:

      Filename=C:\Users\Desktop\sas\18F.txt,

      RECFM=V,LRECL=32767,File Size (bytes)=76383,

      Last Modified=27Oct2014:15:12:47,

      Create Time=24Oct2014:15:54:03

NOTE: 1 record was read from the infile 'C:\Users\Desktop\sas\18F.txt'.

      The minimum record length was 29888.

      The maximum record length was 29888.

NOTE: The data set SASUSER.DISTANCE has 1 observations and 3321 variables.

NOTE: DATA statement used (Total process time):

      real time           7.35 seconds

      cpu time            7.03 seconds

1 rows created in SASUSER.DISTANCE                          from C:\Users\Desktop\sas\18F.txt.

NOTE: SASUSER.DISTANCE data set was successfully created.

NOTE: PROCEDURE IMPORT used (Total process time):

      real time           2:07.74

      cpu time            2:05.86

Respected Advisor
Posts: 3,777

Re: Part of the variable names changed when I tried to load a txt file

You will have to have a program purpose build essentially the same as what PROC IMPORT does but using  all the names. 

The data you posted looked to be all numeric variables but the log you posted showed some character variables.  If they are all numeric that will make it easier.

Respected Advisor
Posts: 3,777

Re: Part of the variable names changed when I tried to load a txt file

This works on the data you posted. 3321 variables.  You will need to change the path in the FILENAME statement.

filename FT54F001 '~/18f.txt' lrecl=49590;
data names;
   infile FT54F001 dsd dlm='09'x obs=1;
  
input name :$32. @@;
   retain one 1;
  
run;
proc transpose data=names out=names2(drop=_name_);
   var one;
   id name;
   run;
data _18f;
   if 0 then set names2;
   infile FT54F001 dsd dlm='09'x firstobs=2;
  
input (_all_)(Smiley Happy;
   run;
proc contents varnum;
  
run;
Contributor
Posts: 26

Re: Part of the variable names changed when I tried to load a txt file

Thank you so much. I tried your code and it did work!

Can you tell me what does "lrecl=49590" mean? Is it the length for a single line? The file I attached is just my test sample. Actually I have a .txt file including around 11 million variables (only one is charater variable, and others are numeric) and 360 oberservations. Do you think SAS can load it successfully?

Thank you!

Respected Advisor
Posts: 3,777

Re: Part of the variable names changed when I tried to load a txt file

LRECL is the Logical record length although that logical part does not apply to files with delimited records.

The file you posted 49590 was the length of the name row.  You can use "any" value as long as it is greater than or equal to the longest record.  80000 would work in your case too.

Eleven million variables (fields)? Where are you getting this data?

For the example data I think this is what it really is an 82x82 matrix;

data _18f;
   infile FT54F001 dsd dlm='09'x firstobs=2 termstr=crlf;
   length i 8;
  
array c[82];
   do i = 1 to dim(c);
      do j = i+1 to dim(c);
         input c @;
         end;
     
output;
     
call missing(of c
  • );
  •       end;
      
    drop j;
       run;

    So I don't think reading the data into variables as we did yesterday is necessary.  What are the dimensions of the matrix with 11 million variables?

    Contributor
    Posts: 26

    Re: Part of the variable names changed when I tried to load a txt file

    Yes, you are right. The .txt file I posted comes from a 82*82 matrix. But I don't need the upper matrix and the diagonal elements, that's why the file only contains 3321 variables. The final file I will use comes from a 3321*3321 matrix, but I won't use the diagonal elements, so it will have 11025720 variables.

    Respected Advisor
    Posts: 3,777

    Re: Part of the variable names changed when I tried to load a txt file

    Which P field in the name Row and Column Dxxx_Pnn_Pnn?

    Contributor
    Posts: 26

    Re: Part of the variable names changed when I tried to load a txt file

    I am not sure I get your question. Here is how I understand:

    In the example file I posed,  there are 3321 variables coming from a 82*82 matrix (except the upper matrix and the diagonals). The format of all the 3321 variable names is DXXXX_PYY_PZZ. XXXX is the index for the variables (1~3321), YY is the Row index from the 82*82 matrix, and ZZ is the Column index from the matrix.

    In my final file, there will be 11025720 variables coming from a 3321*3321 matrix (except the diagonals). The format of all the 11025720 variable names will be RXXXXXXXX_DYYYY_DZZZZ.  XXXXXXXX is the index for the variables (1~11025720), YYYY is the Row index from the 3321*3321 matrix, and ZZZZ is the Column index from the matrix.

    Thank you so much!

    Respected Advisor
    Posts: 3,777

    Re: Part of the variable names changed when I tried to load a txt file

    One of the Pnn is the column and the other the row.  But I'll leave that to you.

    This program reads the name field and extracts the meta data from the names the reads the matrix elements one element stored in one observations.  You can make it into a matrix as you like, the tall format is more useful than the wide we made yesterday.  And it should work for 11 million.

    filename FT54F001 '~/18f.txt' lrecl=49590;
    data names;
       infile FT54F001 dsd dlm='09'x obs=1 termstr=CRLF end=eof;
       input name :$32. @@;
       d = input(scan(name,1,'D_'),8.);
       p1 = input(scan(name,-2,'_P'),8.);
       p2 = input(scan(name,-1,'_P'),8.);
       run;

    data _81f_alt;
       infile FT54F001 dsd dlm='09'x firstobs=2 termstr=crlf;
       set names(keep=d p1 p2);
       input x @@;
       run;
    Contributor
    Posts: 26

    Re: Part of the variable names changed when I tried to load a txt file

    I run your code. The first part got all the names and the D,P,P values. However, the second part seems wrong.

    Here is the log:

    199  data _81f_alt;

    200     infile FT54F001 dsd dlm='09'x firstobs=2 termstr=crlf;

    201     set names(keep=d p1 p2);

    202     input x @@;

    203     run;

    NOTE: The infile FT54F001 is:

          Filename=C:\Users\Desktop\sas\18F.txt,

          RECFM=V,LRECL=49590,File Size (bytes)=46493,

          Last Modified=28Oct2014:16:48:02,

          Create Time=24Oct2014:15:54:03

    NOTE: 0 records were read from the infile FT54F001.

    NOTE: There were 1 observations read from the data set WORK.NAMES.

    NOTE: The data set WORK._81F_ALT has 0 observations and 4 variables.

    NOTE: DATA statement used (Total process time):

          real time           0.01 seconds

          cpu time            0.01 seconds

    Respected Advisor
    Posts: 3,777

    Re: Part of the variable names changed when I tried to load a txt file

    Your version of 18F.txt does not have 2 records.  Perhaps it is due to TERMSTR.  Omit that option and see what happens.

    Contributor
    Posts: 26

    Re: Part of the variable names changed when I tried to load a txt file

    Hi, let us go back to the code you posted yesterday.

    You created both "name" and "name2". What is the purpose of name2? I tried to delete name2, but the result is wrong. Here is the code:

    filename FT54F001 'C:\Users\Desktop\sas\18F.txt' lrecl=250000000;

    data names;

       infile FT54F001 dsd dlm='09'x obs=1;

       input name :$32. @@;

       retain one 1;

       run;

    data _18f;

       infile FT54F001 dsd dlm='09'x firstobs=2;

       input (_all_)(Smiley Happy;

       run;

    proc contents varnum;

       run;

    Where is the mistake? I am not quite undertsand the SAS code. Thank you!

    Respected Advisor
    Posts: 3,777

    Re: Part of the variable names changed when I tried to load a txt file

    The program I posted yesterday reads the names into observations and then uses PROC TRANSPOSE to create a data set from those names.

    You left that part out.  Go back an look at the code again.

    But you don't want that going forward use the code I suggested today and create the tall data set that you can make into the matrix you may need or perhaps you can use it like it is.  You haven't mention what happens after you get the data into SAS.

    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 39 replies
    • 1068 views
    • 7 likes
    • 5 in conversation