Hello all,
I am totally a new user. I tried to load a .txt file into SAS 9.2. It only has 2 lines and 3321 columns. The first line is the variable names (the format is DXXXX_PXX_PXX), and the second line is the variable value.
After I loaded it, it showed that part of the varabile names changed. From No. 2342, the variable name became VARXXXX. SAS log said "Number of names found is less than number of variables found." Is it because the length of the line is too much for SAS? What should I do to solve it? I don't want to cut the file into several ones and merge them together.
I attached the .txt file. Thank you so much for your kind help!
Elaine
This is the program I used to read the data you posted here:
http://dropbox.unl.edu/uploads/20141112/2cb2cc0a5d5e994b/18Fratio.txt
I just ignored the field named "name" it's value is 18F. I had to increase memory with SAS invovation option "-memsize '12g' I don't know if that is too much but it worked, my system is UNIX. I ran the program in batch. You have to be careful that you don't make SAS "dump" the input record to the LOG that's what all those ?? marks are about. If SAS dumps those giant records the log is unusable. I don't know if this is the format you want but it is the only one that makes sense to me.
The LRECL for the name row is 49590. I think if you are using PROC IMPORT the LRECL defaults to 32767. The data row is less that that limit at 29904.
Assuming PROC IMPORT use a FILEREF to pass the file info.
FILENAME MYFILE 'path' lrecl=49590.
proc import datafile=MYFILE;
I don't know if that will work or not. Show your SAS LOG.
Thank you for your reply. Yes, I used PROC IMPORT to load the file. And you are right, the maximum length is 32767 characters. From 32768, the varible name changed to VARXXXX.
Sorry I don't know any code in SAS right now. All I want to do now is trying to load my file into SAS and make sure it shows correctly. What should I do?
Here is the log.
1 PROC IMPORT OUT= SASUSER.DISTANCE
2 DATAFILE= "C:\Users\Desktop\sas\18F.txt"
4 DBMS=TAB REPLACE;
5 GETNAMES=YES;
6 DATAROW=2;
7 RUN;
Number of names found is less than number of variables found.
Number of names found is less than number of variables found.
8 /**********************************************************************
9 * PRODUCT: SAS
10 * VERSION: 9.2
11 * CREATOR: External File Interface
12 * DATE: 27OCT14
13 * DESC: Generated SAS Datastep Code
14 * TEMPLATE SOURCE: (None Specified.)
15 ***********************************************************************/
16 data SASUSER.DISTANCE ;
17 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
18 infile 'C:\Users\CRYSTAL.Crystal-PC\Desktop\meeting today\sas\18F.txt' delimiter='09'x MISSOVER DSD
18 ! lrecl=32767 firstobs=2 ;
19 informat D0001_P02_P01 $9. ;
20 informat D0002_P03_P01 $13. ;
21 informat D0003_P04_P01 $13. ;
22 informat D0004_P05_P01 $13. ;
23 informat D0005_P06_P01 $13. ;
24 informat D0006_P07_P01 $13. ;
25 informat D0007_P08_P01 $13. ;
26 informat D0008_P09_P01 $13. ;
27 informat D0009_P10_P01 $13. ;
28 informat D0010_P11_P01 $13. ;
29 informat D0011_P12_P01 $13. ;
30 informat D0012_P13_P01 $13. ;
31 informat D0013_P14_P01 $13. ;
......
1000 informat D0982_P21_P14 best32. ;
1001 informat D0983_P22_P14 best32. ;
1002 informat D0984_P23_P14 best32. ;
1003 informat D0985_P24_P14 best32. ;
1004 informat D0986_P25_P14 best32. ;
1005 informat D0987_P26_P14 best32. ;
1006 informat D0988_P27_P14 best32. ;
1007 informat D0989_P28_P14 best32. ;
......
2356 informat D2338_P45_P38 best32. ;
2357 informat D2339_P46_P38 best32. ;
2358 informat D2340_P47_P38 best32. ;
2359 informat D2341_P best32. ;
2360 informat VAR2342 best32. ;
2361 informat VAR2343 best32. ;
2362 informat VAR2344 best32. ;
2363 informat VAR2345 best32. ;
2364 informat VAR2346 best32. ;
2365 informat VAR2347 best32. ;
2366 informat VAR2348 best32. ;
2367 informat VAR2349 best32. ;
......
3340 format D0001_P02_P01 $9. ;
3341 format D0002_P03_P01 $13. ;
3342 format D0003_P04_P01 $13. ;
3343 format D0004_P05_P01 $13. ;
3344 format D0005_P06_P01 $13. ;
3345 format D0006_P07_P01 $13. ;
3346 format D0007_P08_P01 $13. ;
3347 format D0008_P09_P01 $13. ;
3348 format D0009_P10_P01 $13. ;
3349 format D0010_P11_P01 $13. ;
3350 format D0011_P12_P01 $13. ;
3351 format D0012_P13_P01 $13. ;
3352 format D0013_P14_P01 $13. ;
3353 format D0014_P15_P01 $13. ;
3354 format D0015_P16_P01 $13. ;
3355 format D0016_P17_P01 $13. ;
......
4321 format D0982_P21_P14 best12. ;
4322 format D0983_P22_P14 best12. ;
4323 format D0984_P23_P14 best12. ;
4324 format D0985_P24_P14 best12. ;
4325 format D0986_P25_P14 best12. ;
4326 format D0987_P26_P14 best12. ;
4327 format D0988_P27_P14 best12. ;
4328 format D0989_P28_P14 best12. ;
4329 format D0990_P29_P14 best12. ;
4330 format D0991_P30_P14 best12. ;
......
5677 format D2338_P45_P38 best12. ;
5678 format D2339_P46_P38 best12. ;
5679 format D2340_P47_P38 best12. ;
5680 format D2341_P best12. ;
5681 format VAR2342 best12. ;
5682 format VAR2343 best12. ;
5683 format VAR2344 best12. ;
5684 format VAR2345 best12. ;
5685 format VAR2346 best12. ;
5686 format VAR2347 best12. ;
5687 format VAR2348 best12. ;
5688 format VAR2349 best12. ;
......
6661 input
6662 D0001_P02_P01 $
6663 D0002_P03_P01 $
6664 D0003_P04_P01 $
6665 D0004_P05_P01 $
6666 D0005_P06_P01 $
6667 D0006_P07_P01 $
6668 D0007_P08_P01 $
6669 D0008_P09_P01 $
6670 D0009_P10_P01 $
6671 D0010_P11_P01 $
......
8996 D2335_P42_P38
8997 D2336_P43_P38
8998 D2337_P44_P38
8999 D2338_P45_P38
9000 D2339_P46_P38
9001 D2340_P47_P38
9002 D2341_P
9003 VAR2342
9004 VAR2343
9005 VAR2344
9006 VAR2345
9007 VAR2346
9008 VAR2347
9009 VAR2348
9010 VAR2349
......
9983 ;
9984 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
9985 run;
NOTE: The infile 'C:\Users\Desktop\sas\18F.txt' is:
Filename=C:\Users\Desktop\sas\18F.txt,
RECFM=V,LRECL=32767,File Size (bytes)=76383,
Last Modified=27Oct2014:15:12:47,
Create Time=24Oct2014:15:54:03
NOTE: 1 record was read from the infile 'C:\Users\Desktop\sas\18F.txt'.
The minimum record length was 29888.
The maximum record length was 29888.
NOTE: The data set SASUSER.DISTANCE has 1 observations and 3321 variables.
NOTE: DATA statement used (Total process time):
real time 7.35 seconds
cpu time 7.03 seconds
1 rows created in SASUSER.DISTANCE from C:\Users\Desktop\sas\18F.txt.
NOTE: SASUSER.DISTANCE data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 2:07.74
cpu time 2:05.86
You will have to have a program purpose build essentially the same as what PROC IMPORT does but using all the names.
The data you posted looked to be all numeric variables but the log you posted showed some character variables. If they are all numeric that will make it easier.
This works on the data you posted. 3321 variables. You will need to change the path in the FILENAME statement.
Thank you so much. I tried your code and it did work!
Can you tell me what does "lrecl=49590" mean? Is it the length for a single line? The file I attached is just my test sample. Actually I have a .txt file including around 11 million variables (only one is charater variable, and others are numeric) and 360 oberservations. Do you think SAS can load it successfully?
Thank you!
LRECL is the Logical record length although that logical part does not apply to files with delimited records.
The file you posted 49590 was the length of the name row. You can use "any" value as long as it is greater than or equal to the longest record. 80000 would work in your case too.
Eleven million variables (fields)? Where are you getting this data?
For the example data I think this is what it really is an 82x82 matrix;
So I don't think reading the data into variables as we did yesterday is necessary. What are the dimensions of the matrix with 11 million variables?
Yes, you are right. The .txt file I posted comes from a 82*82 matrix. But I don't need the upper matrix and the diagonal elements, that's why the file only contains 3321 variables. The final file I will use comes from a 3321*3321 matrix, but I won't use the diagonal elements, so it will have 11025720 variables.
Which P field in the name Row and Column Dxxx_Pnn_Pnn?
I am not sure I get your question. Here is how I understand:
In the example file I posed, there are 3321 variables coming from a 82*82 matrix (except the upper matrix and the diagonals). The format of all the 3321 variable names is DXXXX_PYY_PZZ. XXXX is the index for the variables (1~3321), YY is the Row index from the 82*82 matrix, and ZZ is the Column index from the matrix.
In my final file, there will be 11025720 variables coming from a 3321*3321 matrix (except the diagonals). The format of all the 11025720 variable names will be RXXXXXXXX_DYYYY_DZZZZ. XXXXXXXX is the index for the variables (1~11025720), YYYY is the Row index from the 3321*3321 matrix, and ZZZZ is the Column index from the matrix.
Thank you so much!
One of the Pnn is the column and the other the row. But I'll leave that to you.
This program reads the name field and extracts the meta data from the names the reads the matrix elements one element stored in one observations. You can make it into a matrix as you like, the tall format is more useful than the wide we made yesterday. And it should work for 11 million.
I run your code. The first part got all the names and the D,P,P values. However, the second part seems wrong.
Here is the log:
199 data _81f_alt;
200 infile FT54F001 dsd dlm='09'x firstobs=2 termstr=crlf;
201 set names(keep=d p1 p2);
202 input x @@;
203 run;
NOTE: The infile FT54F001 is:
Filename=C:\Users\Desktop\sas\18F.txt,
RECFM=V,LRECL=49590,File Size (bytes)=46493,
Last Modified=28Oct2014:16:48:02,
Create Time=24Oct2014:15:54:03
NOTE: 0 records were read from the infile FT54F001.
NOTE: There were 1 observations read from the data set WORK.NAMES.
NOTE: The data set WORK._81F_ALT has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Your version of 18F.txt does not have 2 records. Perhaps it is due to TERMSTR. Omit that option and see what happens.
Hi, let us go back to the code you posted yesterday.
You created both "name" and "name2". What is the purpose of name2? I tried to delete name2, but the result is wrong. Here is the code:
filename FT54F001 'C:\Users\Desktop\sas\18F.txt' lrecl=250000000;
data names;
infile FT54F001 dsd dlm='09'x obs=1;
input name :$32. @@;
retain one 1;
run;
data _18f;
infile FT54F001 dsd dlm='09'x firstobs=2;
input (_all_)(:);
run;
proc contents varnum;
run;
Where is the mistake? I am not quite undertsand the SAS code. Thank you!
The program I posted yesterday reads the names into observations and then uses PROC TRANSPOSE to create a data set from those names.
You left that part out. Go back an look at the code again.
But you don't want that going forward use the code I suggested today and create the tall data set that you can make into the matrix you may need or perhaps you can use it like it is. You haven't mention what happens after you get the data into SAS.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.