BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
radhames_gomez
Calcite | Level 5

Hello,

 

I'm reading a file from a remote server using SFTP.  The connection and everything is working well, but the file has some characters like

ñž0 0 .   if i manually download the file to my PC and use then option ignoredoseof  the dataset load all the 41k records., but this options is not available when the connection is made through SFTP and only load 4k records.  What can i do to read all the records from the sftp.

radhames_gomez_0-1689877593196.png

 

	filename ss sftp "/ftpdata/sss/&file_to_read2" host="ftpserver.myserver.local" user="asssa"  optionsx='-i "E:\SAS WORK\mykey.ppk" -pw "ffssfww" '  ;

	data order;
		infile ss
		DLM='^'
		missover
		truncover
		
		dsd;
F1               : $CHAR6.
        F2               : $CHAR38.
        F3               : $CHAR38.
        F4               : $CHAR5.
        F5               : $CHAR2.
        F6               : ?? MMDDYY8.
        F7               : $CHAR17.
        F8               : $CHAR6.
        F9               : ?? BEST5.
        F10              : $CHAR2.
        F11              : ?? BEST8.
        F12              : ?? BEST8.
        F13              : $CHAR2.
        F14              : ?? BEST5.
        F15              : ?? MMDDYY8.
        F16              : $CHAR34.
        F17              : $CHAR1.
        F18              : $CHAR4.
        F19              : $CHAR6.
        F20              : ?? BEST4.
        F21              : ?? BEST8.
        F22              : ?? BEST7.
        F23              : ?? BEST9.
        F24              : $CHAR4.
        F25              : $CHAR4.
        F26              : ?? BEST4.
        F27              : ?? BEST8.
        F28              : $CHAR1.
        F29              : $CHAR1.
        F30              : $CHAR8.
        F31              : $CHAR3.
        F32              : ?? BEST8.
        F33              : ?? BEST8.
        F34              : ?? BEST10.
        F35              : ?? BEST5.
        F36              : $CHAR10.
        F37              : $CHAR10.
        F38              : $CHAR10.
        F39              : $CHAR10.
        F40              : $CHAR10.
        F41              : $CHAR10.
        F42              : $CHAR10.
        F43              : $CHAR10.
        F44              : $CHAR10.
        F45              : $CHAR10.
        F46              : $CHAR10.
        F47              : $CHAR10.
        F48              : $CHAR10.
        F49              : $CHAR10.
        F50              : $CHAR10.
        F51              : $CHAR10.
        F52              : $CHAR10.
        F53              : $CHAR10.
        F54              : $CHAR10.
        F55              : $CHAR10.
        F56              : $CHAR10.
        F57              : $CHAR10.
        F58              : $CHAR10.
        F59              : $CHAR10.
        F60              : $CHAR10.
        F61              : $CHAR10.
        F62              : $CHAR10.
        F63              : $CHAR10.
        F64              : $CHAR10.
        F65              : $CHAR10.
        F66              : $CHAR10.
        F67              : $CHAR10.
        F68              : $CHAR10.
        F69              : $CHAR10.
        F70              : $CHAR10.
        F71              : $CHAR10.
        F72              : $CHAR10.
        F73              : $CHAR10.
        F74              : $CHAR10.
        F75              : $CHAR10.
        F76              : $CHAR10.
        F77              : $CHAR8.
        F78              : $CHAR8.
        F79              : $CHAR10.
        F80              : $CHAR1.
        F81              : $CHAR3.
        F82              : ?? BEST7.
        F83              : $CHAR34.
        F84              : $CHAR34.
        F85              : $CHAR9.
        F86              : ?? BEST4.
        F87              : ?? MMDDYY8.
        F88              : ?? BEST7.
        F89              : ?? BEST6.
        F90              : ?? BEST8.
        F91              : $CHAR10.
        F92              : $CHAR1.
        F93              : $CHAR21.
        F94              : $CHAR28.
        F95              : $CHAR8.
        F96              : $CHAR16.
        F97              : $CHAR8.
        F98              : $CHAR13.
        F99              : $CHAR2.
        F100             : ?? BEST9.
        F101             : ?? BEST8.
        F102             : ?? BEST8.
        F103             : ?? BEST8.
        F104             : ?? BEST5.
        F105             : $CHAR1.
        F106             : $CHAR2.
        F107             : $CHAR2.
        F108             : $CHAR15.
        F109             : $CHAR1.
        F110             : $CHAR10.
        F111             : ?? BEST5.
        F112             : $CHAR6.
        F113             : ?? BEST3.
        F114             : ?? BEST7.
        F115             : ?? BEST7.
        F116             : $CHAR1.
        F117             : ?? BEST3.
        F118             : $CHAR32767.
        F119             : $CHAR16.
        F120             : $CHAR6.
        F121             : $CHAR8.
        F122             : $CHAR8.
        F123             : $CHAR10.
        F124             : $CHAR10.
        F125             : $CHAR9.
        F126             : $CHAR9.
        F127             : $CHAR10.
        F128             : $CHAR10.
        F129             : $CHAR10.
        F130             : $CHAR10.
        F131             : $CHAR16.
        F132             : $CHAR50.
        F133             : ?? BEST10.
        F134             : $CHAR15.
        F135             : $CHAR1. ;
	run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

To read the file you have to read the file.  I doubt it matters how you do it.

 

But the point of copying the file as BINARY (using either RECFM=F or RECFM=N) is probably important, and hopefully will solve your problem.

 

You can do it with FCOPY() or a separate data step.  For example using RECFM=N you would do:

filename in sftp ...... recfm=n;
filename out 'Some local file name' recfm=n;
data _null_;
  rc=fcopy('in','out');
  put rc=;
run;

Or you could do it using fixed length records.  For example with a data step:

filename in sftp ...... recfm=f lrecl=256;;
filename out 'Some local file name' recfm=f lrecl=256;
data _null_;
  infile in;
  file out;
  input;
  put _infile_;
run;

 

 

Now that you have the file you can try looking at the file and see if it actually looks like a TEXT file you are trying to read it as.  For example you might just try reading it and checking how many lines it has and the range of line lengths.

data check;
  infile 'Some local file name'  length=linelen ;
  input;
  row+1;
  length=linelen;
run;

proc mean univariate data=check;
   var length;
   id row;
run;

Does that yield more rows than your previous attempt?  Does it change if you add the IGNOREDOSEOF option to the INFILE statement?

 

You could also try looking at some of the bytes in the file and see if it really is made of lines.  And what character is used to mark the end of the lines by reading it as FIXED length.  So you could check the first 5,000 bytes of the file by doing this:

data _null_;
  infile 'Some local file name'  recfm=f lrecl=100 obs=50 ;
  input;
  list;
run;

You should also check the encoding of the file and compare it to the encoding of your current SAS session.  If the file is using UTF-8 encoding and your SAS session is using some single byte encoding, such as LATIN1 or WLATIN1, then there might be characters in the file that cannot be transcoded in the characters in the encoding being used by your SAS session.  That might explain the strange characters.

View solution in original post

9 REPLIES 9
ballardw
Super User

It may help to provide the LOG from an instance when you don't get all of the lines.

 

You apparently edited the code an lost at least part of the INPUT statement (i.e. the word INPUT).

 

When you look at the resulting data do you see any pattern in the "missing" lines? Do your data observation match, as far as you can tell the first lines of the source file? Or do you actually get random observations?

 

Foreign language characters shouldn't be a problem unless they could be confused with an end of line marker.

 

Tom
Super User Tom
Super User

What encoding did SAS use for the downloaded file?

What happens when you add IGNOREDOSEOF to the INFILE statement that is reading from the SS fileref that is pointing directly to the SFTP stream?  Does it not work? Does it generate an error message? What message?

 

Why not just download the file first using SAS code and then read from the the resulting physical file?

filename ss sftp .....;
filename copy temp;
%put rc=%sysfunc(fcopy(ss,copy));
....
  infile copy ignoredoseof .... ;
radhames_gomez
Calcite | Level 5

i tried to download the file but i only brings me 4,145 rows and the log is totally clean, no warning no message.

 

i try this lines of code 

 

filename copy temp;

%put rc= %sysfunc(fcopy(ss,copy));

data DATA_IN;
  infile COPY  truncover
 		DLM='^'
		ignoredoseof
		dsd;
   INPUT
        OS               : $CHAR6.
        LAD              : $CHAR38.
        NAME             : $CHAR38.
        CS               : $CHAR5.
        MAC              : $CHAR2.
        TD               : ?? MMDDYY8.
        ST               : $CHAR15.
        SUF              : $CHAR6.
        TOWN             : ?? BEST5.
        DIST             : $CHAR2.
        DD               : ?? MMDDYY8.
        CD               : $CHAR8.
        COMP             : $CHAR2.
        INST             : $CHAR5.
        CUCF1            : ?? MMDDYY8.
        RMK3             : $CHAR34.
        RUTA             : $CHAR1.
        LTER             : $CHAR4.
        LCABLE           : $CHAR6.
        LPAIR            : $CHAR4.
        LCOUNT           : ?? BEST8.
        TN               : $CHAR7.
        LEN              : $CHAR9.
        HTER             : $CHAR4.
        HCABLE           : $CHAR4.
        HPAIR            : $CHAR4.
        HCOUNT           : $CHAR8.
        TIPO             : $CHAR1.
        DISP_IND         : $CHAR1.
        CITA_DT          : $CHAR8.
        CF_FLAG          : $CHAR3.
        RENT_AMT         : ?? BEST8.
        NRC_AMT          : ?? BEST8.
        REF_TN           : $CHAR10.
        SLC              : $CHAR5.
        SERT0            : $CHAR10.
        SERT1            : $CHAR10.
        SERT2            : $CHAR10.
        SERT3            : $CHAR10.
        SERT4            : $CHAR10.
        SERT5            : $CHAR10.
        SERT6            : $CHAR10.
        SERT7            : $CHAR10.
        SERT8            : $CHAR10.
        SERT9            : $CHAR10.
        SERT10           : $CHAR10.
        SERT11           : $CHAR10.
        SERT12           : $CHAR10.
        SERT13           : $CHAR10.
        SERT14           : $CHAR10.
        SERT15           : $CHAR10.
        SERT16           : $CHAR10.
        SERT17           : $CHAR9.
        SERT18           : $CHAR10.
        SERT19           : $CHAR10.
        SERT20           : $CHAR8.
        SERT21           : $CHAR8.
        SERT22           : $CHAR9.
        SERT23           : $CHAR6.
        SERT24           : $CHAR10.
        SERT25           : $CHAR9.
        SERT26           : $CHAR7.
        SERT27           : $CHAR1.
        SERT28           : $CHAR1.
        SERT29           : $CHAR1.
        SERT30           : $CHAR1.
        SERT31           : $CHAR1.
        SERT32           : $CHAR1.
        SERT33           : $CHAR1.
        SERT34           : $CHAR1.
        SERT35           : $CHAR1.
        SERT36           : $CHAR1.
        SERT37           : $CHAR1.
        SERT38           : $CHAR1.
        SERT39           : $CHAR1.
        SERT40           : $CHAR1.
        SERT41           : $CHAR1.
        SERT42           : $CHAR1.
        SERT43           : $CHAR1.
        DISPR            : $CHAR5.
        REASON           : $CHAR3.
        DUE_DATE_TYPE    : $CHAR1.
        WW               : $CHAR2.
        MODEM_BRAND      : $CHAR13.
        ADJ_DATE         : $CHAR8.
        EMPLOYEE         : $CHAR5.
        QT0              : $CHAR5.
        QT1              : $CHAR5.
        QT2              : $CHAR5.
        QT3              : $CHAR5.
        QT4              : $CHAR5.
        QT5              : $CHAR5.
        QT6              : $CHAR5.
        QT7              : $CHAR5.
        QT8              : $CHAR5.
        QT9              : $CHAR5.
        QT10             : $CHAR5.
        QT11             : $CHAR5.
        QT12             : $CHAR5.
        QT13             : $CHAR5.
        QT14             : $CHAR5.
        QT15             : $CHAR5.
        QT16             : $CHAR5.
        QT17             : $CHAR5.
        QT18             : $CHAR5.
        QT19             : $CHAR5.
        QT20             : $CHAR5.
        QT21             : $CHAR5.
        QT22             : $CHAR5.
        QT23             : $CHAR5.
        QT24             : $CHAR5.
        QT25             : $CHAR5.
        QT26             : $CHAR5.
        QT27             : $CHAR5.
        QT28             : $CHAR5.
        QT29             : $CHAR5.
        QT30             : $CHAR5.
        QT31             : $CHAR5.
        QT32             : $CHAR5.
        QT33             : $CHAR5.
        QT34             : $CHAR5.
        QT35             : $CHAR5.
        QT36             : $CHAR5.
        QT37             : $CHAR5.
        QT38             : $CHAR5.
        QT39             : $CHAR5.
        QT40             : $CHAR5.
        QT41             : $CHAR5.
        QT42             : $CHAR5.
        QT43             : $CHAR5.
        QT44             : $CHAR1.
        OLD_RESELLER_ID  : $CHAR7.
        NEW_RESELLER_ID  : $CHAR7.
        SYST             : $CHAR6.
        CANAL            : $CHAR3.
        CHANNEL          : $CHAR10.
        NPA              : $CHAR3.
        FROM             : $CHAR6.
        TO               : $CHAR6.
        BAN              : ?? BEST9.
        DROP_SIZE        : $CHAR1.
        DAY_TEL_NUM      : $CHAR10.
        EVE_TEL_NUM      : $CHAR10.
        CELL_TEL_NUM     : $CHAR10.
        OWNER_EMAIL      : $CHAR30.
        CIRCUIT_ID       : $CHAR10.
        F                : $CHAR1.
        SERIAL_NUMBER    : $CHAR15.
        OTHER            : $CHAR1. ;
run;

and here is the log

 

27 data DATA_IN;
28 infile COPY truncover
29 DLM='^'
30 ignoredoseof
31 dsd;
32 INPUT
33 OS : $CHAR6.
34 LAD : $CHAR38.
35 NAME : $CHAR38.
36 CS : $CHAR5.
37 MAC : $CHAR2.
38 TD : ?? MMDDYY8.
39 ST : $CHAR15.
40 SUF : $CHAR6.
41 TOWN : ?? BEST5.
42 DIST : $CHAR2.
43 DD : ?? MMDDYY8.
44 CD : $CHAR8.
45 COMP : $CHAR2.
46 INST : $CHAR5.
47 CUCF1 : ?? MMDDYY8.
48 RMK3 : $CHAR34.
49 RUTA : $CHAR1.
50 LTER : $CHAR4.
51 LCABLE : $CHAR6.
52 LPAIR : $CHAR4.
53 LCOUNT : ?? BEST8.
54 TN : $CHAR7.
55 LEN : $CHAR9.
56 HTER : $CHAR4.
57 HCABLE : $CHAR4.
2 The SAS System 09:17 Thursday, July 20, 2023

58 HPAIR : $CHAR4.
59 HCOUNT : $CHAR8.
60 TIPO : $CHAR1.
61 DISP_IND : $CHAR1.
62 CITA_DT : $CHAR8.
63 CF_FLAG : $CHAR3.
64 RENT_AMT : ?? BEST8.
65 NRC_AMT : ?? BEST8.
66 REF_TN : $CHAR10.
67 SLC : $CHAR5.
68 SERT0 : $CHAR10.
69 SERT1 : $CHAR10.
70 SERT2 : $CHAR10.
71 SERT3 : $CHAR10.
72 SERT4 : $CHAR10.
73 SERT5 : $CHAR10.
74 SERT6 : $CHAR10.
75 SERT7 : $CHAR10.
76 SERT8 : $CHAR10.
77 SERT9 : $CHAR10.
78 SERT10 : $CHAR10.
79 SERT11 : $CHAR10.
80 SERT12 : $CHAR10.
81 SERT13 : $CHAR10.
82 SERT14 : $CHAR10.
83 SERT15 : $CHAR10.
84 SERT16 : $CHAR10.
85 SERT17 : $CHAR9.
86 SERT18 : $CHAR10.
87 SERT19 : $CHAR10.
88 SERT20 : $CHAR8.
89 SERT21 : $CHAR8.
90 SERT22 : $CHAR9.
91 SERT23 : $CHAR6.
92 SERT24 : $CHAR10.
93 SERT25 : $CHAR9.
94 SERT26 : $CHAR7.
95 SERT27 : $CHAR1.
96 SERT28 : $CHAR1.
97 SERT29 : $CHAR1.
98 SERT30 : $CHAR1.
99 SERT31 : $CHAR1.
100 SERT32 : $CHAR1.
101 SERT33 : $CHAR1.
102 SERT34 : $CHAR1.
103 SERT35 : $CHAR1.
104 SERT36 : $CHAR1.
105 SERT37 : $CHAR1.
106 SERT38 : $CHAR1.
107 SERT39 : $CHAR1.
108 SERT40 : $CHAR1.
109 SERT41 : $CHAR1.
110 SERT42 : $CHAR1.
111 SERT43 : $CHAR1.
112 DISPR : $CHAR5.
113 REASON : $CHAR3.
114 DUE_DATE_TYPE : $CHAR1.
115 WW : $CHAR2.
3 The SAS System 09:17 Thursday, July 20, 2023

116 MODEM_BRAND : $CHAR13.
117 ADJ_DATE : $CHAR8.
118 EMPLOYEE : $CHAR5.
119 QT0 : $CHAR5.
120 QT1 : $CHAR5.
121 QT2 : $CHAR5.
122 QT3 : $CHAR5.
123 QT4 : $CHAR5.
124 QT5 : $CHAR5.
125 QT6 : $CHAR5.
126 QT7 : $CHAR5.
127 QT8 : $CHAR5.
128 QT9 : $CHAR5.
129 QT10 : $CHAR5.
130 QT11 : $CHAR5.
131 QT12 : $CHAR5.
132 QT13 : $CHAR5.
133 QT14 : $CHAR5.
134 QT15 : $CHAR5.
135 QT16 : $CHAR5.
136 QT17 : $CHAR5.
137 QT18 : $CHAR5.
138 QT19 : $CHAR5.
139 QT20 : $CHAR5.
140 QT21 : $CHAR5.
141 QT22 : $CHAR5.
142 QT23 : $CHAR5.
143 QT24 : $CHAR5.
144 QT25 : $CHAR5.
145 QT26 : $CHAR5.
146 QT27 : $CHAR5.
147 QT28 : $CHAR5.
148 QT29 : $CHAR5.
149 QT30 : $CHAR5.
150 QT31 : $CHAR5.
151 QT32 : $CHAR5.
152 QT33 : $CHAR5.
153 QT34 : $CHAR5.
154 QT35 : $CHAR5.
155 QT36 : $CHAR5.
156 QT37 : $CHAR5.
157 QT38 : $CHAR5.
158 QT39 : $CHAR5.
159 QT40 : $CHAR5.
160 QT41 : $CHAR5.
161 QT42 : $CHAR5.
162 QT43 : $CHAR5.
163 QT44 : $CHAR1.
164 OLD_RESELLER_ID : $CHAR7.
165 NEW_RESELLER_ID : $CHAR7.
166 SYST : $CHAR6.
167 CANAL : $CHAR3.
168 CHANNEL : $CHAR10.
169 NPA : $CHAR3.
170 FROM : $CHAR6.
171 TO : $CHAR6.
172 BAN : ?? BEST9.
173 DROP_SIZE : $CHAR1.
4 The SAS System 09:17 Thursday, July 20, 2023

174 DAY_TEL_NUM : $CHAR10.
175 EVE_TEL_NUM : $CHAR10.
176 CELL_TEL_NUM : $CHAR10.
177 OWNER_EMAIL : $CHAR30.
178 CIRCUIT_ID : $CHAR10.
179 F : $CHAR1.
180 SERIAL_NUMBER : $CHAR15.
181 OTHER : $CHAR1. ;
182 run;

NOTE: The infile COPY is:
Filename=E:\SAS WORK\TempFiles\_TD37536_PRT-4G9Q1S2_\#LN00221,
RECFM=V,LRECL=32767,File Size (bytes)=5859902,
Last Modified=20Jul2023:15:50:18,
Create Time=20Jul2023:15:50:18

NOTE: 4145 records were read from the infile COPY.
The minimum record length was 284.
The maximum record length was 1412.
NOTE: The data set WORK.DATA_IN has 4145 observations and 149 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds

 

 

Tom
Super User Tom
Super User

Try downloading the file as fixed length records by adding RECFM=F to the SS fileref.

Do you get more than 5,859,902 bytes in the resulting file?

radhames_gomez
Calcite | Level 5

I tried with the RECFM=F and the results is the same. 

 

There is a way that i could just simple get the file and put in a directory and windows, without using the data step.

 

 

 

Tom
Super User Tom
Super User

To read the file you have to read the file.  I doubt it matters how you do it.

 

But the point of copying the file as BINARY (using either RECFM=F or RECFM=N) is probably important, and hopefully will solve your problem.

 

You can do it with FCOPY() or a separate data step.  For example using RECFM=N you would do:

filename in sftp ...... recfm=n;
filename out 'Some local file name' recfm=n;
data _null_;
  rc=fcopy('in','out');
  put rc=;
run;

Or you could do it using fixed length records.  For example with a data step:

filename in sftp ...... recfm=f lrecl=256;;
filename out 'Some local file name' recfm=f lrecl=256;
data _null_;
  infile in;
  file out;
  input;
  put _infile_;
run;

 

 

Now that you have the file you can try looking at the file and see if it actually looks like a TEXT file you are trying to read it as.  For example you might just try reading it and checking how many lines it has and the range of line lengths.

data check;
  infile 'Some local file name'  length=linelen ;
  input;
  row+1;
  length=linelen;
run;

proc mean univariate data=check;
   var length;
   id row;
run;

Does that yield more rows than your previous attempt?  Does it change if you add the IGNOREDOSEOF option to the INFILE statement?

 

You could also try looking at some of the bytes in the file and see if it really is made of lines.  And what character is used to mark the end of the lines by reading it as FIXED length.  So you could check the first 5,000 bytes of the file by doing this:

data _null_;
  infile 'Some local file name'  recfm=f lrecl=100 obs=50 ;
  input;
  list;
run;

You should also check the encoding of the file and compare it to the encoding of your current SAS session.  If the file is using UTF-8 encoding and your SAS session is using some single byte encoding, such as LATIN1 or WLATIN1, then there might be characters in the file that cannot be transcoded in the characters in the encoding being used by your SAS session.  That might explain the strange characters.

radhames_gomez
Calcite | Level 5

This approach works fine.   Thanks a lot

 

filename in sftp ...... recfm=f lrecl=256;;
filename out 'Some local file name' recfm=f lrecl=256;
data _null_;
  infile in;
  file out;
  input;
  put _infile_;
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 704 views
  • 0 likes
  • 4 in conversation