Hello
I want to export big data set of 250 million rows and 30 columns into TXT ZIP file.
I used this code
libname LS '/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/';
options missing='';
filename outfile
zip "/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip"
member="lightSolver_sofi.txt";
data _null_;
file outfile dlm=',';
set LS.lightSolver_sofi;
if _n_ = 1 then put
'MEZAHE_CARD,
date_TAARICH_BITZUA_ISKA,
date_TAARICH_IBUD_ISKA,
date_TAARICH_CHIUV,
date_CALC_TRCH,
MEZAHE_TNUA,
MLI_TEUR_SUG_CARD,
KOD_CHEVRA,
TEUR_CARD,
KOD_CHEVRA_GLOBAL,
CHEVRA,
SHEM_SOCHER,
MIS_SOCHER,
MCC4,
SHEM_MCC4,
COMPANY_LEGAL_ID,
SUG_ISKA,
TEURSUG_ISKA,
SCHUM_CHIUV,
MIS_TASH_NOCH,
MATBEA_CHIUV,
MIS_TASH_MEK,
MIS_TASH_NOTRU,
SCHUM_ISKA_MAT_MAKOR,
MATBEA_BITZUA_ISKA,
ACHUZ_RIBIT,
IND_ISKAT_CHUL,
SCHUM_ISKA_MAT_CHUV,
IND_HOCK,
KOD_SOLECK,
IND_ISKA_BERUR,
IND_ISKA_MISMACH_CHASER,
IND_ISKA_MEULETZET,
IND_HALVAA,
SUG_PLAN,
TEUR_SUG_TOCHNIT,
MAHUT_KARTIS,
RIBIT_NOTAR,
**bleep**AT_PERAON,
SUG_RIBIT,
SHIUR_RIBIT,
MIRVACH_PRIME,
COUNTRY_CODE,
WhichFile,
Date_HITZTARFUT,
month_Nechonut,
LAKOACH,
groupk_base,
misg_crd_base,
Hativa_base,
nr_FollowUpMonths,
Ind_Touch11,
nr_Months_With_Tnua_CC,
Ind_HAve_ISkaCC_BaseMon,
BITZUA_ISKA_TIME,
Ind_Sinun_out,
MEZAHE_PART
';
put
MEZAHE_CARD
date_TAARICH_BITZUA_ISKA
date_TAARICH_IBUD_ISKA
date_TAARICH_CHIUV
date_CALC_TRCH
MEZAHE_TNUA
MLI_TEUR_SUG_CARD
KOD_CHEVRA
TEUR_CARD
KOD_CHEVRA_GLOBAL
CHEVRA
SHEM_SOCHER
MIS_SOCHER
MCC4
SHEM_MCC4
COMPANY_LEGAL_ID
SUG_ISKA
TEURSUG_ISKA
SCHUM_CHIUV
MIS_TASH_NOCH
MATBEA_CHIUV
MIS_TASH_MEK
MIS_TASH_NOTRU
SCHUM_ISKA_MAT_MAKOR
MATBEA_BITZUA_ISKA
ACHUZ_RIBIT
IND_ISKAT_CHUL
SCHUM_ISKA_MAT_CHUV
IND_HOCK
KOD_SOLECK
IND_ISKA_BERUR
IND_ISKA_MISMACH_CHASER
IND_ISKA_MEULETZET
IND_HALVAA
SUG_PLAN
TEUR_SUG_TOCHNIT
MAHUT_KARTIS
RIBIT_NOTAR
**bleep**AT_PERAON
SUG_RIBIT
SHIUR_RIBIT
MIRVACH_PRIME
COUNTRY_CODE
WhichFile
Date_HITZTARFUT
month_Nechonut
LAKOACH
groupk_base
misg_crd_base
Hativa_base
nr_FollowUpMonths
Ind_Touch11
nr_Months_With_Tnua_CC
Ind_HAve_ISkaCC_BaseMon
BITZUA_ISKA_TIME
Ind_Sinun_out
MEZAHE_PART
;
run;
options missing='.';
Now I want to Import this file into SAS data set in order to see that it was exported well.
I used this code and it produce wrong results.
The values look Gibrish and only have 8 columns and only 56 million rows.
Why???
PROC IMPORT OUT=newdata
DATAFILE="/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip"
DBMS=CSV
REPLACE;
GETNAMES=YES;
RUN;
I would expect something interesting in the log of the export. Having
filename outfile clear;
after the data step could be useful, too.
Thanks,
Do you want me to run this code?
and then copy the log message?
filename outfile clear;
PROC IMPORT OUT=newdata
DATAFILE="/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip"
DBMS=CSV
REPLACE;
GETNAMES=YES;
RUN;
Here is the log
1 The SAS System 07:52 Monday, October 9, 2023
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program';
4 %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5 %LET _CLIENTPROJECTPATH='';
6 %LET _CLIENTPROJECTPATHHOST='';
7 %LET _CLIENTPROJECTNAME='';
8 %LET _SASPROGRAMFILE='';
9 %LET _SASPROGRAMFILEHOST='';
10
11 ODS _ALL_ CLOSE;
12 OPTIONS DEV=PNG;
13 GOPTIONS XPIXELS=0 YPIXELS=0;
14 FILENAME EGSR TEMP;
15 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16 STYLE=HTMLBlue
17 STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")
18 NOGTITLE
19 NOGFOOTNOTE
20 GPATH=&sasworklocation
21 ENCODING=UTF8
22 options(rolap="on")
23 ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24
25 GOPTIONS ACCESSIBLE;
26
27 filename outfile clear;
NOTE: Fileref OUTFILE has been deassigned.
28 PROC IMPORT OUT=newdata
29 DATAFILE="/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip"
30 DBMS=CSV
31 REPLACE;
32 GETNAMES=YES;
33 RUN;
Number of names found is less than number of variables found.
Name
PK EIW lightSolver_sofi.txt ה½wq/ת~^zq/w2i´$ו$¾9±JגsoC G#Fn
{wןימט¦\¦÷z PS»>~ץ«ל³ח‾?}צץ/Vפ»רכW}ש§½שח‾~ם³
truncated to PK EIW li.
Problems were detected with provided names. See LOG.
34 /**********************************************************************
35 * PRODUCT: SAS
36 * VERSION: 9.4
37 * CREATOR: External File Interface
38 * DATE: 09OCT23
39 * DESC: Generated SAS Datastep Code
40 * TEMPLATE SOURCE: (None Specified.)
41 ***********************************************************************/
42 data WORK.NEWDATA ;
43 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
44 infile '/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip' delimiter = ',' MISSOVER DSD
44 ! lrecl=32767 firstobs=2 ;
45 informat "PKEIWli"N $1029. ;
46 informat VAR2 $252. ;
47 informat VAR3 $183. ;
48 informat VAR4 $169. ;
2 The SAS System 07:52 Monday, October 9, 2023
49 informat VAR5 $149. ;
50 informat VAR6 $8. ;
51 informat VAR7 $36. ;
52 informat VAR8 $2. ;
53 format "PKEIWli"N $1029. ;
54 format VAR2 $252. ;
55 format VAR3 $183. ;
56 format VAR4 $169. ;
57 format VAR5 $149. ;
58 format VAR6 $8. ;
59 format VAR7 $36. ;
60 format VAR8 $2. ;
61 input
62 "PKEIWli"N $
63 VAR2 $
64 VAR3 $
65 VAR4 $
66 VAR5 $
67 VAR6 $
68 VAR7 $
69 VAR8 $
70 ;
71 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
72 run;
NOTE: The infile '/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip' is:
Filename=/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip,
Owner Name=Udclk79,
Group Name=GR_IL_SAS_USERS_LabRet,
Access Permission=-rwxrwxr-x,
Last Modified=09Oct2023:09:59:46,
File Size (bytes)=14630108879
NOTE: 56523337 records were read from the infile '/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip'.
The minimum record length was 0.
The maximum record length was 4633.
NOTE: The data set WORK.NEWDATA has 56523337 observations and 8 variables.
NOTE: Compressing data set WORK.NEWDATA decreased size by 86.23 percent.
Compressed is 222361 pages; un-compressed would require 1614953 pages.
NOTE: DATA statement used (Total process time):
real time 2:18.95
user cpu time 1:58.60
system cpu time 14.08 seconds
memory 9701.07k
OS Memory 33556.00k
Timestamp 10/09/2023 01:43:35 PM
Step Count 62 Switch Count 2
Page Faults 0
Page Reclaims 4739
Page Swaps 0
Voluntary Context Switches 65365
Involuntary Context Switches 1160
Block Input Operations 0
Block Output Operations 0
56523337 rows created in WORK.NEWDATA from /usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip.
3 The SAS System 07:52 Monday, October 9, 2023
NOTE: WORK.NEWDATA data set was successfully created.
NOTE: The data set WORK.NEWDATA has 56523337 observations and 8 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 2:19.45
user cpu time 1:58.64
system cpu time 14.11 seconds
memory 9701.07k
OS Memory 34072.00k
Timestamp 10/09/2023 01:43:35 PM
Step Count 62 Switch Count 12
Page Faults 0
Page Reclaims 7803
Page Swaps 0
Voluntary Context Switches 65597
Involuntary Context Switches 1161
Block Input Operations 0
Block Output Operations 0
73
74
75 GOPTIONS NOACCESSIBLE;
76 %LET _CLIENTTASKLABEL=;
77 %LET _CLIENTPROCESSFLOWNAME=;
78 %LET _CLIENTPROJECTPATH=;
79 %LET _CLIENTPROJECTPATHHOST=;
80 %LET _CLIENTPROJECTNAME=;
81 %LET _SASPROGRAMFILE=;
82 %LET _SASPROGRAMFILEHOST=;
83
84 ;*';*";*/;quit;run;
85 ODS _ALL_ CLOSE;
86
87
88 QUIT; RUN;
89
May you please show the code?
This it the code format that I made the export from sas data set into CSV file.
libname LS '/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/';
options missing='';
filename outfile
zip "/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip"
member="lightSolver_sofi.txt";
data _null_;
file outfile dlm=',';
set LS.lightSolver_sofi;
if _n_ = 1 then put
'Var_Names with comma between them';
put
Var_Names
;
run;
options missing='.';
You just use the same file reference you used in the export in the following import. No rocket science.
Can you show code example please?
You omit the FILENAME CLEAR statement and use
infile outfile dlm="," firstobs=2 dsd truncover;
in the DATA step with which you read the file back.
To make a proper delimited file you need to use the DSD option, not just change the delimiter. Otherwise SAS will not write the null values and values that contain the delimiter properly.
First make a fileref that points to the member in the ZIP file you want write to (or read from).
filename SOFI zip
"/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip"
member="lightSolver_sofi.txt"
;
Then you can write to the file (no need to mess with the OPTIONS settings):
libname LS '/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/';
data _null_;
file SOFI dsd ;
if _n_ = 1 then put 'Var_Names with comma between them';
set LS.lightSolver_sofi;
put Var_Names ;
run;
Or read from the file:
libname LS '/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/';
data LS.lightSolver_sofi;
infile SOFI dsd truncover firstobs=2;
* Define the variables ;
length .... ;
* read the variables ;
input Var_Names ;
run;
If you really want to force SAS to make guesses about how to read the file you can try using PROC IMPORT, but you will first have to copy the file to an actual file.
filename copy temp;
data _null_;
infile SOFI;
file copy;
input;
put _infile_;
run;
proc import file=copy dbms=csv out=want replace;
run;
If you want you can use some other tool to guess how to import the file that does not require you to copy the file out of the ZIP file first.
Try using %CSV2DS() macro https://github.com/sasutils/macros/blob/master/csv2ds.sas
%csv2ds(SOFI,out=want,replace=1)
And you will get other advantages.
Differences from PROC IMPORT - Supports header lines with more than 32,767 characters - Supports ZIP and GZIP source files - Generates unique variable names by adding numeric suffix - Does not overestimate maxlength when longest value is quoted - Does NOT force character type if all values are quoted - Generates label when generated variable name is different than header - Supports NAMEROW option - Supports numeric fields with special missing values (MISSING statement) - Does not attach unneeded informats or formats - Allows overriding calculated metadata - Allow using random sample of rows to guess metadata - Generates more compact SAS code - Generates analysis summary dataset and raw data view - Saves generated SAS code to a file - Forces DATE and DATETIME formats to show century - Difference in generated V7 compatible variable names - Replaces adjacent non-valid characters with single underscore
Thanks so much Tom.
I applied your code and here is the Log.
I have some questions please:
1-There is a note in Log "NOTE: Format SINUN_REASON_FMT was not found or could not be loaded."
I don't see any variable with this name SINUN_REASON_FMT.
Why this note came from? Any action is required from me about it?
2-In Order to Import the ZIP file (In order to check that it was exported correctly) I need to use following statements:
Input (List of variables in same order they appear in the ZIP file)
Length
I am looking in the log in order to know what length to apply for each var.
How can I know it?
I dont see it in the log (Log of the export activity)
3- What type of file was created? Is it called CSV ZIP file?
is the end of the file is file_name.ZIP or file_name.ZIP.CSV or file_name.CSV?
1 The SAS System 14:08 Monday, October 9, 2023
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Export_final_DatSet_into_CSV_File';
4 %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5 %LET _CLIENTPROJECTPATH='';
6 %LET _CLIENTPROJECTPATHHOST='';
7 %LET _CLIENTPROJECTNAME='';
8 %LET _SASPROGRAMFILE='M:\a.ריכוז
8 ! תוכניות\CS\LightSolver\EXPORT_SAS_DATA_SET_INTO_CSV_FILE\Export_final_DatSet_into_CSV_File.sas';
9 %LET _SASPROGRAMFILEHOST='VSK1H0103A2021';
10
11 ODS _ALL_ CLOSE;
12 OPTIONS DEV=PNG;
13 GOPTIONS XPIXELS=0 YPIXELS=0;
14 FILENAME EGSR TEMP;
15 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16 STYLE=HTMLBlue
17 STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")
18 NOGTITLE
19 NOGFOOTNOTE
20 GPATH=&sasworklocation
21 ENCODING=UTF8
22 options(rolap="on")
23 ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24
25 GOPTIONS ACCESSIBLE;
26 libname LS '/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/';
NOTE: Libref LS was successfully assigned as follows:
Engine: V9
Physical Name: /usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards
27 data _null_;
28 set LS.lightSolver_sofi;
29 file SOFI dsd ;
30 if _n_=1 then
31 put
32 'MEZAHE_CARD,
33 date_TAARICH_BITZUA_ISKA,
34 date_TAARICH_IBUD_ISKA,
35 date_TAARICH_CHIUV,
36 date_CALC_TRCH,
37 MEZAHE_TNUA,
38 MLI_TEUR_SUG_CARD,
39 KOD_CHEVRA,
40 TEUR_CARD,
41 KOD_CHEVRA_GLOBAL,
42 CHEVRA,
43 SHEM_SOCHER,
44 MIS_SOCHER,
45 MCC4,
46 SHEM_MCC4,
47 COMPANY_LEGAL_ID,
48 SUG_ISKA,
49 TEURSUG_ISKA,
50 SCHUM_CHIUV,
51 MIS_TASH_NOCH,
NOTE: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
2 The SAS System 14:08 Monday, October 9, 2023
52 MATBEA_CHIUV,
53 MIS_TASH_MEK,
54 MIS_TASH_NOTRU,
55 SCHUM_ISKA_MAT_MAKOR,
56 MATBEA_BITZUA_ISKA,
57 ACHUZ_RIBIT,
58 IND_ISKAT_CHUL,
59 SCHUM_ISKA_MAT_CHUV,
60 IND_HOCK,
61 KOD_SOLECK,
62 IND_ISKA_BERUR,
63 IND_ISKA_MISMACH_CHASER,
64 IND_ISKA_MEULETZET,
65 IND_HALVAA,
66 SUG_PLAN,
67 TEUR_SUG_TOCHNIT,
68 MAHUT_KARTIS,
69 RIBIT_NOTAR,
70 **bleep**AT_PERAON,
71 SUG_RIBIT,
72 SHIUR_RIBIT,
73 MIRVACH_PRIME,
74 COUNTRY_CODE,
75 WhichFile,
76 Date_HITZTARFUT,
77 month_Nechonut,
78 LAKOACH,
79 groupk_base,
80 misg_crd_base,
81 Hativa_base,
82 nr_FollowUpMonths,
83 Ind_Touch11,
84 nr_Months_With_Tnua_CC,
85 Ind_HAve_ISkaCC_BaseMon,
86 BITZUA_ISKA_TIME,
87 Ind_Sinun_out,
88 MEZAHE_PART
89 ';
90 put
91 MEZAHE_CARD
92 date_TAARICH_BITZUA_ISKA
93 date_TAARICH_IBUD_ISKA
94 date_TAARICH_CHIUV
95 date_CALC_TRCH
96 MEZAHE_TNUA
97 MLI_TEUR_SUG_CARD
98 KOD_CHEVRA
99 TEUR_CARD
100 KOD_CHEVRA_GLOBAL
101 CHEVRA
102 SHEM_SOCHER
103 MIS_SOCHER
104 MCC4
105 SHEM_MCC4
106 COMPANY_LEGAL_ID
107 SUG_ISKA
108 TEURSUG_ISKA
109 SCHUM_CHIUV
3 The SAS System 14:08 Monday, October 9, 2023
110 MIS_TASH_NOCH
111 MATBEA_CHIUV
112 MIS_TASH_MEK
113 MIS_TASH_NOTRU
114 SCHUM_ISKA_MAT_MAKOR
115 MATBEA_BITZUA_ISKA
116 ACHUZ_RIBIT
117 IND_ISKAT_CHUL
118 SCHUM_ISKA_MAT_CHUV
119 IND_HOCK
120 KOD_SOLECK
121 IND_ISKA_BERUR
122 IND_ISKA_MISMACH_CHASER
123 IND_ISKA_MEULETZET
124 IND_HALVAA
125 SUG_PLAN
126 TEUR_SUG_TOCHNIT
127 MAHUT_KARTIS
128 RIBIT_NOTAR
129 **bleep**AT_PERAON
130 SUG_RIBIT
131 SHIUR_RIBIT
132 MIRVACH_PRIME
133 COUNTRY_CODE
134 WhichFile
135 Date_HITZTARFUT
136 month_Nechonut
137 LAKOACH
138 groupk_base
139 misg_crd_base
140 Hativa_base
141 nr_FollowUpMonths
142 Ind_Touch11
143 nr_Months_With_Tnua_CC
144 Ind_HAve_ISkaCC_BaseMon
145 BITZUA_ISKA_TIME
146 Ind_Sinun_out
147 MEZAHE_PART
148 ;
149 run;
NOTE: Format SINUN_REASON_FMT was not found or could not be loaded.
NOTE: Format SINUN_REASON_FMT was not found or could not be loaded.
NOTE: The file SOFI is:
Filename=/usr/local/SAS/SASUsers/LabRet/Adhoc/creditCards/lightSolver_sofi.zip,
Member Name=lightSolver_sofi.txt
NOTE: 287603021 records were written to the file SOFI.
The minimum record length was 278.
The maximum record length was 810.
NOTE: There were 287603020 observations read from the data set LS.LIGHTSOLVER_SOFI.
NOTE: DATA statement used (Total process time):
real time 1:28:15.18
user cpu time 1:27:03.14
system cpu time 1:00.15
memory 1804.53k
OS Memory 28584.00k
Timestamp 10/09/2023 09:11:00 PM
4 The SAS System 14:08 Monday, October 9, 2023
Step Count 13 Switch Count 8475
Page Faults 0
Page Reclaims 59463
Page Swaps 0
Voluntary Context Switches 83036
Involuntary Context Switches 26775
Block Input Operations 0
Block Output Operations 0
150
151 GOPTIONS NOACCESSIBLE;
152 %LET _CLIENTTASKLABEL=;
153 %LET _CLIENTPROCESSFLOWNAME=;
154 %LET _CLIENTPROJECTPATH=;
155 %LET _CLIENTPROJECTPATHHOST=;
156 %LET _CLIENTPROJECTNAME=;
157 %LET _SASPROGRAMFILE=;
158 %LET _SASPROGRAMFILEHOST=;
159
160 ;*';*";*/;quit;run;
161 ODS _ALL_ CLOSE;
162
163
164 QUIT; RUN;
165
1. It means you have a format of that name assigned to a variable, but the format cannot be found in the path given in the FMTSEARCH system option. You must create it anew, or add the location to FMTSEARCH.
So needed action is to create proc format?
After create proc format than the value in ZIP CSV file will be the formatted value?
And If I don't write proc format then the values in ZIP CSV will be the non-formatted values?
@Ronein wrote:
So needed action is to create proc format?
After create proc format than the value in ZIP CSV file will be the formatted value?
And If I don't write proc format then the values in ZIP CSV will be the non-formatted values?
Depends on what you want to put into the CSV file. Do you want the values are stored in the dataset? or the values that SAS would display if it could find the format?
Other question about import the ZP CSV file.
Should I type after INPUT word the list of all variables?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.