hi everyone,
i have a huge data set in csv. that has 5mil lines with 332 columns
i have 2 things i want to do with it
1) i want to zeroise the 2 date columns from 99999999 to 0
2) save to csv after step 1.
Hence first i use SAS enterprise Guide 7.1 (64-bits) to import the data.
1) here i have made it into 1 column (for processing later) by choosing delimiter as "+" instead of the real delimiter "|" in the file.
2) i use TRANWRD function to replace 2 of my columns from 99999999 to 0
3) then i use the export function to save into csv.
some how the result has a ( " ) appearing after 8685th record as shown below?
what is the reasons causing that and how can i fix it?
Thank you everyone in advance.
You can do that in one data step:
data _null_;
infile "path to infile" lrecl=32767;
file "path to outfile" lrecl=32767;
input;
_infile_ = tranwrd(_infile_,'99999999','00000000');
put _infile_;
run;
You can do that in one data step:
data _null_;
infile "path to infile" lrecl=32767;
file "path to outfile" lrecl=32767;
input;
_infile_ = tranwrd(_infile_,'99999999','00000000');
put _infile_;
run;
@parkourtofu wrote:
hi i have followed your code but the " still appear in the record.
i wrote the following:
data _null_;
infile "C:\SAS\have.csv" lrecl=32767;
file "C:\SAS\want.csv" lrecl=32767;
input;
_infile_ = tranwrd (_infile_, '99999999','0'); put _infile_; run;
Then the quotes are already there in the input.
Inspect the respective lines in the file with a text editor (NOT WITH ANY OTHER SOFTWARE!).
Without 1) starting data and 2) the actual code used it is hard to tell what you may have done.
This is about a clear as mud:
1) i want to zeroise the 2 date columns from 99999999 to 0
And this is also sort of confusing:
1) here i have made it into 1 column (for processing later) by choosing delimiter as "+" instead of the real delimiter "|" in the file.
2) i use TRANWRD function to replace 2 of my columns from 99999999 to 0
Did you apply Tranwrd to 2 columns or one?
And is the PICTURE (not very good for sharing actual content) from a spreadsheet program? They are know to convert some types of values when opening a CSV file and that can change the actual value. If you did not let a spread sheet save the data file, I suggest opening the file with a Text file editor like Notepad, Wordpad or even the SAS editor, go to the lines where the values change (if they actually do in the text file), copy a couple rows before and after, the paste them into a code box opened on the forum with </> icon. This last is important because the message windows on the forum will reformat text and may confuse the issue.
It would also be a good idea to copy the generated code you used to transform the variables and paste that in into a code box as well.
Is the column where the " appears involving your "date columns" at all? From the values shown, I do not see anything with a date related value (unless possibly you are using Julian dates).
Since 99999999 is not any where a typical date value it isn't clear what you start with. Is the value a character or numeric variable?
hi sorry i wasn't being clear enough
for Starting Data. seen as picture 1 below.
as for the mentioned step 1. i have use SAS Enterprise Guide "Import Data" function to import the data into SAS server directly without codes. due to the file is too large n have many columns i decided to make all of them into 1 column. the method used will be as follows to make all data into a single column for the intention to "faster processing".
DATA WORK.ActILP_EX_2020_05_19_0030;
LENGTH
'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n $ 1884 ;
LABEL
'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n = "CHDRNUM|CNTTYPE|BILLFREQ|BILLCHNL|SRCEBUS|AGNTNUM|HISSDTE|BNSIND|ZREVBNS|BONUSAMT|ZINTRATE|ZANNTYDT|STATCODE|PSTATCODE|CRTABLE|LIFE|JLIFE|COVERAGE|RIDER|CRRCD|PCESTRM|RCESTRM|BCESTRM|SUMINS|ORIGSUM|ZBINSTPREM|EXTR|ZDISCODE|ZDISPREM|ZGSTPREM|AGE|RETIREAGE|C" ;
FORMAT
'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n $CHAR1884. ;
INFORMAT
'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n $CHAR1884. ;
INFILE 'C:\Users\parkourtofu\AppData\Local\Temp\SEG12248\ActILP_EX_2020-05-19-0030-17e38c4b19c0442bbe027eb0ec1ba9d7.txt'
LRECL=1884
ENCODING="WLATIN1"
TERMSTR=CRLF
DLM='7F'x
MISSOVER
DSD ;
INPUT
'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n : $CHAR1884. ;
RUN;
then i use tranwrd to change all the values with 99999999 to zero
data test;
set WORK.ActILP_EX_2020_05_19_0030;
'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n=tranwrd('CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n , "99999999|", "0|");
RUN;
these are the date columns of the 332 columns inside my data i wanted to zeorise them.
yes the number that has " isnt the date and when i make them into 1 column, i assume i have made all the data without delimiter into a character form.
Note that if you use the Enterprise Guide import wizard to select a CSV file from your PC and send it to the SAS server for conversion into a SAS dataset it will probably have made some changes to the CSV file in the process.
You might get more control by using the Copy Files task in EG to just upload the CSV file and write SAS code to read the file.
@parkourtofu wrote:
iam unable to find the "Copy Files" in my EG
Search: https://www.google.com/search?q=%40sas.com+enterprise+guide+copy+files+task
Read: https://blogs.sas.com/content/sasdummy/2020/05/19/copy-files-in-sas-eg/
Below code changing your .csv directly without reading the data into a SAS table won't perform as good as when using tranwrd(). What it does though is ensure that you only change 99999999 in the selected columns.
In the regular expression used ^(([^\|]*\|){2})(9{8}\|)
the number in red represents the number of columns before the one where you want to change the string.
/* create sample file */
filename mycsv temp;
data _null_;
file mycsv;
put 'abc|def|99999999|1234|99999999|xyx';
put '99999999|def|78654||99999999|xyx|99999999|abc';
stop;
run;
/* change 99999999 to 0 if in columns 3 or 5 */
data _null_;
file print;
infile mycsv lrecl=32767;
input;
put 'Before: ' _infile_;
_infile_= prxchange('s/^(([^\|]*\|){2})(9{8}\|)/${1}0|/oi', 1, trim(_infile_));
_infile_= prxchange('s/^(([^\|]*\|){4})(9{8}\|)/${1}0|/oi', 1, trim(_infile_));
put 'After: ' _infile_;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.