BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
parkourtofu
Fluorite | Level 6

hi everyone,

i have a huge data set in csv. that has 5mil lines with 332 columns

i have 2 things i want to do with it

1) i want to zeroise the 2 date columns from 99999999 to 0

2) save to csv after step 1.

 

Hence first i use SAS enterprise Guide 7.1 (64-bits) to import the data.

1) here i have made it into 1 column (for processing later) by choosing delimiter as "+" instead of the real delimiter "|" in the file.

2) i use TRANWRD function to replace 2 of my columns from 99999999 to 0

3) then i use the export function to save into csv.

 

some how the result has a ( " ) appearing after 8685th record as shown below?

parkourtofu_0-1591970975926.png

 

what is the reasons causing that and how can i fix it?

 

Thank you everyone in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You can do that in one data step:

data _null_;
infile "path to infile" lrecl=32767;
file "path to outfile" lrecl=32767;
input;
_infile_ = tranwrd(_infile_,'99999999','00000000');
put _infile_;
run;

View solution in original post

9 REPLIES 9
Kurt_Bremser
Super User

You can do that in one data step:

data _null_;
infile "path to infile" lrecl=32767;
file "path to outfile" lrecl=32767;
input;
_infile_ = tranwrd(_infile_,'99999999','00000000');
put _infile_;
run;
parkourtofu
Fluorite | Level 6
hi i have followed your code but the " still appear in the record.
i wrote the following:
data _null_;
infile "C:\SAS\have.csv" lrecl=32767;
file "C:\SAS\want.csv" lrecl=32767;
input;
_infile_ = tranwrd (_infile_, '99999999','0'); put _infile_; run;
Kurt_Bremser
Super User

@parkourtofu wrote:
hi i have followed your code but the " still appear in the record.
i wrote the following:
data _null_;
infile "C:\SAS\have.csv" lrecl=32767;
file "C:\SAS\want.csv" lrecl=32767;
input;
_infile_ = tranwrd (_infile_, '99999999','0'); put _infile_; run;

Then the quotes are already there in the input.

Inspect the respective lines in the file with a text editor (NOT WITH ANY OTHER SOFTWARE!).

ballardw
Super User

Without 1) starting data and 2) the actual code used it is hard to tell what you may have done.

 

This is about a clear as mud:

1) i want to zeroise the 2 date columns from 99999999 to 0

And this is also sort of confusing:

1) here i have made it into 1 column (for processing later) by choosing delimiter as "+" instead of the real delimiter "|" in the file.

2) i use TRANWRD function to replace 2 of my columns from 99999999 to 0

Did you apply Tranwrd to 2 columns or one?

 

And is the PICTURE (not very good for sharing actual content) from a spreadsheet program? They are know to convert some types of values when opening a CSV file and that can change the actual value. If you did not let a spread sheet save the data file, I suggest opening the file with a Text file editor like Notepad, Wordpad or even the SAS editor, go to the lines where the values change (if they actually do in the text file), copy a couple rows before and after, the paste them into a code box opened on the forum with </> icon. This last is important because the message windows on the forum will reformat text and may confuse the issue.

It would also be a good idea to copy the generated code you used to transform the variables and paste that in into a code box as well.

 

Is the column where the " appears involving your "date columns" at all? From the values shown, I do not see anything with a date related value (unless possibly you are using Julian dates).

Since 99999999 is not any where a typical date value it isn't clear what you start with. Is the value a character or numeric variable?

 

parkourtofu
Fluorite | Level 6

hi sorry i wasn't being clear enough

 

for Starting Data. seen as picture 1 below.

thumbnail_image007.png

as for the mentioned step 1. i have use SAS Enterprise Guide "Import Data" function to import the data into SAS server directly without codes. due to the file is too large n have many columns i decided to make all of them into 1 column. the method used will be as follows to make all data into a single column for the intention to "faster processing".

DATA WORK.ActILP_EX_2020_05_19_0030;
    LENGTH
        'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n $ 1884 ;
    LABEL
        'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n = "CHDRNUM|CNTTYPE|BILLFREQ|BILLCHNL|SRCEBUS|AGNTNUM|HISSDTE|BNSIND|ZREVBNS|BONUSAMT|ZINTRATE|ZANNTYDT|STATCODE|PSTATCODE|CRTABLE|LIFE|JLIFE|COVERAGE|RIDER|CRRCD|PCESTRM|RCESTRM|BCESTRM|SUMINS|ORIGSUM|ZBINSTPREM|EXTR|ZDISCODE|ZDISPREM|ZGSTPREM|AGE|RETIREAGE|C" ;
    FORMAT
        'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n $CHAR1884. ;
    INFORMAT
        'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n $CHAR1884. ;
    INFILE 'C:\Users\parkourtofu\AppData\Local\Temp\SEG12248\ActILP_EX_2020-05-19-0030-17e38c4b19c0442bbe027eb0ec1ba9d7.txt'
        LRECL=1884
        ENCODING="WLATIN1"
        TERMSTR=CRLF
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n : $CHAR1884. ;
RUN;

then i use tranwrd to change all the values with 99999999 to zero

data test;
set WORK.ActILP_EX_2020_05_19_0030;
'CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n=tranwrd('CHDRNUM|CNTTYPE|BILLFREQ|BILLCHN'n , "99999999|", "0|");
RUN;

 

 

image003.pngimage002.png

these are the date columns of the 332 columns inside my data i wanted to zeorise them.

yes the number that has " isnt the date and when i make them into 1 column, i assume i have made all the data without delimiter into a character form.

Tom
Super User Tom
Super User

Note that if you use the Enterprise Guide import wizard to select a CSV file from your PC and send it to the SAS server for conversion into a SAS dataset it will probably have made some changes to the CSV file in the process.

You might get more control by using the Copy Files task in EG to just upload the CSV file and write SAS code to read the file.

parkourtofu
Fluorite | Level 6
iam unable to find the "Copy Files" in my EG
Patrick
Opal | Level 21

Below code changing your .csv directly without reading the data into a SAS table won't perform as good as when using tranwrd(). What it does though is ensure that you only change 99999999 in the selected columns.

In the regular expression used ^(([^\|]*\|){2})(9{8}\|) the number in red represents the number of columns before the one where you want to change the string.

/* create sample file */
filename mycsv temp;
data _null_;
  file mycsv;
  put 'abc|def|99999999|1234|99999999|xyx';
  put '99999999|def|78654||99999999|xyx|99999999|abc';
  stop;
run;

/* change 99999999 to 0 if in columns 3 or 5 */
data _null_;
  file print;
  infile mycsv lrecl=32767;
  input;
  put 'Before: ' _infile_;
  _infile_= prxchange('s/^(([^\|]*\|){2})(9{8}\|)/${1}0|/oi', 1, trim(_infile_));
  _infile_= prxchange('s/^(([^\|]*\|){4})(9{8}\|)/${1}0|/oi', 1, trim(_infile_));
  put 'After:  ' _infile_;
run;

Patrick_0-1592019032428.png

 

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1519 views
  • 6 likes
  • 5 in conversation