- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I am trying to import a big txt file (more than 160,000 kb). There are 3 variables in the dataset, as below. seperated by '|'
member_code|ABX_filled_date|ABX_filled_code
70544241234567|2016-12-06|00456141030
70544241234568|2016-12-15|00456141030
70544241234569|2016-12-15|60505265605
I run this code but then SAS automatically changed my first variable member_code to scientific notation and I am not able to figure out the true value of it
proc import datafile ='My file.txt'
out =drug_expo
dbms=dlm
replace;
delimiter='|';
run;
My SAS data set looks like
member_code ABX filled_date ABX_filled_code
7.0544243E13 2016-12-06 00456141030
7.0544243E13 2016-12-15 00456141030
7.0544243E13 2016-12-15 60505265605
Anyone has any suggestion what I should do to prevent this from happening?
Thank you so much
Yijun
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
2. Run PROC IMPORT with GUESSINGROWS=MAX. Check the log. Take the code from the log and adjust your formats/informats as appropriate and re-run to get the desired data.
Most likely it's just a formatting issue, but a number that large should be read as a character field to avoid any precision issues.
data want;
infile 'path to file' dlm='|' dsd truncover;
length member_code_ABC $25. abx_filled_code $20.;
informat filled_date yymmdd10.;
format filled_date yymmdd10.;
input member_code_ABX filled_Date abx_filled_code;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@LisaYIN9309 wrote:
Hello,
I am trying to import a big txt file (more than 160,000 kb). There are 3 variables in the dataset, as below. seperated by '|'
member_code|ABX_filled_date|ABX_filled_code
70544241234567|2016-12-06|00456141030
70544241234568|2016-12-15|00456141030
70544241234569|2016-12-15|60505265605
I run this code but then SAS automatically changed my first variable member_code to scientific notation and I am not able to figure out the true value of it
proc import datafile ='My file.txt'
out =drug_expo
dbms=dlm
replace;
delimiter='|';
run;
My SAS data set looks like
member_code ABX filled_date ABX_filled_code
7.0544243E13 2016-12-06 00456141030
7.0544243E13 2016-12-15 00456141030
7.0544243E13 2016-12-15 60505265605
Anyone has any suggestion what I should do to prevent this from happening?
Thank you so much
Yijun
Probably something named "member_code" should not be numeric at all. If this is a person identifier are you going to do arithmetic with it? If not then it should be character. At which point PROC IMPORT is guessing numeric only because the values have digits.
And defaulting to a best12 format for display.
Rerun you proc import. The log will have data step code generated to read the data.
Copy that from the log into the editor.
There is likely a line like: informat member_code best32.; Change the best32. to $15. Remove the likely FORMAT member_code best12. ;
SAVE the program.
Rerun the data step.
The result would have member code as a 15 character value. If any of the values are longer than 15 characters change the informat to match (or just be paranoid and make it $20. to start allowing later longer values).
Your ABX_filled_code variable may want similar treatment if the leading 00 actually mean something such as part of an drug identification code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why are you using PROC IMPORT to read a text file? Especially if there are only three variables.
data want;
infile 'myfile.txt' dsd truncover dlm='|' firstobs=2;
input member_code :$15. ABX_filled_date :yymmdd. ABX_filled_code :$10.;
format ABX_filled_date yymmdd10. ;
run;