Hi SAS Forum,
I am having the attached “.txt” file that should be converted to a SAS data set.
It has following 8 variables. And has 5 observations..
Category
SurName
First_Name
salary_1
salary_2
company
Profession
Year
Q:
When I imported using "Import Data" functionality in SAS, first 3 records get correctly imported.
Howerver, a part of values for "profession" variable for last 3 records get running into the next variable, which is "Income".
I have attahced an Excel table to show the wrong and correct answers.
Could you help me to correctly import the ".txt" file.
Thanks
Mirisa
Hi Forum again,
This is the txt file needs to be converted to a SAS data set.
Mirisa
With the below code i was able to generate the dataset want. The delimiter need to be used is tab like delimiter='09'x
the only exemption i observed is that the last three observations do not have a year, the people held multiple professions. So had to slightly update the txt file as the multiple professions were separated by tab instead i updated to a single space. due to which now i get year blank. Hope this is what you are expecting.
proc import datafile='~path\post_2017_Jul_15.txt' out=want
dbms=dlm replace;
delimiter='09'x;
getnames=yes;
run;
What options did you use when importing the text?
It appears that you text may be tab delimited text. And that you have tabs beford "Member (Part-time) / Membre (à temps partiel)" and "Director of Clinical Training" which why those appear under the column heading "Year".
It helps to talk about the column labels as appearing in the data NOT a concept like "Income" which does not appear in the data
If you used the wizard to import the data the log should contain data step code used to read the file. Post that along with any messages from from log. Please use a code box opened with the forum {i} menu icon to preserve formatting of the code as the message windows will do some text reformatting.
/* -------------------------------------------------------------------- Code generated by a SAS task Generated on Sunday, July 16, 2017 at 10:37:09 AM By task: Import Data Wizard Source file: xxx:\post_2017_Jul_15.txt Server: Local File System Output data: WORK.post_2017_Jul_151 Server: Note: In preparation for running the following code, the Import Data wizard has used internal routines to transfer the source data file from the local file system to xxxxxxx. There is no SAS code available to represent this action. -------------------------------------------------------------------- */ DATA WORK.post_2017_Jul_151; LENGTH Category $ 14 SurName $ 6 First_Name $ 7 salary_1 8 salary_2 8 company $ 25 Profession $ 63 Year $ 45 ; FORMAT Category $CHAR14. SurName $CHAR6. First_Name $CHAR7. salary_1 DOLLAR12.2 salary_2 DOLLAR7.2 company $CHAR25. Profession $CHAR63. Year $CHAR45. ; INFORMAT Category $CHAR14. SurName $CHAR6. First_Name $CHAR7. salary_1 DOLLAR12. salary_2 DOLLAR7. company $CHAR25. Profession $CHAR63. Year $CHAR45. ; INFILE 'T:\SASWork\_xxxxxxxxxxxx_\#LN00038' LRECL=181 ENCODING="WLATIN1" TERMSTR=CRLF DLM='7F'x MISSOVER DSD ; INPUT Category : $CHAR14. SurName : $CHAR6. First_Name : $CHAR7. salary_1 : ?? DOLLAR12. salary_2 : ?? DOLLAR7. company : $CHAR25. Profession : $CHAR63. Year : $CHAR45. ; RUN;
Hi Jag, LinusH, Ballrdw,
Thanks to each one of you. Please see the log attahced. Also attahced screenshots to show how I have imported.
Hi Jag,
Actually last three observations should not have a year, which is what I want.
I think you have manually updated the txt file to a single space to get the correct answer (as the multiple professions were separated by tab instead you updated to a single space).
However, my original file is having over million records, so I cannot update it to a single space manually.
Could you help me.
Not sure if you want to cleanup the non-ascii characters in Profession, or what the maximum length of that field is, but the following will let you read all of the fields and let you adjust as necessary:
data want (drop=csalary_1 position length);
infile '/folders/myfolders/post_2017_Jul_15.txt' dlm='09'x lrecl=32767 firstobs=2;
input @;
if not anyalpha(scan(_infile_,-1,'09'x)) then do;
year=input(scan(_infile_,-1,'09'x),4.);
CALL SCAN(_infile_, -1, position, length, '09'x);
_infile_=substr(_infile_,1,position-2);
end;
informat
Category $14.
SurName $20.
First_Name $20.
csalary_1 $14.
salary_2 DOLLAR7.
company $25.
;
length profession $512;
input Category SurName First_Name csalary_1 salary_2 company@;
salary_1=input(compress(csalary_1,'"'),dollar12.);
CALL SCAN(_infile_, 6, position, length, '09'x);
_infile_=translate(substr(_infile_,position+length+1),'/','09'x);
profession=_infile_;
input;
run;
Art, CEO, AnalystFinder.com
Thanks Art for your code.
It works correctly for the sample dataset I have attahced.
However, I was not able to adjust it to my big dataset (but it is my problem).
Thanks
Mirisa
In order for anyone to be able to help you would have to post a couple of the lines that didn't import correctly.
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.