BookmarkSubscribeRSS Feed
Saurabh_Amar
Calcite | Level 5

Hi,

I have a flat file with ';' as a delimiter. The file is for a company at employee id level (unique row for each employee). Couple of variable (like education history) may have multiple values for each employee, which are delimited by '!' in my file. below is one example.

EMP_ID AGE NAME EDUCATION OCCUPATION

12345 ; 45 ; john smith ; "Doctorate Degree (over 19 years),5/1/2014 ! Masters's Degree (±18 years),1/1/2013 ! Bachelor's Degree (±16 years),1/1/2011" ; manager

23456 ; 30 ; smith john ; "Bachelor's Degree (±16 years),1/1/2003" ; analyst

for education field (which has ';' in starting and end) values in paranthesis indicate years in education and date represents when the degree was completed, separated by a comma.

there are other fields as well in the data but i have mentioned only few for example. I need to read the data in sas but a bit confused how to treat multiple delimiters in this case.

Thanks!

SA

3 REPLIES 3
gergely_batho
SAS Employee

You need to manually deal with the second delimiter. Using functions. Here is an example:

data want;

infile 'C:\temp\file.txt' dsd dlm=';';

input EMP_ID:$10. AGE NAME:$20. EDUCATION:$100. OCCUPATION:$10.;

do i=1 to countw(EDUCATION,'!');

  SingleEducation=scan(EDUCATION,i,'!');

  Degree=scan(SingleEducation,1,'(),');

  Years=scan(SingleEducation,2,'(),');

  Date=scan(SingleEducation,3,'(),');

  output;

end;

drop i EDUCATION;

run;

Additional excercise: convert Date to a real SAS date, set the length of Degree and Years columns.

Good luck!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Personally I would read it in as ; delimited, then post process the dataset (haven't tested, but should be near enough):

data have;

     infile "xyz.txt" dlm=";" missover dsd lrecl=32767;

     input ...;

run;

data want;

     set have;

     array education_split{10} $200.;

     do I=1 to countc(education,"!")+1; /* Do at leats twice if ! occurs */

          education_split=scan(education,i,"!");

     end;

run;

Saurabh_Amar
Calcite | Level 5

Thanks Gergely and RW9..will try this..

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1069 views
  • 0 likes
  • 3 in conversation