Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

few variables have multiple values in same row delimited with some other delimiter

Reply
Occasional Contributor
Posts: 6

few variables have multiple values in same row delimited with some other delimiter

Hi,

I have a flat file with ';' as a delimiter. The file is for a company at employee id level (unique row for each employee). Couple of variable (like education history) may have multiple values for each employee, which are delimited by '!' in my file. below is one example.

EMP_ID AGE NAME EDUCATION OCCUPATION

12345 ; 45 ; john smith ; "Doctorate Degree (over 19 years),5/1/2014 ! Masters's Degree (±18 years),1/1/2013 ! Bachelor's Degree (±16 years),1/1/2011" ; manager

23456 ; 30 ; smith john ; "Bachelor's Degree (±16 years),1/1/2003" ; analyst

for education field (which has ';' in starting and end) values in paranthesis indicate years in education and date represents when the degree was completed, separated by a comma.

there are other fields as well in the data but i have mentioned only few for example. I need to read the data in sas but a bit confused how to treat multiple delimiters in this case.

Thanks!

SA

SAS Employee
Posts: 340

Re: few variables have multiple values in same row delimited with some other delimiter

You need to manually deal with the second delimiter. Using functions. Here is an example:

data want;

infile 'C:\temp\file.txt' dsd dlm=';';

input EMP_ID:$10. AGE NAME:$20. EDUCATION:$100. OCCUPATION:$10.;

do i=1 to countw(EDUCATION,'!');

  SingleEducation=scan(EDUCATION,i,'!');

  Degree=scan(SingleEducation,1,'(),');

  Years=scan(SingleEducation,2,'(),');

  Date=scan(SingleEducation,3,'(),');

  output;

end;

drop i EDUCATION;

run;

Additional excercise: convert Date to a real SAS date, set the length of Degree and Years columns.

Good luck!

Super User
Super User
Posts: 7,406

Re: few variables have multiple values in same row delimited with some other delimiter

Personally I would read it in as ; delimited, then post process the dataset (haven't tested, but should be near enough):

data have;

     infile "xyz.txt" dlm=";" missover dsd lrecl=32767;

     input ...;

run;

data want;

     set have;

     array education_split{10} $200.;

     do I=1 to countc(education,"!")+1; /* Do at leats twice if ! occurs */

          education_split=scan(education,i,"!");

     end;

run;

Occasional Contributor
Posts: 6

Re: few variables have multiple values in same row delimited with some other delimiter

Thanks Gergely and RW9..will try this..

Ask a Question
Discussion stats
  • 3 replies
  • 227 views
  • 0 likes
  • 3 in conversation