BookmarkSubscribeRSS Feed
stolinsas
Calcite | Level 5

Hi Everyone,

kindly help me to read the following raw data into sas. Thanks in advance.

AronM374856Texas

BrandonM847584California

JaneF856747Huston

ClarindaF748574Newyork

ArnoldM435867Huston

12 REPLIES 12
Cynthia_sas
SAS Super FREQ

Hi:

What code have you tried? It is hard to figure out where the variables start and stop, I suppose you could read in each line and parse out the variables from the "back end". But what type of file is this? Is this a file without any delimiters? What is the front end process that creates this file? Is this a 1 time read process or do you have to read a file like this on a periodic basis?

cynthia

stolinsas
Calcite | Level 5

Hi Cynthia,

This is an unformatted data without delimeter And it is for one time process. I have tried all basic styles of reading taw data but that didnt work. Kindly solve this one if there are any advanced techniques.

STolin

Reeza
Super User

Not pretty but you can do it using combinations of scan/compress/reverse/substr functions.

Regular Expressions would also work, but I avoid them like the plague.

data have;

informat text $256.;

input text $;

first_part=scan(text, 1, ,'d');

Name=substr(first_part, 1, length(first_part)-1);

Gender=substr(reverse(trim(first_part)),1,1);

RandomNumber=compress(text, , 'kd');

State=scan(text, 2, ,'d');

cards;

AronM374856Texas

BrandonM847584California

JaneF856747Huston

ClarindaF748574Newyork

ArnoldM435867Huston

;

run;

proc print data=have;

run;

stolinsas
Calcite | Level 5

Hi Reeza,

Thank you for your helpful code with which i got the solution. But still confused about how exactly does k & d delimiters works?

:

mohamed_zaki
Barite | Level 11

They are called function modifier you can find a list of them in the SAS function reference for each function. for example SCAN Function

This blog give good examples COMPRESS: SAS Function strips characters from the string of using them and combine them.

Ksharp
Super User

data have;

informat text $256.;

input text $;

cards;

AronM374856Texas

BrandonM847584California

JaneF856747Huston

ClarindaF748574Newyork

ArnoldM435867Huston

;

run;

data want;

set have;

length  Name Gender RandomNumber State$ 100;

re=prxparse('/(\w+)([F|M])(\d+)(\w+)/o');

if prxmatch(re,text) then do;

   Name=prxposn(re,1,text);

   Gender=prxposn(re,2,text);

   RandomNumber=prxposn(re,3,text);

   State=prxposn(re,4,text);

end;

drop re;

run;

Xia Keshan

Babloo
Rhodochrosite | Level 12

When I ran your code I could see the variable 're' returns the value '1' for all the records. May I request you to explain how your prxparse function works here?

Ksharp
Super User

It is a Perl Regular Expression, It is matched a pattern of your data ,Since all of obs matched such pattern , RE is naturally returning 1 all time. If you are hard to understand it ,then use Reeza's code , Hers is better understood .

Xia Keshan

Babloo
Rhodochrosite | Level 12

I wish to understand  your Perl Regular Expression, although it is difficult:)

Ksharp
Super User

OK. The good knowledge is SAS documentation . You'd refer to it if you want more . For your example :

(\w+)([F|M])(\d+)(\w+)


\w+ matched one or more words character(i.e. 0-9 a-z A-Z _ )

[F|M]  matched  F  or  M

\d+   matched one or more  digits (i.e. 0-9 )


Therefore this Perl Regular Expression is trying to match your string ,in other words , your string have such pattern .



Xia Keshan

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Whilst Xia Keshan has provided a great solution, I would ask why your data looks like that in the first place.  You could face all kinds of issues if you leave it as such, for instance what will happen with missing data, say sex is missing, how will you handle:

Aron374856Texas?

I would suggest you need to return to the source of the data and fix that so the data is clear.

FriedEgg
SAS Employee

data foo;

infile cards length=len;

input @;

_1=findc(_infile_, 'MF', 2)-1;

_2=len-_1-6;

input name $varying32. _1 gender $1. num 6. state $varying32. _2;

cards;

AronM374856Texas

BrandonM847584California

JaneF856747Huston

ClarindaF748574Newyork

ArnoldM435867Huston

;

run;

makes a good point here, and you should consider his suggestion.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 1893 views
  • 5 likes
  • 8 in conversation