Help using Base SAS procedures

Reading RawData

New Contributor
Posts: 3

Reading RawData

Hi Everyone,

Kindly help me read the following raw data into SAS. Thanks in advance.

AronM374856Texas
BrandonM847584California
JaneF856747Huston
ClarindaF748574Newyork
ArnoldM435867Huston

SAS Super FREQ
Posts: 8,861

Re: Reading RawData

Posted in reply to stolinsas

Hi:

What code have you tried? It is hard to figure out where the variables start and stop. I suppose you could read in each line and parse out the variables from the "back end". But what type of file is this? Is it a file without any delimiters? What is the front-end process that creates this file? Is this a one-time read, or do you have to read a file like this on a periodic basis?

cynthia
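One way to act on the "back end" idea, as a sketch: find the last digit with ANYDIGIT searching right to left (a negative start position), then step left over the digits with NOTDIGIT to reach the gender letter. It assumes every record has at least one digit and exactly one gender letter right before the digits; the data set name BACK_END and the helper variables LAST_DIGIT and GENDER_POS are illustrative.

```sas
/* Sketch: parse each record from the right-hand end.                 */
/* Assumes NAME + one gender letter + DIGITS + STATE, no blanks.      */
data back_end;
  infile datalines truncover;
  input text $256.;
  length Name $ 32 Gender $ 1 RandomNumber $ 12 State $ 32;
  /* negative start = search right to left from that position         */
  last_digit = anydigit(text, -length(text));  /* rightmost digit          */
  gender_pos = notdigit(text, -last_digit);    /* first non-digit to its left */
  State        = substr(text, last_digit + 1);
  RandomNumber = substr(text, gender_pos + 1, last_digit - gender_pos);
  Gender       = char(text, gender_pos);
  Name         = substr(text, 1, gender_pos - 1);
  drop last_digit gender_pos;
datalines;
AronM374856Texas
BrandonM847584California
;
run;

proc print data=back_end noobs;
run;
```

Because the positions are computed per record, names and states of any length work without fixed columns.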

New Contributor
Posts: 3

Re: Reading RawData

Posted in reply to Cynthia_sas

Hi Cynthia,

This is unformatted data without delimiters, and it is a one-time process. I have tried all the basic styles of reading raw data, but they didn't work. Kindly help if there are any advanced techniques for this.

STolin

Super User
Posts: 19,768

Re: Reading RawData

Posted in reply to stolinsas

It's not pretty, but you can do it using a combination of the SCAN, COMPRESS, REVERSE and SUBSTR functions.

Regular expressions would also work, but I avoid them like the plague.

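data have;
informat text $256.;
input text $;
/* first "word" when digits are the delimiters, e.g. AronM */
first_part=scan(text, 1, ,'d');
/* all but the last character of first_part is the name */
Name=substr(first_part, 1, length(first_part)-1);
/* the last character of first_part is the gender */
Gender=substr(reverse(trim(first_part)),1,1);
/* keep ('k') only the digits ('d') */
RandomNumber=compress(text, , 'kd');
/* second "word" when digits are the delimiters, e.g. Texas */
State=scan(text, 2, ,'d');
cards;
AronM374856Texas
BrandonM847584California
JaneF856747Huston
ClarindaF748574Newyork
ArnoldM435867Huston
;
run;

proc print data=have;
run;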

New Contributor
Posts: 3

Re: Reading RawData

Hi Reeza,

Thank you for your helpful code, with which I got the solution. But I am still confused about how exactly the k and d delimiters work.


Super Contributor
Posts: 490

Re: Reading RawData

Posted in reply to stolinsas

They are called function modifiers. You can find the list for each function in the SAS function reference, for example for the SCAN function.

This blog post gives good examples of using the modifiers and combining them: COMPRESS: SAS Function strips characters from the string.
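As a small sketch of how the two modifiers behave (the data set and variable names are just for illustration), here is one record run through COMPRESS and SCAN. The d modifier adds the digits 0-9 to the list of characters the function works on; k tells COMPRESS to keep the listed characters instead of removing them. In SCAN, d makes the digits act as word delimiters.

```sas
/* Sketch: the 'k' and 'd' modifiers on one sample record.            */
data modifiers_demo;
  length text digits_only letters_only first_word second_word $ 40;
  text = 'AronM374856Texas';
  digits_only  = compress(text, , 'kd');  /* keep only digits:   374856     */
  letters_only = compress(text, , 'd');   /* remove digits:      AronMTexas */
  first_word   = scan(text, 1, , 'd');    /* digits delimit:     AronM      */
  second_word  = scan(text, 2, , 'd');    /* second word:        Texas      */
run;

proc print data=modifiers_demo noobs;
run;
```

A run of consecutive digits counts as one delimiter in SCAN, which is why the second word is the state rather than an empty string.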

Super User
Posts: 10,018

Re: Reading RawData

Posted in reply to stolinsas

data have;
informat text $256.;
input text $;
cards;
AronM374856Texas
BrandonM847584California
JaneF856747Huston
ClarindaF748574Newyork
ArnoldM435867Huston
;
run;

data want;
set have;
length Name Gender RandomNumber State $ 100;
/* RE holds the pattern ID from PRXPARSE; the /o option compiles it once */
re=prxparse('/(\w+)([F|M])(\d+)(\w+)/o');
/* if TEXT matches, copy each capture group into its own variable */
if prxmatch(re,text) then do;
   Name=prxposn(re,1,text);
   Gender=prxposn(re,2,text);
   RandomNumber=prxposn(re,3,text);
   State=prxposn(re,4,text);
end;
drop re;
run;

Xia Keshan

Super Contributor
Posts: 435

Re: Reading RawData

When I ran your code, I could see that the variable 're' has the value '1' for all the records. May I ask you to explain how your PRXPARSE function works here?

Super User
Posts: 10,018

Re: Reading RawData

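It is a Perl regular expression that matches the pattern of your data. The variable RE holds the pattern identifier that PRXPARSE returns, not the result of the match, so it has the same value, 1, on every observation. PRXMATCH is what tests the match: it returns the position where the match starts, or 0 if there is none. If it is hard to understand, use Reeza's code; hers is easier to follow.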
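A small sketch of the difference (the data set IDS, the variable POS and the pattern here are illustrative): PRXPARSE returns an identifier for the compiled pattern, which stays the same on every row, while PRXMATCH returns where the pattern starts, or 0 when it does not match. With the (\w+)... pattern in the thread, PRXMATCH also returns 1 on every row, because the match starts at the first character, which makes the two easy to confuse.

```sas
/* Sketch: RE is a pattern ID, POS is the match result.               */
data ids;
  infile datalines truncover;
  input text $32.;
  retain re;
  /* compile once; RE keeps the same identifier on every row */
  if _n_ = 1 then re = prxparse('/[FM]\d+/');
  /* start position of the first F or M followed by digits; 0 = none */
  pos = prxmatch(re, text);
datalines;
AronM374856Texas
BrandonM847584California
Aron374856Texas
;
run;

proc print data=ids noobs;
run;
```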

Xia Keshan

Super Contributor
Posts: 435

Re: Reading RawData

I would like to understand your Perl regular expression, although it is difficult :)

Super User
Posts: 10,018

Re: Reading RawData

OK. The SAS documentation is the best source; refer to it if you want more. For your example:

(\w+)([F|M])(\d+)(\w+)


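(...) each pair of parentheses is a capture group; PRXPOSN returns groups 1 to 4 by number

\w+ matches one or more word characters (i.e. 0-9 a-z A-Z _ )

[F|M] matches a single F or M (inside square brackets | is a literal character, so a | would match too; [FM] is enough)

\d+ matches one or more digits (i.e. 0-9 )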


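Therefore this Perl regular expression describes the layout of your strings: a name, one F or M, some digits and a state. Because \w+ also matches letters and digits, the regex engine backtracks until [F|M] falls on the F or M that is immediately followed by the digits.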
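A tighter version of the same pattern, as a sketch: anchor it to the whole record with ^ and $, allow only letters in the name and the state, and use [FM] so a stray | cannot match. It assumes names and states contain letters only (no spaces or digits); the data set name WANT_STRICT and the third test record are illustrative. TEXT is padded with trailing blanks, so it is trimmed before matching, otherwise $ would never match.

```sas
/* Sketch, assuming each record is: letters, one F/M, digits, letters. */
data want_strict;
  infile datalines truncover;
  input text $256.;
  length Name $ 32 Gender $ 1 RandomNumber $ 12 State $ 32;
  retain re;
  /* ^ and $ anchor the whole record; [FM] is exactly one F or M */
  if _n_ = 1 then re = prxparse('/^([A-Za-z]+)([FM])(\d+)([A-Za-z]+)$/');
  /* trim the trailing blanks so $ sees the real end of the record */
  if prxmatch(re, trim(text)) then do;
    Name         = prxposn(re, 1, trim(text));
    Gender       = prxposn(re, 2, trim(text));
    RandomNumber = prxposn(re, 3, trim(text));
    State        = prxposn(re, 4, trim(text));
  end;
  else putlog 'NOTE: record does not fit the layout: ' text=;
  drop re;
datalines;
AronM374856Texas
ClarindaF748574Newyork
Aron374856Texas
;
run;
```

Records that do not fit are left blank and reported in the log instead of being split in the wrong place.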



Xia Keshan

Super User
Posts: 7,942

Re: Reading RawData

Posted in reply to stolinsas

Whilst Xia Keshan has provided a great solution, I would ask why your data looks like that in the first place. You could face all kinds of issues if you leave it as is. For instance, what will happen with missing data? Say sex is missing: how will you handle

Aron374856Texas?

I would suggest you go back to the source of the data and fix it there so the data is clear.
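To illustrate the concern, this sketch runs the SCAN/SUBSTR split from earlier in the thread on the original record and on the record with the missing sex. The second record is split into Name=Aro and Gender=n without any error, so a check on Gender flags it instead. The check assumes gender is always an uppercase F or M; the data set CHECKED and the variable VALID are illustrative.

```sas
/* Sketch: flag records that do not fit NAME + F/M + DIGITS + STATE.  */
data checked;
  infile datalines truncover;
  input text $256.;
  length first_part $ 64 Name $ 32 Gender $ 1;
  first_part = scan(text, 1, , 'd');             /* text before the digits */
  Name   = substr(first_part, 1, length(first_part) - 1);
  Gender = char(first_part, length(first_part)); /* its last character     */
  valid  = (Gender in ('F', 'M'));               /* 1 = plausible, 0 = suspect */
  if not valid then putlog 'NOTE: check this record: ' text=;
  drop first_part;
datalines;
AronM374856Texas
Aron374856Texas
;
run;
```

Flagging only catches the cases you anticipate, which is why fixing the layout at the source is the safer route.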

Trusted Advisor
Posts: 1,301

Re: Reading RawData

Posted in reply to stolinsas

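data foo;
/* LEN receives the length of each input record */
infile cards length=len;
/* load the record into _INFILE_ and hold it for the next INPUT */
input @;
/* name length: first uppercase M or F from column 2 on, minus 1 */
_1=findc(_infile_, 'MF', 2)-1;
/* state length: record length minus the name, 1 gender letter and 6 digits */
_2=len-_1-7;
input name $varying32. _1 gender $1. num 6. state $varying32. _2;
cards;
AronM374856Texas
BrandonM847584California
JaneF856747Huston
ClarindaF748574Newyork
ArnoldM435867Huston
;
run;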

The earlier reply suggesting you fix the data at its source makes a good point here, and you should consider that suggestion.
