DATA Step, Macro, Functions and more

Reading Embedded Spaces in Data

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Reading Embedded Spaces in Data

Hi,

Looking for the most efficient method to read this data:

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

This unaligned data represents name, age and salary variables. The data values are separated by a single space and the variable "name" contains an embedded blank.

Thanks,

Divya


Accepted Solutions
Solution
‎05-10-2012 06:48 PM
Respected Advisor
Posts: 3,799

Re: Reading Embedded Spaces in Data

If you know there are always 2 non blank fields that follow you can count backward to find the length of the field and use $VARYING. Using Chang's data.

data one;

   infile cards;

   input @;

   call scan(_infile_,-2,p,l,' ');

  l=p-2;

   input name $varying30. l age inc :comma9.;

cards;

John Q. Matthew 40 $90,000

George Wilson 28    $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

Hello    Kitty . .

;

   run;

proc print;

   run;

View solution in original post


All Replies
Super Contributor
Posts: 349

Re: Reading Embedded Spaces in Data

Hi,

Try this..

data one;

infile cards dlm='';

input name $ 1-15 age 16-18 sal :dollar.;

format sal dollar7.;

cards;

John Matthew   40 $90,000

George Wilson  28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas     35 $70,000

;

run;

Thanks,

Shiva

New Contributor
Posts: 2

Re: Reading Embedded Spaces in Data

Hi shiva,

Thanks for the response. But the problem I am facing here is name and age data values are separated by a single space and its a long text file so I cannot manually insert spaces between the two and make age values start from the 16th column.

If that was the case, I would have used ampersand format modifier to read it, but here I cannot.

Is there any other way of doing it?

Divya

Super Contributor
Posts: 1,636

Re: Reading Embedded Spaces in Data

Hi Divya,

I don't have time now. It may helpful to look at this discussion:

https://communities.sas.com/message/112959#112959

Respected Advisor
Posts: 3,156

Re: Reading Embedded Spaces in Data

This work for your data as is, you can modify it to fit more complex situations hopefully:

data one (drop=_Smiley Happy;

infile cards dlm='';

input @;

name=catx(' ',scan(_infile_,1), scan(_infile_,2));;

input _1 _2 age sal:dollar11.;

format sal dollar7.;

cards;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

proc print;run;

Regards,

Haikuo

Super Contributor
Posts: 1,636

Re: Reading Embedded Spaces in Data

another approach:

data one(drop=p1 p2) ;

length name $ 20;
infile cards;
input;
p1=findc(_infile_,'0123456789');
p2=find(_infile_,'$');
name=substr(_infile_,1,p1-1);
age=input(substr(_infile_,p1,p2-1),3.);
salary=input(substr(_infile_,p2+1),dollar11.);
format salary dollar11.;
cards;
John Matthew 40 $90,000
George Wilson 28 $50,000
Robert Nicolas 30 $60,000
Leo Thomas 35 $70,000
;
proc print;run;
                             Obs         name         age         salary

                              1     John Matthew       40        $90,000
                              2     George Wilson      28        $50,000
                              3     Robert Nicolas     30        $60,000
                              4     Leo Thomas         35        $70,000

Frequent Contributor
Posts: 101

Re: Reading Embedded Spaces in Data

If the name always consists of first name and last name then read them directly

and concatenate like this:

data one (drop=_Smiley Happy;

infile cards dlm='';

input _fname :$8. _lname :$15. age sal :dollar11.;

name=catx(' ', _fname, _lname);

format sal dollar7.;

cards;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

run;

But if the name varies because of middle names and initials, then something like this will work:

data one (drop=_Smiley Happy;

infile cards dlm='';

length _partial_nm $15 name $45;

do while ( 1 );

   input _partial_nm :$15. @;

   if anydigit( _partial_nm ) > 0 then leave; * found age;

   name = catx( ' ', name, _partial_nm );

end;

age = input( _partial_nm, 15. );

input sal :dollar11.;

format sal dollar7.;

cards;

John J. Matthew 40 $90,000

George Wilson 28 $50,000

Robert David Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

run;

PROC Star
Posts: 7,486

Re: Reading Embedded Spaces in Data

or, a modification of Linlin's approach:

data one;

  length name $ 20;

  infile cards;

  input @;

  _infile_=catx("  ",substr(_infile_,1,anydigit(_infile_)-1),

                substr(_infile_,anydigit(_infile_)));

  input name & age salary;

  informat salary dollar11.;

cards;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

Valued Guide
Posts: 765

Re: Reading Embedded Spaces in Data

hi ... the more the merrier ...

data x;

infile datalines dsd dlm='/';

input @;


_infile_ = translate(_infile_,'/',' ');

substr(_infile_ , find(_infile_,'/') , 1) = ' ';


input name : $30. age salary : comma.;

format salary dollar10.;

datalines;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

New Contributor
Posts: 3

Re: Reading Embedded Spaces in Data

data a;

     input (fname lname ) ($) age salary :dollar.;

datalines;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

data a;

     set a;

     FullName=trim(fname)||' '||trim(lname);

     drop fname lname;

run;

Regular Contributor
Posts: 241

Re: Reading Embedded Spaces in Data

Same idea as Mike's (aka "the _infile_ magic"), but a bit more robust against possible middle names and extra blanks as such. I am pretty sure this is not most "efficient." :-)

%*-- sub last two occurances of blanks into bars(|) --*;
%let b = \s+;      %*-- blanks     --*;
%let d = \d+;      %*-- digits     --*;
%let nb = [^\s]+;  %*-- non-blanks --*;
%let nd = [^\d]+;  %*-- non-digits --*;
%let prx = s/^(&nd)(&b)(&d)(&b)(&nb)\s*$/$1|$3|$5/;

data one;
   infile cards dlm="|";
   input @;
   _infile_ = prxchange("&prx", 1, _infile_);
   input name :$30. age inc :comma9.;
cards;
John Q. Matthew 40 $90,000
George Wilson 28    $50,000
Robert Nicolas 30 $60,000
Leo Thomas 35 $70,000
;
run;

/* check */
proc print data=one;
   format inc dollar9.;
run;
/* on lst
Obs    name               age          inc

1     John Q. Matthew     40      $90,000
2     George Wilson       28      $50,000
3     Robert Nicolas      30      $60,000
4     Leo Thomas          35      $70,000
*/

Solution
‎05-10-2012 06:48 PM
Respected Advisor
Posts: 3,799

Re: Reading Embedded Spaces in Data

If you know there are always 2 non blank fields that follow you can count backward to find the length of the field and use $VARYING. Using Chang's data.

data one;

   infile cards;

   input @;

   call scan(_infile_,-2,p,l,' ');

  l=p-2;

   input name $varying30. l age inc :comma9.;

cards;

John Q. Matthew 40 $90,000

George Wilson 28    $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

Hello    Kitty . .

;

   run;

proc print;

   run;

Super User
Posts: 10,041

Re: Reading Embedded Spaces in Data

How about:

data x (drop=x:);
infile cards truncover ;
length name $ 40;
input (x1-x3) (:$20.) @;
if anydigit(x3) then do;
                       input +(-1*(length(x3)+1)) age sal : dollar12.;
                       call missing(x3);
                     end;
 else input age sal : dollar12.;
name=catx(' ',of x1-x3);
cards;
John J. Matthew 40 $90,000
George Wilson 28 $50,000
Robert David Nicolas 30 $60,000
Leo Thomas 35 $70,000
Xia Ke Shan 45 $43,432
Hello    Kitty . .
;
run;

Ksharp

Message was edited by: xia keshan

Super Contributor
Posts: 387

Re: Reading Embedded Spaces in Data

Adding one more reply to the mix...

data one;

   length line $100;

   infile cards dlm="|";

   input line;

   cards;

John Q. Matthew 40 $90,000

George  Wilson   28    $50,000

Robert     Nicolas      30 $60,000

Leo Thomas  35    $70,000

Thurston Howell The 3rd  65 $1,000,000

Thurston Howell The 3 rd  65 $1,000,000

;

run;

data two;

  length name age inc $40;

  set one;

  rx=prxparse("/^(.*?)(\d+)\s+(.*)$/");

  if prxmatch(rx,line) then do;

    name=prxposn(rx,1,line);

    * age=input(prxposn(rx,2,line),best.);

    age=prxposn(rx,2,line);

    * inc=input(prxposn(rx,3,line),dollar18.);

    inc=prxposn(rx,3,line);

    inc2=input(inc,dollar18.);

  end;

  format inc2 dollar18.;

run;

The regular expression in prxparse is:

1) Lazy capture all characters anchored at the start of the line until you encounter one or more digits separated by one or more spaces.  That's why the first "Thurston Howell" line worked but the second one didn't.

2) Capture the first encounter of one or more digits followed by one or more spaces.  The lazy capture (.*?) ensures you stop at the first encounter of the digits.  Since dot (.) matches ANY character, a greedy capture would capture until the last encounter of digit + a space.  Probably not an issue with this data, but still, I think a lazy capture is best here.

3) Greedy capture all characters after the digit plus one or more spaces until the end of the line.

I left name, age, and inc as character values for debugging.  Once you're happy with the parsing, you can change the data type and nest the prxposn calls inside the input calls.

HTH,

Scott

P.S.:  Can someone explain how you embed the SAS formatted code (color, etc.) into these messages?

Super Contributor
Posts: 273

Re: Reading Embedded Spaces in Data

Posted in reply to ScottBass

Scott

copying from the enhanced editor to Word (2010 for me)

and then copy from word to the answer screen  as it is.


I think it is working


data two;

  length name age inc $40;

  set one;

  rx=prxparse("/^(.*?)(\d+)\s+(.*)$/");

  if prxmatch(rx,line) then do;

name=prxposn(rx,1,line);

    * age=input(prxposn(rx,2,line),best.);

age=prxposn(rx,2,line);

    * inc=input(prxposn(rx,3,line),dollar18.);

inc=prxposn(rx,3,line);

inc2=input(inc,dollar18.);

  end;

  format inc2 dollar18.;

run;


HTH


Andre

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 757 views
  • 8 likes
  • 13 in conversation