BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Divya
Calcite | Level 5

Hi,

Looking for the most efficient method to read this data:

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

This unaligned data represents name, age and salary variables. The data values are separated by a single space and the variable "name" contains an embedded blank.

Thanks,

Divya

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

If you know there are always 2 non blank fields that follow you can count backward to find the length of the field and use $VARYING. Using Chang's data.

data one;

   infile cards;

   input @;

   call scan(_infile_,-2,p,l,' ');

  l=p-2;

   input name $varying30. l age inc :comma9.;

cards;

John Q. Matthew 40 $90,000

George Wilson 28    $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

Hello    Kitty . .

;

   run;

proc print;

   run;

View solution in original post

14 REPLIES 14
shivas
Pyrite | Level 9

Hi,

Try this..

data one;

infile cards dlm='';

input name $ 1-15 age 16-18 sal :dollar.;

format sal dollar7.;

cards;

John Matthew   40 $90,000

George Wilson  28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas     35 $70,000

;

run;

Thanks,

Shiva

Divya
Calcite | Level 5

Hi shiva,

Thanks for the response. But the problem I am facing here is name and age data values are separated by a single space and its a long text file so I cannot manually insert spaces between the two and make age values start from the 16th column.

If that was the case, I would have used ampersand format modifier to read it, but here I cannot.

Is there any other way of doing it?

Divya

Linlin
Lapis Lazuli | Level 10

Hi Divya,

I don't have time now. It may helpful to look at this discussion:

https://communities.sas.com/message/112959#112959

Haikuo
Onyx | Level 15

This work for your data as is, you can modify it to fit more complex situations hopefully:

data one (drop=_:);

infile cards dlm='';

input @;

name=catx(' ',scan(_infile_,1), scan(_infile_,2));;

input _1 _2 age sal:dollar11.;

format sal dollar7.;

cards;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

proc print;run;

Regards,

Haikuo

Linlin
Lapis Lazuli | Level 10

another approach:

data one(drop=p1 p2) ;

length name $ 20;
infile cards;
input;
p1=findc(_infile_,'0123456789');
p2=find(_infile_,'$');
name=substr(_infile_,1,p1-1);
age=input(substr(_infile_,p1,p2-1),3.);
salary=input(substr(_infile_,p2+1),dollar11.);
format salary dollar11.;
cards;
John Matthew 40 $90,000
George Wilson 28 $50,000
Robert Nicolas 30 $60,000
Leo Thomas 35 $70,000
;
proc print;run;
                             Obs         name         age         salary

                              1     John Matthew       40        $90,000
                              2     George Wilson      28        $50,000
                              3     Robert Nicolas     30        $60,000
                              4     Leo Thomas         35        $70,000

FloydNevseta
Pyrite | Level 9

If the name always consists of first name and last name then read them directly

and concatenate like this:

data one (drop=_:);

infile cards dlm='';

input _fname :$8. _lname :$15. age sal :dollar11.;

name=catx(' ', _fname, _lname);

format sal dollar7.;

cards;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

run;

But if the name varies because of middle names and initials, then something like this will work:

data one (drop=_:);

infile cards dlm='';

length _partial_nm $15 name $45;

do while ( 1 );

   input _partial_nm :$15. @;

   if anydigit( _partial_nm ) > 0 then leave; * found age;

   name = catx( ' ', name, _partial_nm );

end;

age = input( _partial_nm, 15. );

input sal :dollar11.;

format sal dollar7.;

cards;

John J. Matthew 40 $90,000

George Wilson 28 $50,000

Robert David Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

run;

art297
Opal | Level 21

or, a modification of Linlin's approach:

data one;

  length name $ 20;

  infile cards;

  input @;

  _infile_=catx("  ",substr(_infile_,1,anydigit(_infile_)-1),

                substr(_infile_,anydigit(_infile_)));

  input name & age salary;

  informat salary dollar11.;

cards;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

MikeZdeb
Rhodochrosite | Level 12

hi ... the more the merrier ...

data x;

infile datalines dsd dlm='/';

input @;


_infile_ = translate(_infile_,'/',' ');

substr(_infile_ , find(_infile_,'/') , 1) = ' ';


input name : $30. age salary : comma.;

format salary dollar10.;

datalines;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

Anne
Calcite | Level 5

data a;

     input (fname lname ) ($) age salary :dollar.;

datalines;

John Matthew 40 $90,000

George Wilson 28 $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

;

data a;

     set a;

     FullName=trim(fname)||' '||trim(lname);

     drop fname lname;

run;

chang_y_chung_hotmail_com
Obsidian | Level 7

Same idea as Mike's (aka "the _infile_ magic"), but a bit more robust against possible middle names and extra blanks as such. I am pretty sure this is not most "efficient." 🙂

%*-- sub last two occurances of blanks into bars(|) --*;
%let b = \s+;      %*-- blanks     --*;
%let d = \d+;      %*-- digits     --*;
%let nb = [^\s]+;  %*-- non-blanks --*;
%let nd = [^\d]+;  %*-- non-digits --*;
%let prx = s/^(&nd)(&b)(&d)(&b)(&nb)\s*$/$1|$3|$5/;

data one;
   infile cards dlm="|";
   input @;
   _infile_ = prxchange("&prx", 1, _infile_);
   input name :$30. age inc :comma9.;
cards;
John Q. Matthew 40 $90,000
George Wilson 28    $50,000
Robert Nicolas 30 $60,000
Leo Thomas 35 $70,000
;
run;

/* check */
proc print data=one;
   format inc dollar9.;
run;
/* on lst
Obs    name               age          inc

1     John Q. Matthew     40      $90,000
2     George Wilson       28      $50,000
3     Robert Nicolas      30      $60,000
4     Leo Thomas          35      $70,000
*/

data_null__
Jade | Level 19

If you know there are always 2 non blank fields that follow you can count backward to find the length of the field and use $VARYING. Using Chang's data.

data one;

   infile cards;

   input @;

   call scan(_infile_,-2,p,l,' ');

  l=p-2;

   input name $varying30. l age inc :comma9.;

cards;

John Q. Matthew 40 $90,000

George Wilson 28    $50,000

Robert Nicolas 30 $60,000

Leo Thomas 35 $70,000

Hello    Kitty . .

;

   run;

proc print;

   run;

Ksharp
Super User

How about:

data x (drop=x:);
infile cards truncover ;
length name $ 40;
input (x1-x3) (:$20.) @;
if anydigit(x3) then do;
                       input +(-1*(length(x3)+1)) age sal : dollar12.;
                       call missing(x3);
                     end;
 else input age sal : dollar12.;
name=catx(' ',of x1-x3);
cards;
John J. Matthew 40 $90,000
George Wilson 28 $50,000
Robert David Nicolas 30 $60,000
Leo Thomas 35 $70,000
Xia Ke Shan 45 $43,432
Hello    Kitty . .
;
run;

Ksharp

Message was edited by: xia keshan

ScottBass
Rhodochrosite | Level 12

Adding one more reply to the mix...

data one;

   length line $100;

   infile cards dlm="|";

   input line;

   cards;

John Q. Matthew 40 $90,000

George  Wilson   28    $50,000

Robert     Nicolas      30 $60,000

Leo Thomas  35    $70,000

Thurston Howell The 3rd  65 $1,000,000

Thurston Howell The 3 rd  65 $1,000,000

;

run;

data two;

  length name age inc $40;

  set one;

  rx=prxparse("/^(.*?)(\d+)\s+(.*)$/");

  if prxmatch(rx,line) then do;

    name=prxposn(rx,1,line);

    * age=input(prxposn(rx,2,line),best.);

    age=prxposn(rx,2,line);

    * inc=input(prxposn(rx,3,line),dollar18.);

    inc=prxposn(rx,3,line);

    inc2=input(inc,dollar18.);

  end;

  format inc2 dollar18.;

run;

The regular expression in prxparse is:

1) Lazy capture all characters anchored at the start of the line until you encounter one or more digits separated by one or more spaces.  That's why the first "Thurston Howell" line worked but the second one didn't.

2) Capture the first encounter of one or more digits followed by one or more spaces.  The lazy capture (.*?) ensures you stop at the first encounter of the digits.  Since dot (.) matches ANY character, a greedy capture would capture until the last encounter of digit + a space.  Probably not an issue with this data, but still, I think a lazy capture is best here.

3) Greedy capture all characters after the digit plus one or more spaces until the end of the line.

I left name, age, and inc as character values for debugging.  Once you're happy with the parsing, you can change the data type and nest the prxposn calls inside the input calls.

HTH,

Scott

P.S.:  Can someone explain how you embed the SAS formatted code (color, etc.) into these messages?


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Andre
Obsidian | Level 7

Scott

copying from the enhanced editor to Word (2010 for me)

and then copy from word to the answer screen  as it is.


I think it is working


data two;

  length name age inc $40;

  set one;

  rx=prxparse("/^(.*?)(\d+)\s+(.*)$/");

  if prxmatch(rx,line) then do;

name=prxposn(rx,1,line);

    * age=input(prxposn(rx,2,line),best.);

age=prxposn(rx,2,line);

    * inc=input(prxposn(rx,3,line),dollar18.);

inc=prxposn(rx,3,line);

inc2=input(inc,dollar18.);

  end;

  format inc2 dollar18.;

run;


HTH


Andre

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 8344 views
  • 8 likes
  • 13 in conversation