Help using Base SAS procedures

How to define input variable when datalines have spaces for a variable

Reply
Occasional Contributor
Posts: 9

How to define input variable when datalines have spaces for a variable

Dear Folks,

I am new to SAS and facing difficulty to understand the below point:

The below program is not giving correct output because the data after datalines statement is not following the spacing that is required as per variables defination in 'input' statement. Here, I have only 5 record lines and thus it is easy for me to change the spacing in datalines. But, if I have millions of records and data is coming as in current form (where the 'name' variable is having space in between, how would i define my 'input' statement so that output is correct as required. I tried using colon (Smiley Happy in the input statement, but couldn't get the desired result. Please help!!

options pagesize=60 linesize=80 pageno=1 nodate;

data sales;

input EmployeeID $ 1-9 Name $ 11-29 @30 HireDate date9.

Salary HomePhone $;

format HireDate date9.;

datalines;

429685482 Martin, Virginia 09aug1990 34800 493-0824

244967839 Singleton, MaryAnn 24apr1995 27900 929-2623

996740216 Leighton, Maurice 16dec1993 32600 933-6908

675443925 Freuler, Carl 15feb1998 29900 493-3993

845729308 Cage, Merce 19oct1992 39800 286-0519

;

proc print data=sales;

title ’Sales Department Employees’;

run;

Respected Advisor
Posts: 3,124

Re: How to define input variable when datalines have spaces for a variable

I suppose you have a 'blank' in front of the second name, otherwise it shouldn't be a problem for modified list input (Smiley Happy plus dlm=' '. There are ways to play with _infile_ variable, but it seems to me a straightforward approach will be reading in 'name' as two variables, and then concatenating them, if needed.

data sales;

infile datalines truncover dlm=', ';

input EmployeeID :$9. LName :$20. FName :$20. HireDate :date9. Salary HomePhone :$20.;

format HireDate date9.;

datalines;

429685482 Martin, Virginia 09aug1990 34800 493-0824

244967839 Singleton, MaryAnn 24apr1995 27900 929-2623

996740216 Leighton, Maurice 16dec1993 32600 933-6908

675443925 Freuler, Carl 15feb1998 29900 493-3993

845729308 Cage, Merce 19oct1992 39800 286-0519

;

Super User
Posts: 10,473

Re: How to define input variable when datalines have spaces for a variable

Since you have "millions of records" the Datalines is not how you should read the data.

What is the file format of the original data? Are there quotes around text values? Or are the columns fixed width?

If you copy data out of some formats and paste into the editor then the editor is making some default decisions on how to paste and the "copy" is making choices on what is placed on the clipboard. and may be losing information in the original data set.

For instance this is the result of copying a line of text from a CSV file opened in Excel and pasted into the SAS editor:

4329 603 1 6/21/1990 83202 1 13 4/9/2015 4 2 3 2

However in the CSV file we have when opened in Wordpad and pasted:

4329,603,1,6/21/1990,83202,1,"1, 3",4/9/2015,4,2,3,2,

Reading the later from a file with a comma delimiter the text of your values with spaces will most likely have quotes around them and not be a problem to read. Similar options involve other types of files.

Datalines also restricts the options that can be used to read the data.

Super User
Posts: 9,671

Re: How to define input variable when datalines have spaces for a variable

Is the name always two words ? or one word ?


data sales;
infile datalines truncover ;
input EmployeeID : $20. temp $100.;
name=substr(temp,1,anydigit(temp)-1);
HireDate=scan(temp,-3,' ');
Salary=scan(temp,-2,' ');
HomePhone=scan(temp,-1,' ');
drop temp;
datalines;
429685482 Martin, Virginia 09aug1990 34800 493-0824
244967839 Singleton, MaryAnn 24apr1995 27900 929-2623
996740216 Leighton, Maurice 16dec1993 32600 933-6908
675443925 Freuler, Carl 15feb1998 29900 493-3993
845729308 Cage, Merce 19oct1992 39800 286-0519
;

Xia Keshan

Occasional Contributor
Posts: 9

Re: How to define input variable when datalines have spaces for a variable

Hi Xia,

Thanks , appreciate your willingness to guide me.

The name can be 1, 2 or even three characters with spaces / and / or comma in between..

Regards

Respected Advisor
Posts: 3,777

Re: How to define input variable when datalines have spaces for a variable

Based on your example it would be logical given 3 fields following NAME to SCAN from the end of _INFILE_ to find the beginning of HireDate.  With this you can calculate the length of NAME and read it with $VARYING.

data Emp;
   infile datalines col=col;
   informat employeeid $12. Name $varying64. HireDate date9. salary comma12. phone $8.;
  
input EmployeeID @;
   call scan(_infile_,-3,s,e,' ');
   l = s-col;
  
input Name $varying64. l HireDate Salary Phone;
   format HireDate date11. salary dollar12.;
  
drop l s e;
   datalines;
429685482 Martin, Virginia 09aug1990 34800 493-0824
244967839 Singleton, MaryAnn 24apr1995 27900 929-2623
996740216 Leighton, Maurice 16dec1993 32600 933-6908
675443925 Freuler, Carl 15feb1998 29900 493-3993
845729308 Cage, Merce 19oct1992 39800 286-0519
;;;;
   run;
proc print;
  
run;
Respected Advisor
Posts: 3,124

Re: How to define input variable when datalines have spaces for a variable

Similar to 's approach,

data sales;

     infile datalines truncover;

     input @;

     _patternID = prxparse('/\d\D+\d/io');

     call prxsubstr(_patternID, _infile_, _position, _len);

     _len=_len-3;

     input EmployeeID $9.

           @11 Name $varying30. _len

           @(12+_len)

        HireDate :date9.

        Salary

        HomePhone $;

     drop _:;

     format HireDate date9.;

     datalines;

429685482 Martin, KM. Virginia 09aug1990 34800 493-0824

244967839 Singleton, G MaryAnn 24apr1995 27900 929-2623

996740216 Leighton, Maurice 16dec1993 32600 933-6908

675443925 Freuler, Mike Carl 15feb1998 29900 493-3993

845729308 Cage, Merce 19oct1992 39800 286-0519

;

Ask a Question
Discussion stats
  • 6 replies
  • 349 views
  • 5 likes
  • 5 in conversation