Dear Folks,
I am new to SAS and facing difficulty to understand the below point:
The below program is not giving correct output because the data after datalines statement is not following the spacing that is required as per variables defination in 'input' statement. Here, I have only 5 record lines and thus it is easy for me to change the spacing in datalines. But, if I have millions of records and data is coming as in current form (where the 'name' variable is having space in between, how would i define my 'input' statement so that output is correct as required. I tried using colon (:) in the input statement, but couldn't get the desired result. Please help!!
options pagesize=60 linesize=80 pageno=1 nodate;
data sales;
input EmployeeID $ 1-9 Name $ 11-29 @30 HireDate date9.
Salary HomePhone $;
format HireDate date9.;
datalines;
429685482 Martin, Virginia 09aug1990 34800 493-0824
244967839 Singleton, MaryAnn 24apr1995 27900 929-2623
996740216 Leighton, Maurice 16dec1993 32600 933-6908
675443925 Freuler, Carl 15feb1998 29900 493-3993
845729308 Cage, Merce 19oct1992 39800 286-0519
;
proc print data=sales;
title ’Sales Department Employees’;
run;
I suppose you have a 'blank' in front of the second name, otherwise it shouldn't be a problem for modified list input (:) plus dlm=' '. There are ways to play with _infile_ variable, but it seems to me a straightforward approach will be reading in 'name' as two variables, and then concatenating them, if needed.
data sales;
infile datalines truncover dlm=', ';
input EmployeeID :$9. LName :$20. FName :$20. HireDate :date9. Salary HomePhone :$20.;
format HireDate date9.;
datalines;
429685482 Martin, Virginia 09aug1990 34800 493-0824
244967839 Singleton, MaryAnn 24apr1995 27900 929-2623
996740216 Leighton, Maurice 16dec1993 32600 933-6908
675443925 Freuler, Carl 15feb1998 29900 493-3993
845729308 Cage, Merce 19oct1992 39800 286-0519
;
Since you have "millions of records" the Datalines is not how you should read the data.
What is the file format of the original data? Are there quotes around text values? Or are the columns fixed width?
If you copy data out of some formats and paste into the editor then the editor is making some default decisions on how to paste and the "copy" is making choices on what is placed on the clipboard. and may be losing information in the original data set.
For instance this is the result of copying a line of text from a CSV file opened in Excel and pasted into the SAS editor:
4329 603 1 6/21/1990 83202 1 1, 3 4/9/2015 4 2 3 2
However in the CSV file we have when opened in Wordpad and pasted:
4329,603,1,6/21/1990,83202,1,"1, 3",4/9/2015,4,2,3,2,
Reading the later from a file with a comma delimiter the text of your values with spaces will most likely have quotes around them and not be a problem to read. Similar options involve other types of files.
Datalines also restricts the options that can be used to read the data.
Is the name always two words ? or one word ?
data sales; infile datalines truncover ; input EmployeeID : $20. temp $100.; name=substr(temp,1,anydigit(temp)-1); HireDate=scan(temp,-3,' '); Salary=scan(temp,-2,' '); HomePhone=scan(temp,-1,' '); drop temp; datalines; 429685482 Martin, Virginia 09aug1990 34800 493-0824 244967839 Singleton, MaryAnn 24apr1995 27900 929-2623 996740216 Leighton, Maurice 16dec1993 32600 933-6908 675443925 Freuler, Carl 15feb1998 29900 493-3993 845729308 Cage, Merce 19oct1992 39800 286-0519 ;
Xia Keshan
Hi Xia,
Thanks , appreciate your willingness to guide me.
The name can be 1, 2 or even three characters with spaces / and / or comma in between..
Regards
Based on your example it would be logical given 3 fields following NAME to SCAN from the end of _INFILE_ to find the beginning of HireDate. With this you can calculate the length of NAME and read it with $VARYING.
data sales;
infile datalines truncover;
input @;
_patternID = prxparse('/\d\D+\d/io');
call prxsubstr(_patternID, _infile_, _position, _len);
_len=_len-3;
input EmployeeID $9.
@11 Name $varying30. _len
@(12+_len)
HireDate :date9.
Salary
HomePhone $;
drop _:;
format HireDate date9.;
datalines;
429685482 Martin, KM. Virginia 09aug1990 34800 493-0824
244967839 Singleton, G MaryAnn 24apr1995 27900 929-2623
996740216 Leighton, Maurice 16dec1993 32600 933-6908
675443925 Freuler, Mike Carl 15feb1998 29900 493-3993
845729308 Cage, Merce 19oct1992 39800 286-0519
;
Problem
The original code from the question:
data sales;
input EmployeeID $ 1-9 Name $ 11-29 @30 HireDate date9. Salary HomePhone $;
format HireDate date9.;
datalines;
429685482 Martin, Virginia 09aug1990 34800 493-0824
244967839 Singleton, MaryAnn 24apr1995 27900 929-2623
996740216 Leighton, Maurice 16dec1993 32600 933-6908
675443925 Freuler, Carl 15feb1998 29900 493-3993
845729308 Cage, Merce 19oct1992 39800 286-0519
;
proc print data=sales; run;
This code produces the following incorrect output:
Solution
The correct code for the 'name' var defines the input with an "&" and requires rows to have two spaces between columns. SAS reads in the next variable when the space between columns is more than two (>2):
data sales1;
input EmployeeID $9. Name & $29. HireDate date9. Salary HomePhone $;
format HireDate date9.;
datalines;
429685482 Martin, Virginia 09aug1990 34800 493-0824
244967839 Singleton, MaryAnn 24apr1995 27900 929-2623
996740216 Leighton, Maurice 16dec1993 32600 933-6908
675443925 Freuler, Carl 15feb1998 29900 493-3993
845729308 Cage, Merce 19oct1992 39800 286-0519
;
proc print data=sales1; run;
The following output shows what I think you're looking for:
For more information, see Method 3 here: HOW DO I READ IN A CHARACTER VARIABLE WITH VARYING LENGTH IN A SPACE DELIMITED DATASET? | SAS FAQ
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.