1 8001000004 S BUDATI CHIRUHA 6 07/07/1989 34XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
2 8001000081 MATANGI SAI SUNDAR KANTH 1 27/06/1995 3456721ZYWFGHMLJIACDEBQSPUVTRKON8
3 8001000102 KOLUSU VEERA VENKATA VAMSI KRI 6 05/08/1994 6L4357Z2YWKMJN81DEAQOPRSTUVBCFGHI
4 8001000106 INTURI SRINIVAS 27/06/1993 346725LWNYZ18MKIFGHCDEBAJOPQRSTUV
5 8001000134 BHUKYA VENKATESWARLU NAIK 2 08/04/1995 3456XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
6 8001000138 BURRI SWAPNA 1 01/07/1990 34612578ABCDEFGHIJKLMNOPQRSTUVWYZ
... i have like this text data 2000 observation how to separate
ser_no=1
reg_Id=8001000004
Name=S BUDATI CHIRUHA
Caste_Id=6
Dob=07/07/1989
i dont need remaing extra data like 34568xxxxxxxxxxxx
i am attached txt file below
thank you help me
data have;
infile 'Your New Text Document.txt';
input ser_no reg_id : $20. @;
call scan(_infile_, -3, position, length,' ');
fwidth=position-anyalpha(_infile_);
if length=1 then input @(anyalpha(_infile_)) name $varying40. fwidth caste_id dob :ddmmyy10.;
format dob ddmmyy10.;
drop fwidth position length;
run;
/* this will get you close, and I challenge you to get it squeaky clean*//*specify the right path in the infile statement*/
Below will only work if you haven't got any missing values in the last 3 columns.
data sample(drop=_:);
infile datalines dlm=' ' col=col truncover;
input var1 :$3. var2 :$10. @;
/* Sample 41355: Find the nth delimiter in a character string */
/* http://support.sas.com/kb/41/355.html */
call scan(strip(_infile_),-3,_pos,_len,' ');
_fwidth=_pos -col -1;
input var3 $varying60. _fwidth var4 :$1. var5 :ddmmyy10. var6 :$60.;
format var5 date9.;
datalines;
1 8001000004 BUDATI CHIRUHAS 6 07/07/1989 34XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
6 8001000138 BURRI SWAPNA 1 01/07/1990 34612578ABCDEFGHIJKLMNOPQRSTUVWYZ
50 8001000562 SUGALI ARUN KUMAR NAIK 2 18/10/1994 345762FABCDEGHIJKLMNOPQRSTUVWYZ18
;
run;
It is not space delimited in the correct sense:
1 8001000004 S BUDATI CHIRUHA 6 07/07/1989 34XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
^ ^ ^ ^ ^ ^
I indicate where spaces appear in the text, as you can see to import a space delimited would mean S and BUDATI, and CHIRUHA all go to different variables as that also contains spaces. Its quite a bad file really, anything using a good standard, like Comma Separated Variable file, would make this import a simple problem. As you have accepted a poor file format, likely with very little standardisation or documentation, then you will have problems importing it.
You may be able to read the whole line in as one string and then process it as:
scan(1," ") to get first part, then substr from there to drop this
scan(2," ") to get second part, then substr from there to drop this
Next substr to findc(numbers)-1 to get name and then substr to drop this
Then scan each of the other two variables out
This is all caused by poor file format!
data mts(drop=file1 garb);
infile "C:\Documents and Settings\office\Desktop\New Text Document.txt";
input Ser_NO Roll_NO garb $43.;
file1=scan(garb,1,"/");
name=compress(file1," ","ka");
caste_no=scan(compress(file1," ","kd"),1);
dob=substr(garb,length(file1)-2,10);
/*if length(caste_no)=1 then output;else caste_no=0;*/
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.