Re: how to make data set

rvsidhu035 · Posted 01-16-2018 11:50 PM

1 8001000004 S BUDATI CHIRUHA 6 07/07/1989 34XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
2 8001000081 MATANGI SAI SUNDAR KANTH 1 27/06/1995 3456721ZYWFGHMLJIACDEBQSPUVTRKON8
3 8001000102 KOLUSU VEERA VENKATA VAMSI KRI 6 05/08/1994 6L4357Z2YWKMJN81DEAQOPRSTUVBCFGHI
4 8001000106 INTURI SRINIVAS 27/06/1993 346725LWNYZ18MKIFGHCDEBAJOPQRSTUV
5 8001000134 BHUKYA VENKATESWARLU NAIK 2 08/04/1995 3456XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
6 8001000138 BURRI SWAPNA 1 01/07/1990 34612578ABCDEFGHIJKLMNOPQRSTUVWYZ

... i have like this text data 2000 observation how to separate

ser_no=1

reg_Id=8001000004

Name=S BUDATI CHIRUHA

Caste_Id=6

Dob=07/07/1989

i dont need remaing extra data like 34568xxxxxxxxxxxx

i am attached txt file below

thank you help me

Cynthia_sas · Posted 01-17-2018 12:30 AM

Hi: What code have you tried so far. You will need a DATA step program with an INFILE statement. How you construct your INPUT statement will be part of the challenge. It is too bad that you did not receive delimited file, like a tab delimited file or a comma delimited file, that would make your code much more simple. Is there a maximum number of name fields that you can have? That would be good information to know.

Cynthia

rvsidhu035 · Posted 01-17-2018 12:32 AM

It's spaced dlm

novinosrin · Posted 01-17-2018 01:04 AM


data have;
infile 'Your New Text Document.txt';
input ser_no reg_id : $20. @;
call scan(_infile_, -3, position, length,' ');
fwidth=position-anyalpha(_infile_);
if length=1 then input @(anyalpha(_infile_)) name $varying40. fwidth caste_id dob :ddmmyy10.;
format dob ddmmyy10.;
drop fwidth position length;
run;

/* this will get you close, and I challenge you to get it squeaky clean*//*specify the right path in the infile statement*/

Patrick · Posted 01-17-2018 01:22 AM

Below will only work if you haven't got any missing values in the last 3 columns.

data sample(drop=_:);
  infile datalines dlm=' ' col=col truncover;

  input var1 :$3. var2 :$10. @;

  /* Sample 41355: Find the nth delimiter in a character string */
  /* http://support.sas.com/kb/41/355.html */
  call scan(strip(_infile_),-3,_pos,_len,' ');

  _fwidth=_pos -col -1; 
  input var3 $varying60. _fwidth  var4 :$1. var5 :ddmmyy10. var6 :$60.;
  format var5 date9.;


datalines;
1 8001000004 BUDATI CHIRUHAS 6 07/07/1989 34XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
6 8001000138 BURRI SWAPNA 1 01/07/1990 34612578ABCDEFGHIJKLMNOPQRSTUVWYZ
50 8001000562 SUGALI ARUN KUMAR NAIK 2 18/10/1994 345762FABCDEGHIJKLMNOPQRSTUVWYZ18
;
run;

RW9 · Posted 01-17-2018 04:18 AM

It is not space delimited in the correct sense:

1 8001000004 S BUDATI CHIRUHA 6 07/07/1989 34XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

^ ^ ^ ^ ^ ^

I indicate where spaces appear in the text, as you can see to import a space delimited would mean S and BUDATI, and CHIRUHA all go to different variables as that also contains spaces. Its quite a bad file really, anything using a good standard, like Comma Separated Variable file, would make this import a simple problem. As you have accepted a poor file format, likely with very little standardisation or documentation, then you will have problems importing it.

You may be able to read the whole line in as one string and then process it as:

scan(1," ") to get first part, then substr from there to drop this

scan(2," ") to get second part, then substr from there to drop this

Next substr to findc(numbers)-1 to get name and then substr to drop this

Then scan each of the other two variables out

This is all caused by poor file format!

rvsidhu035 · Posted 01-17-2018 01:24 AM

data mts(drop=file1 garb);
infile "C:\Documents and Settings\office\Desktop\New Text Document.txt";
input Ser_NO Roll_NO garb $43.;
file1=scan(garb,1,"/");
name=compress(file1," ","ka");
caste_no=scan(compress(file1," ","kd"),1);
dob=substr(garb,length(file1)-2,10);
/*if length(caste_no)=1 then output;else caste_no=0;*/
run;

how to make data set