DATA Step, Macro, Functions and more

how to make data set

Reply
Contributor
Posts: 56

how to make data set

1 8001000004 S BUDATI CHIRUHA 6 07/07/1989 34XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
2 8001000081 MATANGI SAI SUNDAR KANTH 1 27/06/1995 3456721ZYWFGHMLJIACDEBQSPUVTRKON8
3 8001000102 KOLUSU VEERA VENKATA VAMSI KRI 6 05/08/1994 6L4357Z2YWKMJN81DEAQOPRSTUVBCFGHI
4 8001000106 INTURI SRINIVAS 27/06/1993 346725LWNYZ18MKIFGHCDEBAJOPQRSTUV
5 8001000134 BHUKYA VENKATESWARLU NAIK 2 08/04/1995 3456XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
6 8001000138 BURRI SWAPNA 1 01/07/1990 34612578ABCDEFGHIJKLMNOPQRSTUVWYZ

... i have like this text data 2000 observation how to separate

ser_no=1

reg_Id=8001000004

Name=S BUDATI CHIRUHA

Caste_Id=6

Dob=07/07/1989

i dont need remaing extra data like 34568xxxxxxxxxxxx

i am attached txt file below

 

thank you help me

 

SAS Super FREQ
Posts: 9,324

Re: how to make data set

Posted in reply to rvsidhu035
Hi: What code have you tried so far. You will need a DATA step program with an INFILE statement. How you construct your INPUT statement will be part of the challenge. It is too bad that you did not receive delimited file, like a tab delimited file or a comma delimited file, that would make your code much more simple. Is there a maximum number of name fields that you can have? That would be good information to know.

Cynthia
Contributor
Posts: 56

Re: how to make data set

Posted in reply to Cynthia_sas
It's spaced dlm
PROC Star
Posts: 1,582

Re: how to make data set

[ Edited ]
Posted in reply to rvsidhu035

data have; infile 'Your New Text Document.txt'; input ser_no reg_id : $20. @; call scan(_infile_, -3, position, length,' '); fwidth=position-anyalpha(_infile_); if length=1 then input @(anyalpha(_infile_)) name $varying40. fwidth caste_id dob :ddmmyy10.; format dob ddmmyy10.; drop fwidth position length; run;
/* this will get you close, and I challenge you to get it squeaky clean*//*specify the right path in the infile statement*/

 

Respected Advisor
Posts: 4,674

Re: how to make data set

Posted in reply to rvsidhu035

Below will only work if you haven't got any missing values in the last 3 columns.

data sample(drop=_:);
  infile datalines dlm=' ' col=col truncover;

  input var1 :$3. var2 :$10. @;

  /* Sample 41355: Find the nth delimiter in a character string */
  /* http://support.sas.com/kb/41/355.html */
  call scan(strip(_infile_),-3,_pos,_len,' ');

  _fwidth=_pos -col -1; 
  input var3 $varying60. _fwidth  var4 :$1. var5 :ddmmyy10. var6 :$60.;
  format var5 date9.;


datalines;
1 8001000004 BUDATI CHIRUHAS 6 07/07/1989 34XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
6 8001000138 BURRI SWAPNA 1 01/07/1990 34612578ABCDEFGHIJKLMNOPQRSTUVWYZ
50 8001000562 SUGALI ARUN KUMAR NAIK 2 18/10/1994 345762FABCDEGHIJKLMNOPQRSTUVWYZ18
;
run;
Super User
Super User
Posts: 9,416

Re: how to make data set

Posted in reply to rvsidhu035

It is not space delimited in the correct sense:

1 8001000004 S BUDATI CHIRUHA 6 07/07/1989 34XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

 ^                    ^ ^             ^                ^                    ^

I indicate where spaces appear in the text, as you can see to import a space delimited would mean S and BUDATI, and CHIRUHA all go to different variables as that also contains spaces.  Its quite a bad file really, anything using a good standard, like Comma Separated Variable file, would make this import a simple problem.  As you have accepted a poor file format, likely with very little standardisation or documentation, then you will have problems importing it.  

 

You may be able to read the whole line in as one string and then process it as:

scan(1," ") to get first part, then substr from there to drop this

scan(2," ") to get second part, then substr from there to drop this

Next substr to findc(numbers)-1 to get name and then substr to drop this

Then scan each of the other two variables out

 

This is all caused by poor file format!

Contributor
Posts: 56

Re: how to make data set

Posted in reply to Cynthia_sas
data mts(drop=file1 garb);
infile "C:\Documents and Settings\office\Desktop\New Text Document.txt";
input Ser_NO Roll_NO garb $43.;
file1=scan(garb,1,"/");
name=compress(file1," ","ka");
caste_no=scan(compress(file1," ","kd"),1);
dob=substr(garb,length(file1)-2,10);
/*if length(caste_no)=1 then output;else caste_no=0;*/
run;
Ask a Question
Discussion stats
  • 6 replies
  • 192 views
  • 0 likes
  • 5 in conversation