## DATA Step, Macro, Functions and more

Solved
Regular Contributor
Posts: 229

i am getting data like this

kumar|raj|67|23-a
ug-90

muni|kunmar|90|12-j
an-98

mani|raju|78
|12-feb-
78

raju|k|34|10-feb-92

output

Name    surname age date

kumar   raj         67  23-aug-90
muni    kunmar  90  12-jan-98
mani    raju        78  12-feb-78

raju       k          34   10-feb-92

Accepted Solutions
Solution
‎12-06-2011 09:14 PM
Super User
Posts: 10,770

I agree with Tom. You should add lrecl=32767 to adjust logic record length.

I am stunned that you have 21 variables and each variable has 2000-4000 length. That is really horrible.

I do not think you have used them fully.

From your posted data, there is not any sign to show your variable has 2000 length.

```data want(keep=temp);
infile 'c:\Raw_Data.txt' eof=last lrecl=32767;
length temp _temp \$ 32767;
retain temp _temp;
input;
temp=compbl(cats(temp,_infile_));
if countc(temp,'^')ge 21 then do;temp=_temp;output;temp=compbl(_infile_);end;
_temp=temp;
return;
last: output;
run;
data want(keep=var:);
set want;
array _v{21} \$ 2000 var1-var21;
do i=1 to 21;
_v{i}=scan(temp,i,'^','m');
end;
run;

```

Ksharp

All Replies
Super User
Posts: 10,770

Oh.God. Why didn't you show us the true data structure?

```data want(keep=name user no date );
length name user no date temp \$ 40 b \$ 1;
input a \$char1. @@;
retain temp name user no b;
if a ne '|' then temp=cats(temp,a);
if a='|'  or (anydigit(b) and missing(a) and mod(count,4)=3) then do;
count+1;
select(mod(count,4));
when (1) name=temp;
when (2) user=temp;
when (3) no=temp;
when (0) do; date=temp;output;end;
end;
call missing(temp);
end;
b=a;
run;

```

Ksharp

Regular Contributor
Posts: 229

Ksharp is there any way to increase the performacne i have tryed it was working on sample data if on 80000 obs it may take huge time for 1000 records it has taken 100 secnds like that actually i am having 30 variables now

Regular Contributor
Posts: 229

kumar  sd|raj  kl|67|23-a ug-90 muni  t|kunmar  h|90|12-j an-98 mani  y|raju  o|78 |12-feb- 78 raju  t|k  y|34|10-feb-92 outpututput Name      surname  age  date kumar sd  raj k  67  23-aug-90 muni t    kunmar h 90  12-jan-98 mani y  raju o  78  12-feb-78 raju t  k y        34  10-feb-92 Hi ksharp actually i have tryed your code it was workign if i am having spaces in between the obs it was eliminating the observations can u help me inthis,thqs in advance.

Super User
Posts: 10,770

Oh.God. Your data is very messy. But you did post what output you need, it is very hard to judge something belonged to which observation.

```data want(keep=temp );
infile datalines ;
length temp \$ 2000;
input a \$char1. @@;
retain temp ;
temp=strip(temp)||a;
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,scan(temp,-1,'|')) then do;
output;call missing(temp);end;
datalines;
kumar  sd|raj  kl|67|23-a ug-90 muni  t|kunmar  h|90|12-j an-98 mani  y|raju  o|78 |12-feb- 78 raju  t|k  y|34|10-feb-92 output:output Name      surname  age  date kumar sd  raj k  67  23-aug-90 muni t    kunmar h 90  12-jan-98 mani y  raju o  78  12-feb-78 raju t  k y        34  10-feb-92
;
run;

```

Ksharp

Regular Contributor
Posts: 229

Sharp sorry i have send the data my txt file actually if i used the code it was working but removing the spaces between the data can u see the file thqs for your patience

Super User
Posts: 10,770

OK. you want to keep these blank too?

```data want(keep=temp );
infile 'c:\x.txt' ;
length temp \$ 400 b \$ 1;
input a \$char1. @@;
retain temp b;
if a='|' then count+1;
if missing(b) and count le 2 then temp=catx(' ',temp,a);
else temp=cats(temp,a);
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,scan(temp,-1,'|')) then do;
output; count=0;
call missing(temp);
end;
b=a;
run;

data want(keep=temp );
infile datalines ;
length temp \$ 400 b \$ 1;
input a \$char1. @@;
retain temp b;
if a='|' then count+1;
if missing(b) and count le 2 then temp=catx(' ',temp,a);
else temp=cats(temp,a);
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,scan(temp,-1,'|')) then do;
output; count=0;
call missing(temp);
end;
b=a;
datalines;
kumar raj|sys engineer |45 |24-jan-
89

nani kumar|main frames user inn|56|1
2-feb-90

raju   jr|di user|
89|23-mar-90

;
run;

```

Ksharp

Regular Contributor
Posts: 229

Reg:It was Removing spaces between the observation Ksharp i am using the data bewlow and the code below but it was removing the spaces between the observaions any help. Data: ku mar|raj kl|67|23-a ug-90 muni sdp|kunmar|90|12-j an-98 mani|raju varma|78 |12-feb- 78 ra ju|k|34|10-feb-92 data want(keep=name user no date ); infile 'c:\x.txt' lrecl=2000 pad ; length name user no date temp \$ 40 b \$ 1; input a \$char1. @@; retain temp name user no b; if a ne '|' then temp=cats(temp,a); if a='|'  or (anydigit(b) and missing(a) and mod(count,4)=3) then do;                                       count+1;                                       select(mod(count,4));                                         when (1) name=temp;                                         when (2) user=temp;                                         when (3) no=temp;                                         when (0) do; date=temp;output;end;                                         end;                                         call missing(temp);                                         end; b=a; run;

Regular Contributor
Posts: 229

Reg:It was Removing spaces between the observation Ksharp i am using the data bewlow and the code below but it was removing the spaces between the observaions any help. Please Refer the spaces.txt

Super User
Posts: 10,770

Did you try the code I gave you above?

```data want(keep=temp );
infile 'c:\x.txt' ;
length temp \$ 400 b \$ 1;
input a \$char1. @@;
retain temp b;
if a='|' then count+1;
if missing(b) and count le 2 then temp=catx(' ',temp,a);
else temp=cats(temp,a);
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,temp) then do;
output; count=0;
call missing(temp);
end;
b=a;
run;

```

Ksharp

Super User
Posts: 10,770

I think you need to post some true data to let me to see the actual data structure of yours.

The way to promote process is to decrease the value of lrecl=.

```data want(keep=temp);
infile 'c:\x.txt' ;
length temp \$ 400;
input a \$char1. @@;
retain temp ;
temp=cats(temp,a);
pid=prxparse('/\d{2}-\w{3}-\d{2}/');
if prxmatch(pid,scan(temp,-1,'|')) then do;
output;call missing(temp);end;
run;

```

Ksharp

Regular Contributor
Posts: 229

K sharp i am sending the data below actually i am having 21 variables like c1 --c21 and the data is coming like this,but i have tryed the code  of your but it was removing the spaces in between the data and taking time and some variables are 4000 in long.

bewlo is the sample data

===========================

Is the time taken
^9865^^^^COOL^dealt was good
it is ok for us^56^ok^AEFG^         ^13-aug-19
99^ok for it ^this is not prosable ^ Ecoreco provides the fulsize reduction

^ Registration link opens on  at 3.00 pm. Any registration attempts before the specified time will not be considered in the
^To illustrate, suppose that a very large file named “Master” contains a column named “Code”^^^^23-jAN-1
980

REQ1^REQ2^REQ3^REQ4^REQ5^REQ6^13-JUNE-19
80^REQ8^REQ9

^REQ10^REQ11^13-MAR-19
97^REQ13^REQ14^REQ15^WE ARE
OK ^REQ17 ^REQ18 ^REQ19 ^12-FEB-2
011^UP

REQ1^REQ2^REQ3^REQ4^REQ5^REQ6^13-JUNE-1980^REQ8^REQ9^REQ10^REQ11^13-MAR-1997^REQ13^REQ14^REQ15^WE AREOK ^REQ17 ^REQ18 ^REQ19 ^REQ20^12-FEB-2
011

==============

Super User
Posts: 8,084

If you click on the "Use advanced editor" link in the upper right of the input form you will get a page that will let you upload a file.

The problem with this format is finding the end of record.

Do you have any control over the generation of this file?

Are the extra line breaks caused by field values that contain CR or LF characters?  One solution is to replace those characters in the source system with other characters so that they will not be in the data file.

Regular Contributor
Posts: 229

Hi you can find the data for that

Super User
Posts: 10,770