BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
R_Win
Calcite | Level 5

i am getting data like this

kumar|raj|67|23-a
ug-90


muni|kunmar|90|12-j
an-98

mani|raju|78
|12-feb-
78

raju|k|34|10-feb-92

output

Name    surname age date

kumar   raj         67  23-aug-90
muni    kunmar  90  12-jan-98
mani    raju        78  12-feb-78

raju       k          34   10-feb-92

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

I agree with Tom. You should add lrecl=32767 to adjust logic record length.

I am stunned that you have 21 variables and each variable has 2000-4000 length. That is really horrible.

I do not think you have used them fully.

From your posted data, there is not any sign to show your variable has 2000 length.

data want(keep=temp);
 infile 'c:\Raw_Data.txt' eof=last lrecl=32767;
 length temp _temp $ 32767;
 retain temp _temp;
 input;
 temp=compbl(cats(temp,_infile_));
 if countc(temp,'^')ge 21 then do;temp=_temp;output;temp=compbl(_infile_);end;
 _temp=temp;
 return;
 last: output;
run;
data want(keep=var:);
 set want;
 array _v{21} $ 2000 var1-var21;
 do i=1 to 21;
  _v{i}=scan(temp,i,'^','m');
 end;
run;



Ksharp

View solution in original post

25 REPLIES 25
Ksharp
Super User

Oh.God. Why didn't you show us the true data structure?

data want(keep=name user no date );
infile 'c:\x.txt' lrecl=2000 pad ;
length name user no date temp $ 40 b $ 1;
input a $char1. @@;
retain temp name user no b; 
if a ne '|' then temp=cats(temp,a);
if a='|'  or (anydigit(b) and missing(a) and mod(count,4)=3) then do;
                                       count+1;
                                       select(mod(count,4));
                                         when (1) name=temp;
                                         when (2) user=temp;
                                         when (3) no=temp;
                                         when (0) do; date=temp;output;end;
                                        end;
                                        call missing(temp);
                                        end;
b=a;
run;


Ksharp

R_Win
Calcite | Level 5

Ksharp is there any way to increase the performacne i have tryed it was working on sample data if on 80000 obs it may take huge time for 1000 records it has taken 100 secnds like that actually i am having 30 variables now

R_Win
Calcite | Level 5

kumar  sd|raj  kl|67|23-a ug-90 muni  t|kunmar  h|90|12-j an-98 mani  y|raju  o|78 |12-feb- 78 raju  t|k  y|34|10-feb-92 output:output Name      surname  age  date kumar sd  raj k  67  23-aug-90 muni t    kunmar h 90  12-jan-98 mani y  raju o  78  12-feb-78 raju t  k y        34  10-feb-92 Hi ksharp actually i have tryed your code it was workign if i am having spaces in between the obs it was eliminating the observations can u help me inthis,thqs in advance.

Ksharp
Super User

Oh.God. Your data is very messy. But you did post what output you need, it is very hard to judge something belonged to which observation.

data want(keep=temp );
infile datalines ;
length temp $ 2000;
input a $char1. @@;
retain temp ; 
temp=strip(temp)||a;
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,scan(temp,-1,'|')) then do;
output;call missing(temp);end;
datalines;
kumar  sd|raj  kl|67|23-a ug-90 muni  t|kunmar  h|90|12-j an-98 mani  y|raju  o|78 |12-feb- 78 raju  t|k  y|34|10-feb-92 output:output Name      surname  age  date kumar sd  raj k  67  23-aug-90 muni t    kunmar h 90  12-jan-98 mani y  raju o  78  12-feb-78 raju t  k y        34  10-feb-92 
;
run;


Ksharp

R_Win
Calcite | Level 5

Sharp sorry i have send the data my txt file actually if i used the code it was working but removing the spaces between the data can u see the file thqs for your patience

Ksharp
Super User

OK. you want to keep these blank too?

data want(keep=temp );
infile 'c:\x.txt' ;
length temp $ 400 b $ 1;
input a $char1. @@;
retain temp b;
if a='|' then count+1;
if missing(b) and count le 2 then temp=catx(' ',temp,a);
 else temp=cats(temp,a); 
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,scan(temp,-1,'|')) then do;
                         output; count=0;
                         call missing(temp);
                           end;
b=a;
run;













data want(keep=temp );
infile datalines ;
length temp $ 400 b $ 1;
input a $char1. @@;
retain temp b;
if a='|' then count+1;
if missing(b) and count le 2 then temp=catx(' ',temp,a);
 else temp=cats(temp,a); 
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,scan(temp,-1,'|')) then do;
                         output; count=0;
                         call missing(temp);
                           end;
b=a;
datalines;
kumar raj|sys engineer |45 |24-jan-
89

nani kumar|main frames user inn|56|1
2-feb-90

raju   jr|di user|
89|23-mar-90

;
run;

Ksharp

R_Win
Calcite | Level 5

Reg:It was Removing spaces between the observation Ksharp i am using the data bewlow and the code below but it was removing the spaces between the observaions any help. Data: ku mar|raj kl|67|23-a ug-90 muni sdp|kunmar|90|12-j an-98 mani|raju varma|78 |12-feb- 78 ra ju|k|34|10-feb-92 data want(keep=name user no date ); infile 'c:\x.txt' lrecl=2000 pad ; length name user no date temp $ 40 b $ 1; input a $char1. @@; retain temp name user no b; if a ne '|' then temp=cats(temp,a); if a='|'  or (anydigit(b) and missing(a) and mod(count,4)=3) then do;                                       count+1;                                       select(mod(count,4));                                         when (1) name=temp;                                         when (2) user=temp;                                         when (3) no=temp;                                         when (0) do; date=temp;output;end;                                         end;                                         call missing(temp);                                         end; b=a; run;

R_Win
Calcite | Level 5

Reg:It was Removing spaces between the observation Ksharp i am using the data bewlow and the code below but it was removing the spaces between the observaions any help. Please Refer the spaces.txt

Ksharp
Super User

Did you try the code I gave you above?

data want(keep=temp );
infile 'c:\x.txt' ;
length temp $ 400 b $ 1;
input a $char1. @@;
retain temp b;
if a='|' then count+1;
if missing(b) and count le 2 then temp=catx(' ',temp,a);
 else temp=cats(temp,a); 
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,temp) then do;
                         output; count=0;
                         call missing(temp);
                           end;
b=a;
run;




Ksharp

Ksharp
Super User

I think you need to post some true data to let me to see the actual data structure of yours.

The way to promote process is to decrease the value of lrecl=.

data want(keep=temp);
infile 'c:\x.txt' ;
length temp $ 400;
input a $char1. @@;
retain temp ; 
temp=cats(temp,a);
pid=prxparse('/\d{2}-\w{3}-\d{2}/');
if prxmatch(pid,scan(temp,-1,'|')) then do;
output;call missing(temp);end;
run;


Ksharp

R_Win
Calcite | Level 5

K sharp i am sending the data below actually i am having 21 variables like c1 --c21 and the data is coming like this,but i have tryed the code  of your but it was removing the spaces in between the data and taking time and some variables are 4000 in long.

bewlo is the sample data

===========================

Is the time taken
^9865^^^^COOL^dealt was good
it is ok for us^56^ok^AEFG^         ^13-aug-19
99^ok for it ^this is not prosable ^ Ecoreco provides the fulsize reduction


^ Registration link opens on  at 3.00 pm. Any registration attempts before the specified time will not be considered in the
^To illustrate, suppose that a very large file named “Master” contains a column named “Code”^^^^23-jAN-1
980

REQ1^REQ2^REQ3^REQ4^REQ5^REQ6^13-JUNE-19
80^REQ8^REQ9

^REQ10^REQ11^13-MAR-19
97^REQ13^REQ14^REQ15^WE ARE
OK ^REQ17 ^REQ18 ^REQ19 ^12-FEB-2
011^UP


REQ1^REQ2^REQ3^REQ4^REQ5^REQ6^13-JUNE-1980^REQ8^REQ9^REQ10^REQ11^13-MAR-1997^REQ13^REQ14^REQ15^WE AREOK ^REQ17 ^REQ18 ^REQ19 ^REQ20^12-FEB-2
011

==============

Tom
Super User Tom
Super User

If you click on the "Use advanced editor" link in the upper right of the input form you will get a page that will let you upload a file.

The problem with this format is finding the end of record.

Do you have any control over the generation of this file? 

Are the extra line breaks caused by field values that contain CR or LF characters?  One solution is to replace those characters in the source system with other characters so that they will not be in the data file.

R_Win
Calcite | Level 5

Hi you can find the data for that

Ksharp
Super User

Now what do you want output to look like?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 25 replies
  • 2893 views
  • 6 likes
  • 5 in conversation