Solved: Reg:Data Loading Issue

R_Win · Posted 12-02-2011 04:34 AM

i am getting data like this

kumar|raj|67|23-a
ug-90

muni|kunmar|90|12-j
an-98

mani|raju|78
|12-feb-
78

raju|k|34|10-feb-92

output

Name surname age date

kumar   raj         67 23-aug-90
muni    kunmar 90 12-jan-98
mani    raju        78 12-feb-78

raju k 34 10-feb-92

Ksharp · Posted 12-06-2011 09:14 PM

I agree with Tom. You should add lrecl=32767 to adjust logic record length.

I am stunned that you have 21 variables and each variable has 2000-4000 length. That is really horrible.

I do not think you have used them fully.

From your posted data, there is not any sign to show your variable has 2000 length.

data want(keep=temp);
 infile 'c:\Raw_Data.txt' eof=last lrecl=32767;
 length temp _temp $ 32767;
 retain temp _temp;
 input;
 temp=compbl(cats(temp,_infile_));
 if countc(temp,'^')ge 21 then do;temp=_temp;output;temp=compbl(_infile_);end;
 _temp=temp;
 return;
 last: output;
run;
data want(keep=var:);
 set want;
 array _v{21} $ 2000 var1-var21;
 do i=1 to 21;
  _v{i}=scan(temp,i,'^','m');
 end;
run;

Ksharp

View solution in original post

Ksharp · Posted 12-02-2011 04:58 AM

Oh.God. Why didn't you show us the true data structure?

data want(keep=name user no date );
infile 'c:\x.txt' lrecl=2000 pad ;
length name user no date temp $ 40 b $ 1;
input a $char1. @@;
retain temp name user no b; 
if a ne '|' then temp=cats(temp,a);
if a='|'  or (anydigit(b) and missing(a) and mod(count,4)=3) then do;
                                       count+1;
                                       select(mod(count,4));
                                         when (1) name=temp;
                                         when (2) user=temp;
                                         when (3) no=temp;
                                         when (0) do; date=temp;output;end;
                                        end;
                                        call missing(temp);
                                        end;
b=a;
run;

Ksharp

R_Win · Posted 12-02-2011 05:19 AM

Ksharp is there any way to increase the performacne i have tryed it was working on sample data if on 80000 obs it may take huge time for 1000 records it has taken 100 secnds like that actually i am having 30 variables now

R_Win · Posted 12-06-2011 01:04 AM

Ksharp · Posted 12-06-2011 01:31 AM

Oh.God. Your data is very messy. But you did post what output you need, it is very hard to judge something belonged to which observation.

data want(keep=temp );
infile datalines ;
length temp $ 2000;
input a $char1. @@;
retain temp ; 
temp=strip(temp)||a;
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,scan(temp,-1,'|')) then do;
output;call missing(temp);end;
datalines;
kumar  sd|raj  kl|67|23-a ug-90 muni  t|kunmar  h|90|12-j an-98 mani  y|raju  o|78 |12-feb- 78 raju  t|k  y|34|10-feb-92 output:output Name      surname  age  date kumar sd  raj k  67  23-aug-90 muni t    kunmar h 90  12-jan-98 mani y  raju o  78  12-feb-78 raju t  k y        34  10-feb-92 
;
run;

Ksharp

R_Win · Posted 12-06-2011 02:00 AM

Sharp sorry i have send the data my txt file actually if i used the code it was working but removing the spaces between the data can u see the file thqs for your patience

Ksharp · Posted 12-06-2011 02:39 AM

OK. you want to keep these blank too?

data want(keep=temp );
infile 'c:\x.txt' ;
length temp $ 400 b $ 1;
input a $char1. @@;
retain temp b;
if a='|' then count+1;
if missing(b) and count le 2 then temp=catx(' ',temp,a);
 else temp=cats(temp,a); 
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,scan(temp,-1,'|')) then do;
                         output; count=0;
                         call missing(temp);
                           end;
b=a;
run;













data want(keep=temp );
infile datalines ;
length temp $ 400 b $ 1;
input a $char1. @@;
retain temp b;
if a='|' then count+1;
if missing(b) and count le 2 then temp=catx(' ',temp,a);
 else temp=cats(temp,a); 
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,scan(temp,-1,'|')) then do;
                         output; count=0;
                         call missing(temp);
                           end;
b=a;
datalines;
kumar raj|sys engineer |45 |24-jan-
89

nani kumar|main frames user inn|56|1
2-feb-90

raju   jr|di user|
89|23-mar-90

;
run;

Ksharp

R_Win · Posted 12-08-2011 07:50 AM

Reg:It was Removing spaces between the observation Ksharp i am using the data bewlow and the code below but it was removing the spaces between the observaions any help. Data: ku mar|raj kl|67|23-a ug-90 muni sdp|kunmar|90|12-j an-98 mani|raju varma|78 |12-feb- 78 ra ju|k|34|10-feb-92 data want(keep=name user no date ); infile 'c:\x.txt' lrecl=2000 pad ; length name user no date temp $ 40 b $ 1; input a $char1. @@; retain temp name user no b; if a ne '|' then temp=cats(temp,a); if a='|' or (anydigit(b) and missing(a) and mod(count,4)=3) then do; count+1; select(mod(count,4)); when (1) name=temp; when (2) user=temp; when (3) no=temp; when (0) do; date=temp;output;end; end; call missing(temp); end; b=a; run;

R_Win · Posted 12-08-2011 07:52 AM

Reg:It was Removing spaces between the observation Ksharp i am using the data bewlow and the code below but it was removing the spaces between the observaions any help. Please Refer the spaces.txt

Ksharp · Posted 12-08-2011 09:56 PM

Did you try the code I gave you above?

data want(keep=temp );
infile 'c:\x.txt' ;
length temp $ 400 b $ 1;
input a $char1. @@;
retain temp b;
if a='|' then count+1;
if missing(b) and count le 2 then temp=catx(' ',temp,a);
 else temp=cats(temp,a); 
pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o');
if prxmatch(pid,temp) then do;
                         output; count=0;
                         call missing(temp);
                           end;
b=a;
run;

Ksharp

Ksharp · Posted 12-02-2011 08:02 AM

I think you need to post some true data to let me to see the actual data structure of yours.

The way to promote process is to decrease the value of lrecl=.

data want(keep=temp);
infile 'c:\x.txt' ;
length temp $ 400;
input a $char1. @@;
retain temp ; 
temp=cats(temp,a);
pid=prxparse('/\d{2}-\w{3}-\d{2}/');
if prxmatch(pid,scan(temp,-1,'|')) then do;
output;call missing(temp);end;
run;

Ksharp

R_Win · Posted 12-02-2011 08:38 AM

K sharp i am sending the data below actually i am having 21 variables like c1 --c21 and the data is coming like this,but i have tryed the code of your but it was removing the spaces in between the data and taking time and some variables are 4000 in long.

bewlo is the sample data

===========================

Is the time taken
^9865^^^^COOL^dealt was good
it is ok for us^56^ok^AEFG^ ^13-aug-19
99^ok for it ^this is not prosable ^ Ecoreco provides the fulsize reduction

^ Registration link opens on at 3.00 pm. Any registration attempts before the specified time will not be considered in the
^To illustrate, suppose that a very large file named “Master” contains a column named “Code”^^^^23-jAN-1
980

REQ1^REQ2^REQ3^REQ4^REQ5^REQ6^13-JUNE-19
80^REQ8^REQ9

^REQ10^REQ11^13-MAR-19
97^REQ13^REQ14^REQ15^WE ARE
OK ^REQ17 ^REQ18 ^REQ19 ^12-FEB-2
011^UP

REQ1^REQ2^REQ3^REQ4^REQ5^REQ6^13-JUNE-1980^REQ8^REQ9^REQ10^REQ11^13-MAR-1997^REQ13^REQ14^REQ15^WE AREOK ^REQ17 ^REQ18 ^REQ19 ^REQ20^12-FEB-2
011

==============

Tom · Posted 12-02-2011 08:48 AM

If you click on the "Use advanced editor" link in the upper right of the input form you will get a page that will let you upload a file.

The problem with this format is finding the end of record.

Do you have any control over the generation of this file?

Are the extra line breaks caused by field values that contain CR or LF characters? One solution is to replace those characters in the source system with other characters so that they will not be in the data file.

R_Win · Posted 12-02-2011 09:20 AM

Hi you can find the data for that

Ksharp · Posted 12-02-2011 09:35 AM

Now what do you want output to look like?

Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Reg:Data Loading Issue

Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Re: Reg:Data Loading Issue

Registration is open

SAS Training: Just a Click Away