i am getting data like this
kumar|raj|67|23-a
ug-90
muni|kunmar|90|12-j
an-98
mani|raju|78
|12-feb-
78
raju|k|34|10-feb-92
output
Name surname age date
kumar raj 67 23-aug-90
muni kunmar 90 12-jan-98
mani raju 78 12-feb-78
raju k 34 10-feb-92
I agree with Tom. You should add lrecl=32767 to adjust logic record length.
I am stunned that you have 21 variables and each variable has 2000-4000 length. That is really horrible.
I do not think you have used them fully.
From your posted data, there is not any sign to show your variable has 2000 length.
data want(keep=temp); infile 'c:\Raw_Data.txt' eof=last lrecl=32767; length temp _temp $ 32767; retain temp _temp; input; temp=compbl(cats(temp,_infile_)); if countc(temp,'^')ge 21 then do;temp=_temp;output;temp=compbl(_infile_);end; _temp=temp; return; last: output; run; data want(keep=var:); set want; array _v{21} $ 2000 var1-var21; do i=1 to 21; _v{i}=scan(temp,i,'^','m'); end; run;
Ksharp
Oh.God. Why didn't you show us the true data structure?
data want(keep=name user no date ); infile 'c:\x.txt' lrecl=2000 pad ; length name user no date temp $ 40 b $ 1; input a $char1. @@; retain temp name user no b; if a ne '|' then temp=cats(temp,a); if a='|' or (anydigit(b) and missing(a) and mod(count,4)=3) then do; count+1; select(mod(count,4)); when (1) name=temp; when (2) user=temp; when (3) no=temp; when (0) do; date=temp;output;end; end; call missing(temp); end; b=a; run;
Ksharp
Ksharp is there any way to increase the performacne i have tryed it was working on sample data if on 80000 obs it may take huge time for 1000 records it has taken 100 secnds like that actually i am having 30 variables now
kumar sd|raj kl|67|23-a ug-90 muni t|kunmar h|90|12-j an-98 mani y|raju o|78 |12-feb- 78 raju t|k y|34|10-feb-92 output:output Name surname age date kumar sd raj k 67 23-aug-90 muni t kunmar h 90 12-jan-98 mani y raju o 78 12-feb-78 raju t k y 34 10-feb-92 Hi ksharp actually i have tryed your code it was workign if i am having spaces in between the obs it was eliminating the observations can u help me inthis,thqs in advance.
Oh.God. Your data is very messy. But you did post what output you need, it is very hard to judge something belonged to which observation.
data want(keep=temp ); infile datalines ; length temp $ 2000; input a $char1. @@; retain temp ; temp=strip(temp)||a; pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o'); if prxmatch(pid,scan(temp,-1,'|')) then do; output;call missing(temp);end; datalines; kumar sd|raj kl|67|23-a ug-90 muni t|kunmar h|90|12-j an-98 mani y|raju o|78 |12-feb- 78 raju t|k y|34|10-feb-92 output:output Name surname age date kumar sd raj k 67 23-aug-90 muni t kunmar h 90 12-jan-98 mani y raju o 78 12-feb-78 raju t k y 34 10-feb-92 ; run;
Ksharp
Sharp sorry i have send the data my txt file actually if i used the code it was working but removing the spaces between the data can u see the file thqs for your patience
OK. you want to keep these blank too?
data want(keep=temp ); infile 'c:\x.txt' ; length temp $ 400 b $ 1; input a $char1. @@; retain temp b; if a='|' then count+1; if missing(b) and count le 2 then temp=catx(' ',temp,a); else temp=cats(temp,a); pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o'); if prxmatch(pid,scan(temp,-1,'|')) then do; output; count=0; call missing(temp); end; b=a; run; data want(keep=temp ); infile datalines ; length temp $ 400 b $ 1; input a $char1. @@; retain temp b; if a='|' then count+1; if missing(b) and count le 2 then temp=catx(' ',temp,a); else temp=cats(temp,a); pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o'); if prxmatch(pid,scan(temp,-1,'|')) then do; output; count=0; call missing(temp); end; b=a; datalines; kumar raj|sys engineer |45 |24-jan- 89 nani kumar|main frames user inn|56|1 2-feb-90 raju jr|di user| 89|23-mar-90 ; run;
Ksharp
Reg:It was Removing spaces between the observation Ksharp i am using the data bewlow and the code below but it was removing the spaces between the observaions any help. Data: ku mar|raj kl|67|23-a ug-90 muni sdp|kunmar|90|12-j an-98 mani|raju varma|78 |12-feb- 78 ra ju|k|34|10-feb-92 data want(keep=name user no date ); infile 'c:\x.txt' lrecl=2000 pad ; length name user no date temp $ 40 b $ 1; input a $char1. @@; retain temp name user no b; if a ne '|' then temp=cats(temp,a); if a='|' or (anydigit(b) and missing(a) and mod(count,4)=3) then do; count+1; select(mod(count,4)); when (1) name=temp; when (2) user=temp; when (3) no=temp; when (0) do; date=temp;output;end; end; call missing(temp); end; b=a; run;
Reg:It was Removing spaces between the observation Ksharp i am using the data bewlow and the code below but it was removing the spaces between the observaions any help. Please Refer the spaces.txt
Did you try the code I gave you above?
data want(keep=temp ); infile 'c:\x.txt' ; length temp $ 400 b $ 1; input a $char1. @@; retain temp b; if a='|' then count+1; if missing(b) and count le 2 then temp=catx(' ',temp,a); else temp=cats(temp,a); pid=prxparse('/\d{2,}-\w{3,}-\d{2,}/o'); if prxmatch(pid,temp) then do; output; count=0; call missing(temp); end; b=a; run;
Ksharp
I think you need to post some true data to let me to see the actual data structure of yours.
The way to promote process is to decrease the value of lrecl=.
data want(keep=temp); infile 'c:\x.txt' ; length temp $ 400; input a $char1. @@; retain temp ; temp=cats(temp,a); pid=prxparse('/\d{2}-\w{3}-\d{2}/'); if prxmatch(pid,scan(temp,-1,'|')) then do; output;call missing(temp);end; run;
Ksharp
K sharp i am sending the data below actually i am having 21 variables like c1 --c21 and the data is coming like this,but i have tryed the code of your but it was removing the spaces in between the data and taking time and some variables are 4000 in long.
bewlo is the sample data
===========================
Is the time taken
^9865^^^^COOL^dealt was good
it is ok for us^56^ok^AEFG^ ^13-aug-19
99^ok for it ^this is not prosable ^ Ecoreco provides the fulsize reduction
^ Registration link opens on at 3.00 pm. Any registration attempts before the specified time will not be considered in the
^To illustrate, suppose that a very large file named “Master” contains a column named “Code”^^^^23-jAN-1
980
REQ1^REQ2^REQ3^REQ4^REQ5^REQ6^13-JUNE-19
80^REQ8^REQ9
^REQ10^REQ11^13-MAR-19
97^REQ13^REQ14^REQ15^WE ARE
OK ^REQ17 ^REQ18 ^REQ19 ^12-FEB-2
011^UP
REQ1^REQ2^REQ3^REQ4^REQ5^REQ6^13-JUNE-1980^REQ8^REQ9^REQ10^REQ11^13-MAR-1997^REQ13^REQ14^REQ15^WE AREOK ^REQ17 ^REQ18 ^REQ19 ^REQ20^12-FEB-2
011
==============
If you click on the "Use advanced editor" link in the upper right of the input form you will get a page that will let you upload a file.
The problem with this format is finding the end of record.
Do you have any control over the generation of this file?
Are the extra line breaks caused by field values that contain CR or LF characters? One solution is to replace those characters in the source system with other characters so that they will not be in the data file.
Hi you can find the data for that
Now what do you want output to look like?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.