R_Win
Calcite | Level 5

Actually, I have now uploaded the raw data as a file. Earlier I had copied and pasted it; now it is in that attachment.

Ksharp
Super User

I know you have uploaded an attachment.

What I mean is: what output do you need?

Your data are very messy. I do not know at which column an observation starts and at which column it ends.

art297
Opal | Level 21

The three records included in the attachment contain one record that has only 20 variables and two records that have 21 variables. Unlike your previous example, "^" is used as the delimiter rather than "|".

In the first record the last (20th) variable was a date, in the second record the last (21st) variable contained the string "ok", and the last variable in the third record (21st) was a date.

Either you didn't send the actual data, the data is (like Ksharp suggested) too messed up to work with, and/or you haven't provided enough info so that anyone could have a clue as to what you want to achieve.

Tom
Super User

Here is a program that works for the three observations that you posted in the attachment above. You might need to adjust the record and variable lengths for files with longer records.

%let infile='c:\downloads\Raw_Data.txt';
%let nvars=21;
%let dlm=^;
filename tmpfile2 temp;

/* Rejoin split records and write one complete record per line to TMPFILE2 */
data _null_;
  infile &infile lrecl=1000 end=eof length=inchar ;
  file tmpfile2 lrecl=2000 ;
  length outline $2000 inline $1000 ;
  length outchar 8;
  length nbar1 nbar2 8;
  retain outline;
  retain outchar 0  ;
  /* keep reading lines until the delimiter count would reach NVARS */
  do i=1 by 1 until( eof or (nbar1+nbar2 >= &nvars)) ;
    input inline $varying1000. inchar;
    /* count delimiters: COMPRESS with 'K' keeps only the delimiter characters */
    nbar1 = lengthn(compress(outline,"&dlm",'K'));
    nbar2 = lengthn(compress(inline,"&dlm",'K'));
    if nbar1 + nbar2 < &nvars then do;
       /* still the same record: append the line to the output buffer */
       substr(outline,outchar+1)=inline;
       outchar+inchar;
    end;
  end;
  putlog 'Line ' _n_ 'has ' outchar 'characters.';
  put outline $varying2000. outchar;
  /* the line just read starts the next record */
  outline=inline; outchar=inchar;
run;

You will need to write a program that knows something about the actual variables to read the transformed text file.

Here is a program that just reads in the 21 variables as character strings and dumps them to the log.

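data check ;
  /* No LRECL is given here, so the default input record length applies; */
  /* the log below reports a maximum record length of 256 and truncated lines. */
  infile tmpfile2 dlm="&dlm" dsd truncover ;
  length col1-col&nvars $200.;
  input col1-col&nvars;
  put (_N_ _all_) (=/);
run;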

Here are the results.

_N_=1
col1=Is the time taken
col2=9865
col3=
col4=
col5=
col6=COOL
col7=dealt was good it is ok for us
col8=56
col9=ok
col10=AEFG
col11=
col12=13-aug-1999
col13=ok for it
col14=this is not prosable
col15=Ecoreco provides the fulsize reduction
col16=Registration link opens on  at 3.00 pm. Any registration attempts before the specified
col17=
col18=
col19=
col20=
col21=

_N_=2
col1=REQ1
col2=REQ2
col3=REQ3
col4=REQ4
col5=REQ5
col6=REQ6
col7=13-JUNE-1980
col8=REQ8
col9=REQ9
col10=REQ10
col11=REQ11
col12=13-MAR-1997
col13=REQ13
col14=REQ14
col15=REQ15
col16=WE AREOK
col17=REQ17
col18=REQ18
col19=REQ19
col20=12-FEB-2011
col21=UP

_N_=3
col1=REQ1
col2=REQ2
col3=REQ3
col4=REQ4
col5=REQ5
col6=REQ6
col7=13-JUNE-1980
col8=REQ8
col9=REQ9
col10=REQ10
col11=REQ11
col12=13-MAR-1997
col13=REQ13
col14=REQ14
col15=REQ15
col16=WE AREOK
col17=REQ17
col18=REQ18
col19=REQ19
col20=REQ20
col21=12-FEB-2011

NOTE: 3 records were read from the infile TMPFILE2.
      The minimum record length was 140.
      The maximum record length was 256.
      One or more lines were truncated.
NOTE: The data set WORK.CHECK has 3 observations and 21 variables.



R_Win
Calcite | Level 5

Hi Tom, my actual data has very long records. When I tried the program on my own data it did not work. I have attached my data structure; can you help me?

Ksharp
Super User

If the output you want is like Tom's:

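data want(keep=temp);
 infile 'c:\Raw_Data.txt' eof=last;
 length temp _temp $ 2000;
 retain temp _temp;
 input;
 /* append the current line to the record being rebuilt */
 temp=cats(temp,_infile_);
 /* 21 or more delimiters: the current line belongs to the next record, */
 /* so write the previous accumulation and restart with this line */
 if countc(temp,'^') ge 21 then do;
  temp=_temp;
  output;
  temp=_infile_;
 end;
 _temp=temp;
 return;
 last: output; /* end of file: write the final record */
run;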


Ksharp

Ksharp
Super User

data want(keep=temp);
 infile 'c:\Raw_Data.txt' eof=last;
 length temp _temp $ 2000;
 retain temp _temp;
 input;
 temp=cats(temp,_infile_);
 if countc(temp,'^')ge 21 then do;temp=_temp;output;temp=_infile_;end;
 _temp=temp;
 return;
 last: output;
run;
data want(keep=var:);
 set want;
 array _v{21} $ 100 var1-var21;
 do i=1 to 21;
  _v{i}=scan(temp,i,'^','m');
 end;
run;

Ksharp

R_Win
Calcite | Level 5

data want(keep=temp);
 infile 'c:\Raw_Data.txt' eof=last;
 length temp _temp $ 2000;
 retain temp _temp;
 input;
 temp=cats(temp,_infile_);
 if countc(temp,'^')ge 21 then do; temp=_temp; output; temp=_infile_; end;
 _temp=temp;
 return;
 last: output;

Hi Ksharp, I used the above code. When I run it, the values of temp and _temp are only 256 characters long (I checked with the LENGTH function), so the data after that point is missing, even though both temp and _temp are declared with length 2000. Why?

Tom
Super User

You need to specify lengths for your data file records and your variables.

Add the LRECL option to your INFILE statement.

Increase the length of the TEMP and _TEMP variables.

If you have 21 variables at 4000 characters per variable, then you might need to set the length to about 85000. That would work for the LRECL option, but it is too long for a data step variable (character variables are limited to 32767 bytes). Most likely no single record will use the maximum length for every variable, so you could try setting the length of the temp strings to 32767.

Let me emphasize again that the real issue here is the process that is generating the file.  If you have any control over that then you can eliminate this problem by fixing the way that the data file is written.
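
To pick these values, it can help to measure the file first. The following is only a sketch, not part of the original replies. It assumes the file path used elsewhere in this thread and borrows the 85000 figure above as a generous LRECL; the names reclen and maxlen are made up for illustration.

```sas
/* Sketch: report the longest physical line in the raw file.       */
/* The path and the LRECL value are assumptions taken from the thread. */
data _null_;
  infile 'c:\Raw_Data.txt' lrecl=85000 length=reclen end=eof;
  retain maxlen 0;
  input;                          /* load the line; RECLEN now holds its length */
  maxlen = max(maxlen, reclen);
  if eof then putlog 'Longest line: ' maxlen 'characters.';
run;
```

This measures single lines only. A record rebuilt from several lines will be longer than any one of them, so the result is a lower bound for the length of the TEMP strings, not an upper one.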

Ksharp
Super User

I agree with Tom. You should add lrecl=32767 to set the logical record length.

I am stunned that you have 21 variables and that each one is 2000-4000 characters long. That is really horrible.

I do not think you actually use the full length.

Nothing in your posted data suggests that any variable is 2000 characters long.

data want(keep=temp);
 infile 'c:\Raw_Data.txt' eof=last lrecl=32767;
 length temp _temp $ 32767;
 retain temp _temp;
 input;
 temp=compbl(cats(temp,_infile_));
 if countc(temp,'^')ge 21 then do;temp=_temp;output;temp=compbl(_infile_);end;
 _temp=temp;
 return;
 last: output;
run;
data want(keep=var:);
 set want;
 array _v{21} $ 2000 var1-var21;
 do i=1 to 21;
  _v{i}=scan(temp,i,'^','m');
 end;
run;



Ksharp

sas_Forum
Calcite | Level 5

Reg: Comma and Quotes

data want(keep=temp);
infile 'c:\Raw_Data.txt' eof=last lrecl=32767;
length temp _temp $ 32767;
retain temp _temp;
input;
temp=compbl(cats(temp,_infile_));
if countc(temp,'",')ge 10 then do;temp=_temp;output;temp=compbl(_infile_);end;
_temp=temp;
return;
last: output;
run;

data want3(keep=var:);
set want3;
array _v{10} $ 800 var1-var10;
do i=1 to 10;
  _v{i}=scan(temp,i,'",','m');
end;
run;


I used this code, but I now have new data that uses quotes around the values and a comma as the delimiter, with 10 variables.

The data has another problem: a record can spill over onto 2-3 lines. In the example below, lines 2 and 3 together hold the data for just 10 variables, but the record continues onto another line.


"acv","1000036513","","Te_ADDR","507 Main, PLAZA, BUDH MARG","","HYd","","","IN"

"acv","1000036513","","Te_ADDR",
"507 Main, PLAZA, BUDH MARG","","HYd","","","IN"

"acv","1000036513","","Te_ADDR","507 Main, PLAZA, BUDH MARG","","HYd","","","IN"

"acv","1000

036513","","Te_ADDR","507 Main, PLAZA, BUDH MARG","","HYd","","","IN"
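
One possible approach for this layout, offered as an untested sketch rather than a confirmed answer from the thread: keep appending lines to a buffer until every quote is closed and 10 fields are present, then split the buffer. It assumes a SAS release where COUNTW and SCAN accept the 'q' and 'm' modifiers, that no value contains an embedded double quote, and that no characters other than leading or trailing blanks are lost at a line break. The file path, the data set name want10, and the variable names are hypothetical.

```sas
/* Sketch only: path, data set name and variable names are hypothetical. */
data want10(keep=var1-var10);
  infile 'c:\Raw_Data.csv' lrecl=32767 end=eof;
  length buffer $32767;
  retain buffer;
  array v{10} $800 var1-var10;
  input;
  /* glue the current line onto the record being rebuilt; */
  /* CATS strips blanks, so an empty line adds nothing    */
  buffer = cats(buffer, _infile_);
  /* complete when the quote count is even and 10 fields are present; */
  /* 'q' ignores commas inside quotes, 'm' keeps empty fields         */
  if mod(countc(buffer, '"'), 2) = 0
     and countw(buffer, ',', 'qm') >= 10 then do;
    do i = 1 to 10;
      v{i} = dequote(scan(buffer, i, ',', 'qm'));
    end;
    output;
    buffer = ' ';
  end;
  else if eof then putlog 'WARNING: the last record is incomplete.';
run;
```

Counting quotes handles a break inside a quoted value, and counting fields handles a break after a comma between values. Counting the characters '"' and ',' together, as in the COUNTC call above, cannot tell those two cases apart.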

