BookmarkSubscribeRSS Feed
athapa1183
Obsidian | Level 7

The data is very long and just a single row. I just have 4 segments from the long line.

 

CLM*178*322.56***12+B+1*Y*A*Y*Y**NTE*19*94289*WER~CLM*123 *322.56***11+A+1*Y*A*Y*Y* *90289*RRW~CLM*223 *322.56***11+A+1*Y*A*Y*Y* NTE*11*93289*RRW~CLM*120 *322.56***11+A+1*Y*A*Y*Y* NTE*17*91289*SEW~

 

I want 

 

CLMNTE
17819
123Blank
22311
12017

 

 

I have almost 15,000 CLM in the long line.

Sometimes the NTE segment is itself missing and it messes up my output.

 

Here is my code :

 

data a_111;
infile 'E:\Users\Documents and Settings\57260\Desktop\test\.W.160712102723.086.dat' lrecl = 1000000000 ;
input @'CLM*' string $3. @'NTE*' COS $2.  @@;
run;

how to account for when NTE segment is missing.

 I have attached a small sample of data

                                     

I will appreciate your help in this regard.                                                                                                               

22 REPLIES 22
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep!!

 

 

As such I am once again guessing:

data want;
  set have;
  do i=1 to countw(your_string,"CLM");
    new_line=scan(your_string,i,"CLM");
clm=scan(new_line,1,"*"); output; end; run;

You can then use the same sort of scan() function to split out parts of the remaining string.

athapa1183
Obsidian | Level 7

THis will not take care of missing NTE Segment.

 

Thank YOu

RW9
Diamond | Level 26 RW9
Diamond | Level 26

And this is of course why my final line was:

"You can then use the same sort of scan() function to split out parts of the remaining string."

 

You can work out how to do that part from the code I have provided?

athapa1183
Obsidian | Level 7

Thank YOu RW9. 

 

the file is a dat file with only 1 long single line. Not a dataset yet.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You need to know what the file structure is then.  I assume that it is * delimited, so that would be step one:

data want;
  infile "...dat" dlm="*" missover dsd;
  input ...;
run;

From there you would add to the input statement the structure of the file you are reading.  In this way you would then end up with the elements in variables. 

This: lrecl = 1000000000

Is not going to work however:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003160472.htm

There is a maximum to the length.  It would be simpler to get the file with line breaks (perhaps it already does, maybe you need to change the termstr:

http://support.sas.com/kb/14/178.html

I.e. did the file come from unix and you are using on windows perhaps or vice versa, you can use notepad++ or one of the text editors to look and see).

athapa1183
Obsidian | Level 7

I tried all of it. Does not work with @@. 

dlm="*" missover dsd

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

No, you do not use @@ with that line.  The file you get, is it from unix, is it from windows.  These have different line endings, one has line feed + carriage return, the other has just carriage return.  From the documentation:

http://support.sas.com/kb/14/178.html

 

Something like:

  infile "...dat" dlm="*" missover dsd termstr=lf;

 It is very unlikely that your file is one long string.

 

If however it is, then return to source and get a better file.

 

The only other solution is to try to go down the route of reading it block at a time, then parsing out each individual element, which will be a real pain for you.  then you would need the @@:

data inter;
  infile "...dat";
  input long_string $ @@;
  /* do lots of processing here to split data out, work out if you have gone over a block etc. */
run;

Even then its probably not going to work due to line length longer than possible.  I would really recommend checking out the termstr option, or trying to get a better file before you try that route however.

athapa1183
Obsidian | Level 7

it is in windows.

it is really a long single line separated by ~

if you see the 4 segments in the attached text file,i posted they all end in ~ 

but they are together in 1 long line

 

Thank You

ballardw
Super User

And what documentation accompanied this file as to content, layout or value types? For instance, how do you know that CLM is always three characters? Or NTE is two characters when present? These lengths are explicit from your input statement.

Also, how do you know that there are "almost 15,000 CLM "? Did you manually count them (doubtful)?

 

If anyone expected me to use such a file without some description I would likely have someone whipped with wet spaghetti noodles until some sort of clarification was offered.

 

It appears that perhaps the ~ character is some sort of record or field separator.

If that is the case 1) extract one segment using

2) parse that segment for the CLM and NTE

3) output the values

4) repeat for the next segment.

athapa1183
Obsidian | Level 7

hi BALLARDW,

i made it simple.

Yes the CLM is always 3 characters and NTE is 2

i opened the file in notepad++ to get the counts

Even if i do that approach

then how to account for missing NTE segments.

FreelanceReinh
Jade | Level 19

Hi @athapa1183,

 

Try this: Add the options

termstr='~' missover

to your INFILE statement and delete the "@@" from your INPUT statement.

 

With your test data a1.txt it worked on my Windows machine. Of course, this assumes that "~" is the record separator.

 

Edit: The TRUNCOVER option would be safer than MISSOVER if NTE had some shorter values (i.e. length 1) which occurred at the end of a record (i.e. before the "~" character), but you wrote that this is not the case.

athapa1183
Obsidian | Level 7

Hi

 

 

 

How to handle missing value when the NTE segment is itself missing

 

Need the highlighted parts

CLM*15176002400002*18104.49***11+A+1*

NTE*ADD*11899~

AMT*D*1260~

 

 

 

 

CLMFRQNTEAMT
151760024000021118991260
181990001800021875998377
123456784000021 1234
221210001800021 21

 

I really appreciate your help

Thank You

ballardw
Super User

@athapa1183 wrote:

Hi

 

 

 

How to handle missing value when the NTE segment is itself missing

 

Need the highlighted parts

CLM*15176002400002*18104.49***11+A+1*

NTE*ADD*11899~

AMT*D*1260~

 


And now we are back to once again asking: where is the document that describes the data file.

The file you just posted does not have the ~ as a possible record separator the way the earlier example did.

Or there are multiple records that have missing CLM and NTE.

For those interested, if we consider the ~ as a record separator from the previous context then this what the latest example file looks like for a bit of the file:

CLM*15176002400002*18104.49***11+A+1**A*Y*Y~
DCVS*434*RD8*3543453453~
DTP*435*DT*21106091700~
DTP*096*TM*1900~
CL1*1*1*01~
AMT*F3*0~
REF*D9*050GXLO5~
REF*EA*521220~
NTE*ADD*11899~
HI*BK+11214~HI*BJ+47774~
HI*BF+SDFAS*BF+30000+++++++Y*BF+34580+++++++Y*BF+4659+++++++Y*BF+30401+++++++Y*BF+4019+++++++Y*BF+4928+++++++Y~
HI*DR+0000~
HI*BQ+455+D8+3434342*BQ+9921+D8+201232322*BQ+9232+D8+11111*BQ+9390+D8+20150609~
HI*BR+9394+D8+20150609~
HI*BH+11+D8+20150609*BH+A1+D8+19460616*BH+B1+D8+19460616~
HI*BE+01+++623*BE+80+++3*BE+A3+++18104.49~
HCP*02*9636.72****0191~
NM1*77*2*GFHJSW3*****XX*345345345~
N3*DDASWFQQ~
N4*SDFASDF*WERWER*100879998~
SBR*P*18*DUAL***6***MC~
AMT*D*1260~
NM1*IL*1*JJGHGWGF*J***MI*ZZ53898X~
N3*4808 GASDFGSDFG~
N4*SGREGRGTER*11235~
NM1*PR*2*AREGTERTGERGTAW*****PI*40024~
DTP*573*D8*725572~
REF*F8*2452452452~
SBR*S*18*DFGDFG300***1***MB~

 

Which bears almost no resemblance to the previous "example" other than being one long undocumented mess.

The previous example had every occurrence of CLM (except the beginning of the line) as immediately after the ~ and when the NTE value was present it came before the following ~.

 

And what is the rule to find the value of 1 out of "11+A+1"

athapa1183
Obsidian | Level 7

the files are just like the sample data.

 

The CLM are not missing.

 

Only some NTE segments are missing.

~ yes this is record separator but you can use any separator you like to get the results.

what is the rule to find the value of 1 out of "11+A+1"

ANS: there is no rule we just need that 1 from "11+A+1"


Thank You

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 2700 views
  • 1 like
  • 6 in conversation