The data is very long and just a single row. I just have 4 segments from the long line.
CLM*178*322.56***12+B+1*Y*A*Y*Y**NTE*19*94289*WER~CLM*123 *322.56***11+A+1*Y*A*Y*Y* *90289*RRW~CLM*223 *322.56***11+A+1*Y*A*Y*Y* NTE*11*93289*RRW~CLM*120 *322.56***11+A+1*Y*A*Y*Y* NTE*17*91289*SEW~
I want
CLM | NTE |
178 | 19 |
123 | Blank |
223 | 11 |
120 | 17 |
I have almost 15,000 CLM in the long line.
Sometimes the NTE segment is itself missing and it messes up my output.
Here is my code :
data a_111;
infile 'E:\Users\Documents and Settings\57260\Desktop\test\.W.160712102723.086.dat' lrecl = 1000000000 ;
input @'CLM*' string $3. @'NTE*' COS $2. @@;
run;
how to account for when NTE segment is missing.
I have attached a small sample of data
I will appreciate your help in this regard.
Post test data in the form of a datastep!!
As such I am once again guessing:
data want; set have; do i=1 to countw(your_string,"CLM"); new_line=scan(your_string,i,"CLM");
clm=scan(new_line,1,"*"); output; end; run;
You can then use the same sort of scan() function to split out parts of the remaining string.
THis will not take care of missing NTE Segment.
Thank YOu
And this is of course why my final line was:
"You can then use the same sort of scan() function to split out parts of the remaining string."
You can work out how to do that part from the code I have provided?
Thank YOu RW9.
the file is a dat file with only 1 long single line. Not a dataset yet.
You need to know what the file structure is then. I assume that it is * delimited, so that would be step one:
data want; infile "...dat" dlm="*" missover dsd; input ...; run;
From there you would add to the input statement the structure of the file you are reading. In this way you would then end up with the elements in variables.
This: lrecl = 1000000000
Is not going to work however:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003160472.htm
There is a maximum to the length. It would be simpler to get the file with line breaks (perhaps it already does, maybe you need to change the termstr:
http://support.sas.com/kb/14/178.html
I.e. did the file come from unix and you are using on windows perhaps or vice versa, you can use notepad++ or one of the text editors to look and see).
I tried all of it. Does not work with @@.
dlm="*" missover dsd
No, you do not use @@ with that line. The file you get, is it from unix, is it from windows. These have different line endings, one has line feed + carriage return, the other has just carriage return. From the documentation:
http://support.sas.com/kb/14/178.html
Something like:
infile "...dat" dlm="*" missover dsd termstr=lf;
It is very unlikely that your file is one long string.
If however it is, then return to source and get a better file.
The only other solution is to try to go down the route of reading it block at a time, then parsing out each individual element, which will be a real pain for you. then you would need the @@:
data inter; infile "...dat"; input long_string $ @@; /* do lots of processing here to split data out, work out if you have gone over a block etc. */ run;
Even then its probably not going to work due to line length longer than possible. I would really recommend checking out the termstr option, or trying to get a better file before you try that route however.
it is in windows.
it is really a long single line separated by ~
if you see the 4 segments in the attached text file,i posted they all end in ~
but they are together in 1 long line
Thank You
And what documentation accompanied this file as to content, layout or value types? For instance, how do you know that CLM is always three characters? Or NTE is two characters when present? These lengths are explicit from your input statement.
Also, how do you know that there are "almost 15,000 CLM "? Did you manually count them (doubtful)?
If anyone expected me to use such a file without some description I would likely have someone whipped with wet spaghetti noodles until some sort of clarification was offered.
It appears that perhaps the ~ character is some sort of record or field separator.
If that is the case 1) extract one segment using
2) parse that segment for the CLM and NTE
3) output the values
4) repeat for the next segment.
hi BALLARDW,
i made it simple.
Yes the CLM is always 3 characters and NTE is 2
i opened the file in notepad++ to get the counts
Even if i do that approach
then how to account for missing NTE segments.
Hi @athapa1183,
Try this: Add the options
termstr='~' missover
to your INFILE statement and delete the "@@" from your INPUT statement.
With your test data a1.txt it worked on my Windows machine. Of course, this assumes that "~" is the record separator.
Edit: The TRUNCOVER option would be safer than MISSOVER if NTE had some shorter values (i.e. length 1) which occurred at the end of a record (i.e. before the "~" character), but you wrote that this is not the case.
Thank You for the suggestion.
Yes i did see that it works. I thought this simple piece will work in my big dataset.
Now i am attaching a portion of my big dataset.I have changed it little though.
How to handle missing value when the NTE segment is itself missing
Need the highlighted parts
CLM*15176002400002*18104.49***11+A+1*
NTE*ADD*11899~
AMT*D*1260~
My code
data a_111;
infile 'E:\Users\Documents and Settings\57260\Desktop\s1.TXT' termstr='~' missover lrecl = 1000000000 ;
input @'CLM*' string $21. @'***' +5 FREQ $1. @'NTE*TPO*' COS $5. @'AMT*D*' Paid_Amt1 $6. ;
RUN;
i WANT :
CLM | FRQ | NTE | AMT |
15176002400002 | 1 | 11899 | 1260 |
18199000180002 | 1 | 87599 | 8377 |
12345678400002 | 1 | 1234 | |
22121000180002 | 1 | 21 |
I really appreciate your help
Thank You
@athapa1183 wrote:
Thank You for the suggestion.
Yes i did see that it works. I thought this simple piece will work in my big dataset.
Now i am attaching a portion of my big dataset.I have changed it little though.
How to handle missing value when the NTE segment is itself missing
Need the highlighted parts
CLM*15176002400002*18104.49***11+A+1*
NTE*ADD*11899~
AMT*D*1260~
And now we are back to once again asking: where is the document that describes the data file.
The file you just posted does not have the ~ as a possible record separator the way the earlier example did.
Or there are multiple records that have missing CLM and NTE.
For those interested, if we consider the ~ as a record separator from the previous context then this what the latest example file looks like for a bit of the file:
CLM*15176002400002*18104.49***11+A+1**A*Y*Y~ DCVS*434*RD8*3543453453~ DTP*435*DT*21106091700~ DTP*096*TM*1900~ CL1*1*1*01~ AMT*F3*0~ REF*D9*050GXLO5~ REF*EA*521220~ NTE*ADD*11899~ HI*BK+11214~HI*BJ+47774~ HI*BF+SDFAS*BF+30000+++++++Y*BF+34580+++++++Y*BF+4659+++++++Y*BF+30401+++++++Y*BF+4019+++++++Y*BF+4928+++++++Y~ HI*DR+0000~ HI*BQ+455+D8+3434342*BQ+9921+D8+201232322*BQ+9232+D8+11111*BQ+9390+D8+20150609~ HI*BR+9394+D8+20150609~ HI*BH+11+D8+20150609*BH+A1+D8+19460616*BH+B1+D8+19460616~ HI*BE+01+++623*BE+80+++3*BE+A3+++18104.49~ HCP*02*9636.72****0191~ NM1*77*2*GFHJSW3*****XX*345345345~ N3*DDASWFQQ~ N4*SDFASDF*WERWER*100879998~ SBR*P*18*DUAL***6***MC~ AMT*D*1260~ NM1*IL*1*JJGHGWGF*J***MI*ZZ53898X~ N3*4808 GASDFGSDFG~ N4*SGREGRGTER*11235~ NM1*PR*2*AREGTERTGERGTAW*****PI*40024~ DTP*573*D8*725572~ REF*F8*2452452452~ SBR*S*18*DFGDFG300***1***MB~
Which bears almost no resemblance to the previous "example" other than being one long undocumented mess.
The previous example had every occurrence of CLM (except the beginning of the line) as immediately after the ~ and when the NTE value was present it came before the following ~.
And what is the rule to find the value of 1 out of "11+A+1"
the files are just like the sample data.
The CLM are not missing.
Only some NTE segments are missing.
~ yes this is record separator but you can use any separator you like to get the results.
what is the rule to find the value of 1 out of "11+A+1"
ANS: there is no rule we just need that 1 from "11+A+1"
Thank You
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.