DATA Step, Macro, Functions and more

how to read in really messy data like this?

Reply
Frequent Contributor
Posts: 133

how to read in really messy data like this?

I have a raw data like this:

649519BU8 12/15/2021 100 12/15/2022 100 12/15/2023 100 12/15/2024 100 12/15/2025 100

519Bw852 12/15/2026 100 12/15/2027 100 12/15/2028 100 12/15/2029 100 12/15/2030

X028W71659 11/19/2011 3/9/2012 3/9/2022 12/15/2005 4/8/2008

D7P00A1 12/15/2031 70 12/15/2032 60 12/15/2033 74 12/15/2034 68 12/15/2035 68 12/15/2036 75

DE000A1GNAH1 7/8/2021 80 10/8/2021 70 1/8/2022 100 4/8/2022 60 7/8/2022 80 10/8/2022

XS0271265299 10/19/2011 1/8/2022 4/8/2022 12/15/2025 4/8/2023

804519KN3 12/15/2026 99 12/15/2027 100 . 100 12/15/2029 . 12/15/2030

I want to read them into sas file into 3 colums

ID                 Date          Price

649519BU8  12/15/2021  100 

649519BU8  12/15/2022  100

649519BU8  12/15/2023  100

D7P00A1    12/15/2031    0.7

D7P00A1    12/15/2032    0.6

XS0271265299  10/19/2011  .

XS0271265299  1/8/2022   .

XS0271265299  4/8/2022   .   

This challenge is the price column in the raw data file is not fixed,

some times it have price like 1st, 2nd row

some times it does not have price, like 3rd row (X028W71659 row)

some times it is a percentage like 4th row (D7P00A1 row), it is 70%, or 0.7, so basically any value less than 90 is a percentage, not a price

each row, in the end, they are not fixed columns/ length or any sort.

so, how do I read them in?

Frequent Contributor
Posts: 138

how to read in really messy data like this?

Very Tough i guess.

Not sure whether it is possible or not as well.

New Contributor
Posts: 2

Re: how to read in really messy data like this?

Hello ZRick,

just try this:

filename mydata "path to data\data.txt";

data mydata(drop=row i);
length ID $20 Date $15 Price 8;
infile mydata truncover lrecl=256;
input @1 row $256.;
ID = scan(row,1,' ');
i=2;
do while (Date ne '' or i=2);
Date = scan(row,i,' ');
if index(scan(row,i+1,' '),'/') then do;
       Price = .;
     i = i + 1;
end;
else do;
    Price = input(scan(row,i+1,' '),12.);
    if Price < 90 then Price = Price / 100;
     i = i + 2;
end;
if Date ne '' then output;
end;
run;

Just remember that the parameter "lrecl=256" should be as long or longer than the longest row and the informat to variable "row" should be as long as this parameter.

Contributor
Posts: 42

how to read in really messy data like this?

Hi,

Here am seeing one of the column ID is Alphanumeric and even the length of ID also not fixed. Please confirm the length of Id once.

Regards,

S Ravuri.

Super User
Posts: 9,681

how to read in really messy data like this?

How about:

data x(drop=x);
infile cards truncover;
length x $ 3;
input id : $20. @;
do until(missing(x));
input date : mmddyy12. @ ;
input x  $char3. +(-3)  @;  
if not missing(x) or not missing(date) then do;
if index(x,'/') then do;
                      price=.;output;
                     end;
 else do; input  price  @; output;end;
end;  
                          end; 
format date mmddyy10.;
cards; 
649519BU8 12/15/2021 100 12/15/2022 100 12/15/2023 100 12/15/2024 100 12/15/2025 100
519Bw852 12/15/2026 100 12/15/2027 100 12/15/2028 100 12/15/2029 100 12/15/2030
X028W71659 11/19/2011 3/9/2012 3/9/2022 12/15/2005 4/8/2008
D7P00A1 12/15/2031 70 12/15/2032 60 12/15/2033 74 12/15/2034 68 12/15/2035 68 12/15/2036 75
DE000A1GNAH1 7/8/2021 80 10/8/2021 70 1/8/2022 100 4/8/2022 60 7/8/2022 80 10/8/2022
XS0271265299 10/19/2011 1/8/2022 4/8/2022 12/15/2025 4/8/2023
804519KN3 12/15/2026 99 12/15/2027 100 . 100 12/15/2029 . 12/15/2030
;
run;


Ksharp

Super User
Posts: 9,681

how to read in really messy data like this?

Opps.

You also need another data step to transform the price which less than 70 into percent 0.7. That is very easy for you.

Ksharp

Contributor
Posts: 42

how to read in really messy data like this?

I don't think Mainframe SAS won't support more than 80 characters length of Input, eventhough we put lrecl parameter. Please confirm ?

Thanks,

S Rvauri.

New Contributor
Posts: 2

Re: how to read in really messy data like this?

Hello S Rvauri,

for SAS programs as member in a PDS-file you have to use 80 chars of length, but for simple data in a flatfile you can use much more, I think about 32000 or so. Just try to allocate an extra file for the data on z/OS.

So "infile cards" won't be the preferred way, you should reference an external flat file.

Contributor
Posts: 42

how to read in really messy data like this?

Hi SR71Blackbird,

I agree with you. We are not suppose to use Infile cards and this is possible only for external flat file.

Regards,

S Ravuri.

Contributor
Posts: 35

Re: how to read in really messy data like this?

Hello  SR71Blackbird,

If the row is extremely larger than we can imaged(not just hundres,maby millons or more rows),How about read that into a hash object before the result output to a table?

George

Super User
Posts: 9,681

how to read in really messy data like this?

I optimized my code.

data x(drop=x );
infile cards truncover;
length x $ 12;
input id : $20. @;
do until(missing(x));
input date : mmddyy12. @ ;
input x : $char12.  +(-1*(length(x)+1)) @; 
if not missing(x) or not missing(date) then do;
if index(x,'/') then do;
                      price=.;output;
                     end;
 else do; input  price  @; output;end; 
end;
end;
format date mmddyy10.;
cards; 
649519BU8 12/15/2021 100 12/15/2022 100 12/15/2023 100 12/15/2024 100 12/15/2025 100
519Bw852 12/15/2026 100 12/15/2027 100 12/15/2028 100 12/15/2029 100 12/15/2030
X028W71659 11/19/2011 3/9/2012 3/9/2022 12/15/2005 4/8/2008
D7P00A1 12/15/2031 70 12/15/2032 60 12/15/2033 74 12/15/2034 68 12/15/2035 68 12/15/2036 75
DE000A1GNAH1 7/8/2021 80 10/8/2021 70 1/8/2022 100 4/8/2022 60 7/8/2022 80 10/8/2022
XS0271265299 10/19/2011 1/8/2022 4/8/2022 12/15/2025 4/8/2023
804519KN3 12/15/2026 99 12/15/2027 100 . 100 12/15/2029 . 12/15/2030
;
run;

Ksharp

Frequent Contributor
Posts: 133

how to read in really messy data like this?

really nice code, thank you so much, Ksharp

Super User
Posts: 9,681

how to read in really messy data like this?

Did you try SR71Blackbird's code ?

I also want to know the result of running his code.

Ksharp

Frequent Contributor
Posts: 133

Re: how to read in really messy data like this?

I tested both codes,

Ksharp, I think your code has some issue, it didn't read in correctly. for records:

DE000A1GNAH1, 10/08/2022, data has  2       

XS0271265299, 04/08/2023, data has  3,

904510KN3, 12/15/2030, data has 30

all 2, 3, 30 should not be there. Here is your code, I change it little bit to make it read in txt file

data Ksharp_x(drop=x );

  infile mydata truncover;

  length x $ 12;

  input id : $20. @;

  do until(missing(x));

  input date : mmddyy12. @ ;

  input x : $char12.  +(-1*(length(x)+1)) @;

  if not missing(x) or not missing(date) then

    do;

      if index(x,'/') then

          do;

            price=.;

            output;

          end;

    else do;

        input  price  @;

        output;

    end;

end;

end;

format date mmddyy10.;

run;

SR71Blackbird, read all data correctly.

Ksharp, can you help fix it?

Super User
Posts: 9,681

Re: how to read in really messy data like this?

OK. But these data doesn't look like what you firstly post.

data x(drop=x );
infile cards truncover dlm=',';
length x price $ 20;
input id : $20. @;
do until(missing(x));
input date : mmddyy12. @ ;
input x $char20.  +(-1*(lengthc(x)+1)) @; 
if not missing(x) or not missing(date) then do;
if index(x,'/') then do;
                      price=' ';output;
                     end;
 else do; input  price $ @; output;end; 
end;
end;
format date mmddyy10.;
cards; 
DE000A1GNAH1, 10/08/2022, data has  2       
XS0271265299, 04/08/2023, data has  3,
904510KN3, 12/15/2030, data has 30
;
run;

Ksharp

消息编辑者为:xia keshan

Ask a Question
Discussion stats
  • 15 replies
  • 1099 views
  • 5 likes
  • 6 in conversation