BookmarkSubscribeRSS Feed
ZRick
Obsidian | Level 7

I have a raw data like this:

649519BU8 12/15/2021 100 12/15/2022 100 12/15/2023 100 12/15/2024 100 12/15/2025 100

519Bw852 12/15/2026 100 12/15/2027 100 12/15/2028 100 12/15/2029 100 12/15/2030

X028W71659 11/19/2011 3/9/2012 3/9/2022 12/15/2005 4/8/2008

D7P00A1 12/15/2031 70 12/15/2032 60 12/15/2033 74 12/15/2034 68 12/15/2035 68 12/15/2036 75

DE000A1GNAH1 7/8/2021 80 10/8/2021 70 1/8/2022 100 4/8/2022 60 7/8/2022 80 10/8/2022

XS0271265299 10/19/2011 1/8/2022 4/8/2022 12/15/2025 4/8/2023

804519KN3 12/15/2026 99 12/15/2027 100 . 100 12/15/2029 . 12/15/2030

I want to read them into sas file into 3 colums

ID                 Date          Price

649519BU8  12/15/2021  100 

649519BU8  12/15/2022  100

649519BU8  12/15/2023  100

D7P00A1    12/15/2031    0.7

D7P00A1    12/15/2032    0.6

XS0271265299  10/19/2011  .

XS0271265299  1/8/2022   .

XS0271265299  4/8/2022   .   

This challenge is the price column in the raw data file is not fixed,

some times it have price like 1st, 2nd row

some times it does not have price, like 3rd row (X028W71659 row)

some times it is a percentage like 4th row (D7P00A1 row), it is 70%, or 0.7, so basically any value less than 90 is a percentage, not a price

each row, in the end, they are not fixed columns/ length or any sort.

so, how do I read them in?

15 REPLIES 15
manojinpec
Obsidian | Level 7

Very Tough i guess.

Not sure whether it is possible or not as well.

SR71Blackbird
Calcite | Level 5

Hello ZRick,

just try this:

filename mydata "path to data\data.txt";

data mydata(drop=row i);
length ID $20 Date $15 Price 8;
infile mydata truncover lrecl=256;
input @1 row $256.;
ID = scan(row,1,' ');
i=2;
do while (Date ne '' or i=2);
Date = scan(row,i,' ');
if index(scan(row,i+1,' '),'/') then do;
       Price = .;
     i = i + 1;
end;
else do;
    Price = input(scan(row,i+1,' '),12.);
    if Price < 90 then Price = Price / 100;
     i = i + 2;
end;
if Date ne '' then output;
end;
run;

Just remember that the parameter "lrecl=256" should be as long or longer than the longest row and the informat to variable "row" should be as long as this parameter.

sambasiva_ravuri_tcs_com
Calcite | Level 5

Hi,

Here am seeing one of the column ID is Alphanumeric and even the length of ID also not fixed. Please confirm the length of Id once.

Regards,

S Ravuri.

Ksharp
Super User

How about:

data x(drop=x);
infile cards truncover;
length x $ 3;
input id : $20. @;
do until(missing(x));
input date : mmddyy12. @ ;
input x  $char3. +(-3)  @;  
if not missing(x) or not missing(date) then do;
if index(x,'/') then do;
                      price=.;output;
                     end;
 else do; input  price  @; output;end;
end;  
                          end; 
format date mmddyy10.;
cards; 
649519BU8 12/15/2021 100 12/15/2022 100 12/15/2023 100 12/15/2024 100 12/15/2025 100
519Bw852 12/15/2026 100 12/15/2027 100 12/15/2028 100 12/15/2029 100 12/15/2030
X028W71659 11/19/2011 3/9/2012 3/9/2022 12/15/2005 4/8/2008
D7P00A1 12/15/2031 70 12/15/2032 60 12/15/2033 74 12/15/2034 68 12/15/2035 68 12/15/2036 75
DE000A1GNAH1 7/8/2021 80 10/8/2021 70 1/8/2022 100 4/8/2022 60 7/8/2022 80 10/8/2022
XS0271265299 10/19/2011 1/8/2022 4/8/2022 12/15/2025 4/8/2023
804519KN3 12/15/2026 99 12/15/2027 100 . 100 12/15/2029 . 12/15/2030
;
run;


Ksharp

Ksharp
Super User

Opps.

You also need another data step to transform the price which less than 70 into percent 0.7. That is very easy for you.

Ksharp

sambasiva_ravuri_tcs_com
Calcite | Level 5

I don't think Mainframe SAS won't support more than 80 characters length of Input, eventhough we put lrecl parameter. Please confirm ?

Thanks,

S Rvauri.

SR71Blackbird
Calcite | Level 5

Hello S Rvauri,

for SAS programs as member in a PDS-file you have to use 80 chars of length, but for simple data in a flatfile you can use much more, I think about 32000 or so. Just try to allocate an extra file for the data on z/OS.

So "infile cards" won't be the preferred way, you should reference an external flat file.

sambasiva_ravuri_tcs_com
Calcite | Level 5

Hi SR71Blackbird,

I agree with you. We are not suppose to use Infile cards and this is possible only for external flat file.

Regards,

S Ravuri.

George_S
Fluorite | Level 6

Hello  SR71Blackbird,

If the row is extremely larger than we can imaged(not just hundres,maby millons or more rows),How about read that into a hash object before the result output to a table?

George

Ksharp
Super User

I optimized my code.

data x(drop=x );
infile cards truncover;
length x $ 12;
input id : $20. @;
do until(missing(x));
input date : mmddyy12. @ ;
input x : $char12.  +(-1*(length(x)+1)) @; 
if not missing(x) or not missing(date) then do;
if index(x,'/') then do;
                      price=.;output;
                     end;
 else do; input  price  @; output;end; 
end;
end;
format date mmddyy10.;
cards; 
649519BU8 12/15/2021 100 12/15/2022 100 12/15/2023 100 12/15/2024 100 12/15/2025 100
519Bw852 12/15/2026 100 12/15/2027 100 12/15/2028 100 12/15/2029 100 12/15/2030
X028W71659 11/19/2011 3/9/2012 3/9/2022 12/15/2005 4/8/2008
D7P00A1 12/15/2031 70 12/15/2032 60 12/15/2033 74 12/15/2034 68 12/15/2035 68 12/15/2036 75
DE000A1GNAH1 7/8/2021 80 10/8/2021 70 1/8/2022 100 4/8/2022 60 7/8/2022 80 10/8/2022
XS0271265299 10/19/2011 1/8/2022 4/8/2022 12/15/2025 4/8/2023
804519KN3 12/15/2026 99 12/15/2027 100 . 100 12/15/2029 . 12/15/2030
;
run;

Ksharp

ZRick
Obsidian | Level 7

really nice code, thank you so much, Ksharp

Ksharp
Super User

Did you try SR71Blackbird's code ?

I also want to know the result of running his code.

Ksharp

ZRick
Obsidian | Level 7

I tested both codes,

Ksharp, I think your code has some issue, it didn't read in correctly. for records:

DE000A1GNAH1, 10/08/2022, data has  2       

XS0271265299, 04/08/2023, data has  3,

904510KN3, 12/15/2030, data has 30

all 2, 3, 30 should not be there. Here is your code, I change it little bit to make it read in txt file

data Ksharp_x(drop=x );

  infile mydata truncover;

  length x $ 12;

  input id : $20. @;

  do until(missing(x));

  input date : mmddyy12. @ ;

  input x : $char12.  +(-1*(length(x)+1)) @;

  if not missing(x) or not missing(date) then

    do;

      if index(x,'/') then

          do;

            price=.;

            output;

          end;

    else do;

        input  price  @;

        output;

    end;

end;

end;

format date mmddyy10.;

run;

SR71Blackbird, read all data correctly.

Ksharp, can you help fix it?

Ksharp
Super User

OK. But these data doesn't look like what you firstly post.

data x(drop=x );
infile cards truncover dlm=',';
length x price $ 20;
input id : $20. @;
do until(missing(x));
input date : mmddyy12. @ ;
input x $char20.  +(-1*(lengthc(x)+1)) @; 
if not missing(x) or not missing(date) then do;
if index(x,'/') then do;
                      price=' ';output;
                     end;
 else do; input  price $ @; output;end; 
end;
end;
format date mmddyy10.;
cards; 
DE000A1GNAH1, 10/08/2022, data has  2       
XS0271265299, 04/08/2023, data has  3,
904510KN3, 12/15/2030, data has 30
;
run;

Ksharp

消息编辑者为:xia keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2307 views
  • 5 likes
  • 6 in conversation