I have a raw data like this:
649519BU8 12/15/2021 100 12/15/2022 100 12/15/2023 100 12/15/2024 100 12/15/2025 100
519Bw852 12/15/2026 100 12/15/2027 100 12/15/2028 100 12/15/2029 100 12/15/2030
X028W71659 11/19/2011 3/9/2012 3/9/2022 12/15/2005 4/8/2008
D7P00A1 12/15/2031 70 12/15/2032 60 12/15/2033 74 12/15/2034 68 12/15/2035 68 12/15/2036 75
DE000A1GNAH1 7/8/2021 80 10/8/2021 70 1/8/2022 100 4/8/2022 60 7/8/2022 80 10/8/2022
XS0271265299 10/19/2011 1/8/2022 4/8/2022 12/15/2025 4/8/2023
804519KN3 12/15/2026 99 12/15/2027 100 . 100 12/15/2029 . 12/15/2030
I want to read them into sas file into 3 colums
ID Date Price
649519BU8 12/15/2021 100
649519BU8 12/15/2022 100
649519BU8 12/15/2023 100
D7P00A1 12/15/2031 0.7
D7P00A1 12/15/2032 0.6
XS0271265299 10/19/2011 .
XS0271265299 1/8/2022 .
XS0271265299 4/8/2022 .
This challenge is the price column in the raw data file is not fixed,
some times it have price like 1st, 2nd row
some times it does not have price, like 3rd row (X028W71659 row)
some times it is a percentage like 4th row (D7P00A1 row), it is 70%, or 0.7, so basically any value less than 90 is a percentage, not a price
each row, in the end, they are not fixed columns/ length or any sort.
so, how do I read them in?
Very Tough i guess.
Not sure whether it is possible or not as well.
Hello ZRick,
just try this:
filename mydata "path to data\data.txt";
data mydata(drop=row i);
length ID $20 Date $15 Price 8;
infile mydata truncover lrecl=256;
input @1 row $256.;
ID = scan(row,1,' ');
i=2;
do while (Date ne '' or i=2);
Date = scan(row,i,' ');
if index(scan(row,i+1,' '),'/') then do;
Price = .;
i = i + 1;
end;
else do;
Price = input(scan(row,i+1,' '),12.);
if Price < 90 then Price = Price / 100;
i = i + 2;
end;
if Date ne '' then output;
end;
run;
Just remember that the parameter "lrecl=256" should be as long or longer than the longest row and the informat to variable "row" should be as long as this parameter.
Hi,
Here am seeing one of the column ID is Alphanumeric and even the length of ID also not fixed. Please confirm the length of Id once.
Regards,
S Ravuri.
How about:
data x(drop=x); infile cards truncover; length x $ 3; input id : $20. @; do until(missing(x)); input date : mmddyy12. @ ; input x $char3. +(-3) @; if not missing(x) or not missing(date) then do; if index(x,'/') then do; price=.;output; end; else do; input price @; output;end; end; end; format date mmddyy10.; cards; 649519BU8 12/15/2021 100 12/15/2022 100 12/15/2023 100 12/15/2024 100 12/15/2025 100 519Bw852 12/15/2026 100 12/15/2027 100 12/15/2028 100 12/15/2029 100 12/15/2030 X028W71659 11/19/2011 3/9/2012 3/9/2022 12/15/2005 4/8/2008 D7P00A1 12/15/2031 70 12/15/2032 60 12/15/2033 74 12/15/2034 68 12/15/2035 68 12/15/2036 75 DE000A1GNAH1 7/8/2021 80 10/8/2021 70 1/8/2022 100 4/8/2022 60 7/8/2022 80 10/8/2022 XS0271265299 10/19/2011 1/8/2022 4/8/2022 12/15/2025 4/8/2023 804519KN3 12/15/2026 99 12/15/2027 100 . 100 12/15/2029 . 12/15/2030 ; run;
Ksharp
Opps.
You also need another data step to transform the price which less than 70 into percent 0.7. That is very easy for you.
Ksharp
I don't think Mainframe SAS won't support more than 80 characters length of Input, eventhough we put lrecl parameter. Please confirm ?
Thanks,
S Rvauri.
Hello S Rvauri,
for SAS programs as member in a PDS-file you have to use 80 chars of length, but for simple data in a flatfile you can use much more, I think about 32000 or so. Just try to allocate an extra file for the data on z/OS.
So "infile cards" won't be the preferred way, you should reference an external flat file.
Hi SR71Blackbird,
I agree with you. We are not suppose to use Infile cards and this is possible only for external flat file.
Regards,
S Ravuri.
Hello SR71Blackbird,
If the row is extremely larger than we can imaged(not just hundres,maby millons or more rows),How about read that into a hash object before the result output to a table?
George
I optimized my code.
data x(drop=x ); infile cards truncover; length x $ 12; input id : $20. @; do until(missing(x)); input date : mmddyy12. @ ; input x : $char12. +(-1*(length(x)+1)) @; if not missing(x) or not missing(date) then do; if index(x,'/') then do; price=.;output; end; else do; input price @; output;end; end; end; format date mmddyy10.; cards; 649519BU8 12/15/2021 100 12/15/2022 100 12/15/2023 100 12/15/2024 100 12/15/2025 100 519Bw852 12/15/2026 100 12/15/2027 100 12/15/2028 100 12/15/2029 100 12/15/2030 X028W71659 11/19/2011 3/9/2012 3/9/2022 12/15/2005 4/8/2008 D7P00A1 12/15/2031 70 12/15/2032 60 12/15/2033 74 12/15/2034 68 12/15/2035 68 12/15/2036 75 DE000A1GNAH1 7/8/2021 80 10/8/2021 70 1/8/2022 100 4/8/2022 60 7/8/2022 80 10/8/2022 XS0271265299 10/19/2011 1/8/2022 4/8/2022 12/15/2025 4/8/2023 804519KN3 12/15/2026 99 12/15/2027 100 . 100 12/15/2029 . 12/15/2030 ; run;
Ksharp
really nice code, thank you so much, Ksharp
I tested both codes,
Ksharp, I think your code has some issue, it didn't read in correctly. for records:
DE000A1GNAH1, 10/08/2022, data has 2
XS0271265299, 04/08/2023, data has 3,
904510KN3, 12/15/2030, data has 30
all 2, 3, 30 should not be there. Here is your code, I change it little bit to make it read in txt file
data Ksharp_x(drop=x );
infile mydata truncover;
length x $ 12;
input id : $20. @;
do until(missing(x));
input date : mmddyy12. @ ;
input x : $char12. +(-1*(length(x)+1)) @;
if not missing(x) or not missing(date) then
do;
if index(x,'/') then
do;
price=.;
output;
end;
else do;
input price @;
output;
end;
end;
end;
format date mmddyy10.;
run;
SR71Blackbird, read all data correctly.
Ksharp, can you help fix it?
OK. But these data doesn't look like what you firstly post.
data x(drop=x ); infile cards truncover dlm=','; length x price $ 20; input id : $20. @; do until(missing(x)); input date : mmddyy12. @ ; input x $char20. +(-1*(lengthc(x)+1)) @; if not missing(x) or not missing(date) then do; if index(x,'/') then do; price=' ';output; end; else do; input price $ @; output;end; end; end; format date mmddyy10.; cards; DE000A1GNAH1, 10/08/2022, data has 2 XS0271265299, 04/08/2023, data has 3, 904510KN3, 12/15/2030, data has 30 ; run;
Ksharp
消息编辑者为:xia keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.