Hi SAS Friends,
Am using SAS ver 9.4 to parse up a large data set where the first variable of each sub data set is a product number, followed by either 6 or 7 additional rows if information related to the product number.
In the 6 row version, row 1 has 124 characters
In the 7 row version, row 1 has only 47 characters and the remaining data drop down to create a new row.
I've attached a SAS file with a data step that will import the 6 row data set successfully, and, separately, the 7 row data set successfully. However the entire dataset includes 17000 products with the 6 or 7 row subdata sets interspersed randomly.
Any suggestions about how such a large dataset can be imported? How to get SAS to recognize the 6 or 7 row subdata sets and apply the needed code? Or some other strategy
Thanks, very much appreciated,
R
You may issue many input statements to read a single observation:
DATA Cost;
infile datalines truncover;
INPUT @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;
if missing(PKG) then
input
@60 PKG 4.0
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
/ @1 NN_7 $12;
else
input
@1 NN_2 $1.
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
;
FORMAT AWP Dollar8.2 ;
datalines;
20000-0000-00 Product6_________________ 1 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture
AWP $0.00 Retail $0.00 Cost $0.00 Price Schedule Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product7_________________ N
10 0 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture BAXA CORP.
AWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
;
You have some of the hard work.
Now for the fun bits.
Does the same product always have the same 6 or 7 line layout?
If so, do you know which?
If not prepare for more serious coding. Such as do the values of the ABC or PKG give a clue to how many likes are to be read?
You may issue many input statements to read a single observation:
DATA Cost;
infile datalines truncover;
INPUT @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;
if missing(PKG) then
input
@60 PKG 4.0
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
/ @1 NN_7 $12;
else
input
@1 NN_2 $1.
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
;
FORMAT AWP Dollar8.2 ;
datalines;
20000-0000-00 Product6_________________ 1 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture
AWP $0.00 Retail $0.00 Cost $0.00 Price Schedule Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product7_________________ N
10 0 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture BAXA CORP.
AWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
;
Thank you ! Both replies work and both methods will help with importing the dataset.
Much appreciated.
So the first row of each series has 124 if its 6 rows and 47 characters in the 7th row?
@rmacarthur wrote:
Hi SAS Friends,
In the 6 row version, row 1 has 124 characters
In the 7 row version, row 1 has only 47 characters and the remaining data drop down to create a new row.
R
If that's so, consider two alternate import statements. Theoretically something like this, assuming I'm interpreting this correctly.
data have;
*read line and hold;
input @@;
*check length of line and execute conditional input;
if length(_infile_) = 147 then do;
input var1 var2
/ var3 var4
...
/ var99 ; *7thline;
end;
else do;
input .....
end;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.