Hi SAS Friends,
Am using SAS ver 9.4 to parse up a large data set where the first variable of each sub data set is a product number, followed by either 6 or 7 additional rows if information related to the product number.
In the 6 row version, row 1 has 124 characters
In the 7 row version, row 1 has only 47 characters and the remaining data drop down to create a new row.
I've attached a SAS file with a data step that will import the 6 row data set successfully, and, separately, the 7 row data set successfully. However the entire dataset includes 17000 products with the 6 or 7 row subdata sets interspersed randomly.
Any suggestions about how such a large dataset can be imported? How to get SAS to recognize the 6 or 7 row subdata sets and apply the needed code? Or some other strategy
Thanks, very much appreciated,
R
You may issue many input statements to read a single observation:
DATA Cost;
infile datalines truncover;
INPUT @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;
if missing(PKG) then
input
@60 PKG 4.0
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
/ @1 NN_7 $12;
else
input
@1 NN_2 $1.
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
;
FORMAT AWP Dollar8.2 ;
datalines;
20000-0000-00 Product6_________________ 1 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture
AWP $0.00 Retail $0.00 Cost $0.00 Price Schedule Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product7_________________ N
10 0 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture BAXA CORP.
AWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
;
You have some of the hard work.
Now for the fun bits.
Does the same product always have the same 6 or 7 line layout?
If so, do you know which?
If not prepare for more serious coding. Such as do the values of the ABC or PKG give a clue to how many likes are to be read?
You may issue many input statements to read a single observation:
DATA Cost;
infile datalines truncover;
INPUT @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;
if missing(PKG) then
input
@60 PKG 4.0
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
/ @1 NN_7 $12;
else
input
@1 NN_2 $1.
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
;
FORMAT AWP Dollar8.2 ;
datalines;
20000-0000-00 Product6_________________ 1 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture
AWP $0.00 Retail $0.00 Cost $0.00 Price Schedule Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product7_________________ N
10 0 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture BAXA CORP.
AWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
;
Thank you ! Both replies work and both methods will help with importing the dataset.
Much appreciated.
So the first row of each series has 124 if its 6 rows and 47 characters in the 7th row?
@rmacarthur wrote:
Hi SAS Friends,
In the 6 row version, row 1 has 124 characters
In the 7 row version, row 1 has only 47 characters and the remaining data drop down to create a new row.
R
If that's so, consider two alternate import statements. Theoretically something like this, assuming I'm interpreting this correctly.
data have;
*read line and hold;
input @@;
*check length of line and execute conditional input;
if length(_infile_) = 147 then do;
input var1 var2
/ var3 var4
...
/ var99 ; *7thline;
end;
else do;
input .....
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.