Hi,
I have a mainframe dataset that is generally formatted as below. Each unique ‘ID’ has multiple consecutive rows associated with them.
Row 1 ID Element1
Row 2 Element 2 Element 3
Row 3 Element 4 Element 5
Row 4 Element 6 Element 7 Element 8 Element 9 Element 10
Unfortunately, not all Unique IDs have 4 rows. About half of the Unique IDs have only 3 rows associated with them. This prevents me from using multiple input statements as I’ve tried here:
DATA TEMP(drop=TYPE);
INFILE IN1;
INPUT
TYPE $1. @;
IF TYPE = 'A' THEN /* each ID begins with ‘A’ */
DO;
INPUT
@23 ID $16. @66 Element1 $13.;
INPUT
@31 Element2 $8. @45 Element3 $10;
INPUT
@25 Element4 $14. @65 Element5 $12;
INPUT
@1 Element6 $12. @25 Element7 $4 @32 Element8 $4 @50 Element9 $4 @63 Element10 $4;
END;
Ideally, I would like help with a data step before executing the above that would insert a blank 4th row as needed for things to work.
Any immediate help is very much appreciated.
Thank you.
In that case, you are working with the proper tools already, and you are very close to a solution. This should work:
DATA TEMP(drop=TYPE);
INFILE IN1 end=done;
INPUT
@23 ID $16. @66 Element1 $13.;
INPUT
@31 Element2 $8. @45 Element3 $10.;
INPUT
@25 Element4 $14. @65 Element5 $12.;
if not done then do;
input @23 Type $1. @@;
if type ne 'A' then input
@1 Element6 $12. @25 Element7 $4. @32 Element8 $4. @50 Element9 $4. @63 Element10 $4.;
end;
run;
Notice a couple of features. The double trailing @@ will hold the line of data for a subsequent INPUT statement, even when reaching the end of the programming statements. And dots have been inserted in the INPUT statements where they were missing and required.
Good luck.
Can you post some more sample data so we can see the different types of records?
How do you know which row contains an ID versus a record with elements?
Hi... the example was simplified but here are snapshots of actual data from same file. Thank you.
Showing 4 rows per Account Number (Unique ID).
Showing 3 rows per Account Number (Unique ID).
........ input x ?? comma32. : if not missing(x) then delete ; .........
If an ID has 3 rows of data instead of 4, is the missing line always the one that contains ELEMENT6 through ELEMENT10?
Hi... yes the missing line is always the one that contains ELEMENT6 through ELEMENT10. Thanks.
In that case, you are working with the proper tools already, and you are very close to a solution. This should work:
DATA TEMP(drop=TYPE);
INFILE IN1 end=done;
INPUT
@23 ID $16. @66 Element1 $13.;
INPUT
@31 Element2 $8. @45 Element3 $10.;
INPUT
@25 Element4 $14. @65 Element5 $12.;
if not done then do;
input @23 Type $1. @@;
if type ne 'A' then input
@1 Element6 $12. @25 Element7 $4. @32 Element8 $4. @50 Element9 $4. @63 Element10 $4.;
end;
run;
Notice a couple of features. The double trailing @@ will hold the line of data for a subsequent INPUT statement, even when reaching the end of the programming statements. And dots have been inserted in the INPUT statements where they were missing and required.
Good luck.
Astounding,
Thanks for your help. It appears to be working well. A good thing to know.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.