Hello I am trying to clean up data from a system generated report. The data has header or unique identification by observations but not a way to identify each observation with an id or account. I want to copy the observation and place it in a column. The sample spreadsheet shows the structure of the data. For my data I want to copy information in the row like:
Loan #: | 04-0026 | Account Name: | Chew Dewey Barbeque |
The id would go in the id column and Chew Dewey Barbeque would go in the account name column. Then the column would be filled with this data until the end of the record. It is easy to make a column that copies into ID and Account name the observation. But I can not figure out a way to copy this data for all the other relevant observations that occur below it, as I can not sort the data yet being there is no unique identification yet. Does anyone know of a way to copy the observations down until the next observation occurs. This observation of a new record will have a header row and a line that will be like the above with new data. Eg Loan #: ID Account Name Actual Name. Here is a shorten example. Please see attachment for full structure. Thanks
Loan #: | 04-0026 | Account Name: | Chew Dewey Barbeque | |||
Page #: 11 | ||||||
Date | Action | Amount | Disbursed | Capitalized | Recovery | Balance |
07/09/2004 | Commitment | 1000 | 0 | 0 | 0 | 0 |
07/09/2004 | Advance: Principal | 1000 | 1000 | 0 | 0 | 1000 |
Loan Totals | 1000 | 0 | 0 | 0 | ||
Loan #: | 04-0037 | Account Name: | Salmon River Plaza | |||
07/12/2004 | Commitment | 27500 | 0 | 0 | 0 | 0 |
07/12/2004 | Advance: Principal | 27500 | 27500 | 0 | 0 | 27500 |
08/11/2004 | Payment: Customized | 600.8 | 0 | 0 | 0 | 27059.62 |
09/22/2004 | Payment: Customized | 545.8 | 0 | 0 | 0 | 26734.81 |
Is the file in spreadsheet or text form? If it is text, then you can probably write code that looks for the various items you want. E.g., each loan section appears to start with the string "Loan #:" and all of the desired data elements appear to be the only records that start with a date.
Similarly, each loan's information appears to end with a record that starts with the string "Loan Totals".
The data is an excel format or a csv format. I imported it into the system as an csv.
In that case, it's really easy. Just replace the "cards;" statement and the data with
"infile "C:\directory\file.csv";
and you should be good to go.
Here's how I like to do these.
The first step converts the data elements you need into a SAS dataset. I got one working based on the data you posted; you'll probably need to fiddle with it based on your full data. The idea is to keep ID and account name from the header rows, and the variables you need from the transaction rows. Check that the data in "have" look okay.
Then in the second step, if the record is a header, just retain the ID and account name; they will automatically populate the transaction rows, which are the only ones output.
Tom
data have; /* Turn the report format into a SAS dataset */
length rowtype TransType $ 32 AccountName_in $ 64;
drop rowtype;
input; /* get a record, and hold it in internal variable _infile_ */
rowtype = _infile_ || repeat(" ", 32); /* Make sure we have at least 32 characters to work with */
if substr(rowtype, 1, 7) = "Loan #:" then
do;
ID_in = scan(_infile_, 2, '09'x, "O"); /* 09 is hexadecimal for a tab character */
AccountName_in = scan(_infile_, 4, '09'x, "O");
output;
end;
else if substr(rowtype, 1, 6) = "Page #" |
substr(rowtype, 1, 4) = "Date" |
substr(rowtype, 1, 11) = "Loan Totals" then
; /* we don't need anything from these lines */
else
do;
TransDate = scan(_infile_, 1, '09'x, "O");
TransType = scan(_infile_, 2, '09'x, "O");
TransAmount = input(scan(_infile_, 3, '09'x, "O"), best32.);
TransDisbursed = input(scan(_infile_, 4, '09'x, "O"), best32.);
TransCapitalized = input(scan(_infile_, 5, '09'x, "O"), best32.);
TransRecovery = input(scan(_infile_, 6, '09'x, "O"), best32.);
TransBalance = input(scan(_infile_, 7, '09'x, "O"), best32.);
output;
end;
cards;
Loan #: 04-0026 Account Name: Chew Dewey Barbeque
Page #: 11
Date Action Amount Disbursed Capitalized Recovery Balance
07/09/2004 Commitment 1000 0 0 0 0
07/09/2004 Advance: Principal 1000 1000 0 0 1000
Loan Totals 1000 0 0 0
Loan #: 04-0037 Account Name: Salmon River Plaza
07/12/2004 Commitment 27500 0 0 0 0
07/12/2004 Advance: Principal 27500 27500 0 0 27500
08/11/2004 Payment: Customized 600.8 0 0 0 27059.62
09/22/2004 Payment: Customized 545.8 0 0 0 26734.81
run;
data want;
set have;
length AccountName $ 64;
retain ID AccountName;
drop ID_in AccountName_in;
if ID_in ^= "" then
do; /* keep the id and account in our retained variables */
ID = ID_in;
AccountName = AccountName_in;
end;
else output; /* this is a transaction row, and we've retained the ID and account */
run;
Tom thanks for your response. Perhaps I am not asking for what I need correctly. I am having no difficulty in getting the data in SAS and creating a data set. I am able to get the column built and filled in only for the rows that have Loan # on them. What I can not figure out is how to get the correct loan number and account name for all the observations below them. I would put something on the system to show you but it does not like my copying to this email box. I did attach a excel file which I see does not show where I am stuck very well. I am going to place a new file called sample.xls and hopefully this will show you better what I am trying to accomplish.. Sorry but I am at home and using a MAC and things are not working correctly.
Thanks for your help.
Depending on how you read the data in you may just need a retain statement.
You're looking for LOCF or last observation carried forward....
See the examples in this paper:
http://www.wuss.org/proceedings10/coders/2944_3_COD-Barbalau.pdf
Hi Thalo
Try running the code that I posted above. At the end of the first step, you should have a dataset named "have" that has your data. It has two types of records; one with the id and account name, and one with the transaction details. They are still in the same order as on your report; so you have the ID and name for "Chew Dewey Barbeque", and then the two transactions, and then the ID and name for "Salmon River Plaza", and then the four transactions.
What the second step does is: i) figure out is this an ID line or a transaction line; ii) if it's an ID line, keep (retain) the ID and name; iii) if it's a transaction line, output the record (which will have the retained ID and name). This will be dataset "want".
Give it a try, and see how close it is to what you want.
Tom
That is easy using conditional input .
data want; input @; if left(_infile_) eq: 'Loan #:' then do;input @'Account Name:' accountname & $40.;output; end; else if left(_infile_) eq: 'Loan Totals' then do;input @'Loan Totals' _dis _cap _rec _bal; output;end; else if left(_infile_) not in: ('Loan #:' 'Date' 'Page' 'Loan Totals') then do;input date : mmddyy10. Action $ Amount & $40. Disbursed Capitalized Recovery Balance ; output;end; format date mmddyy10.; cards; Loan #: 04-0026 Account Name: Chew Dewey Barbeque Page #: 11 Date Action Amount Disbursed Capitalized Recovery Balance 07/09/2004 Commitment 1000 0 0 0 0 07/09/2004 Advance: Principal 1000 1000 0 0 1000 Loan Totals 1000 0 0 0 Loan #: 04-0037 Account Name: Salmon River Plaza 07/12/2004 Commitment 27500 0 0 0 0 07/12/2004 Advance: Principal 27500 27500 0 0 27500 08/11/2004 Payment: Customized 600.8 0 0 0 27059.62 09/22/2004 Payment: Customized 545.8 0 0 0 26734.81 ; run;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.