DATA Step, Macro, Functions and more

copying observations without sorting data for data cleaning

Reply
Occasional Contributor
Posts: 9

copying observations without sorting data for data cleaning

Hello I am trying to clean up data from a system generated report. The data has header or unique identification by observations but not a way to identify each observation with an id or account. I want to copy the observation and place it in a column. The sample spreadsheet shows the structure of the data.  For my data I want to copy information in the row like:

Loan #:04-0026Account Name:Chew Dewey Barbeque

The id would go in the id column and Chew Dewey Barbeque would go in the account  name column. Then the column would be filled with this data until the end of the record. It is easy to make a column that copies into ID and Account name the observation. But I can not figure out a way to copy this data for all the other relevant observations that occur below it, as I can not sort the data yet being there is no unique identification yet. Does anyone know of a way to copy the observations down until the next observation occurs. This observation of a new record will have a header row and a line that will be like the above with new data.  Eg Loan #:  ID Account Name Actual Name.  Here is a shorten example. Please see attachment for full structure. Thanks

Loan #:04-0026Account Name:Chew Dewey Barbeque
Page #: 11
DateActionAmountDisbursedCapitalizedRecoveryBalance
07/09/2004Commitment10000000
07/09/2004Advance: Principal10001000001000
Loan Totals1000000
Loan #:04-0037Account Name:Salmon River Plaza
07/12/2004Commitment275000000
07/12/2004Advance: Principal27500275000027500
08/11/2004Payment: Customized600.800027059.62
09/22/2004Payment: Customized545.800026734.81
PROC Star
Posts: 7,492

Re: copying observations without sorting data for data cleaning

Is the file in spreadsheet or text form?  If it is text, then you can probably write code that looks for the various items you want.  E.g., each loan section appears to start with the string "Loan #:" and all of the desired data elements appear to be the only records that start with a date.

Similarly, each loan's information appears to end with a record that starts with the string "Loan Totals".

Occasional Contributor
Posts: 9

Re: copying observations without sorting data for data cleaning

The data is an excel format or a csv format. I imported it into the system as an csv.

PROC Star
Posts: 1,167

Re: copying observations without sorting data for data cleaning

In that case, it's really easy. Just replace the "cards;" statement and the data with

"infile "C:\directory\file.csv";

and you should be good to go.

PROC Star
Posts: 1,167

Re: copying observations without sorting data for data cleaning

Here's how I like to do these.

The first step converts the data elements you need into a SAS dataset. I got one working based on the data you posted; you'll probably need to fiddle with it based on your full data. The idea is to keep ID and account name from the header rows, and the variables you need from the transaction rows. Check that the data in "have" look okay.

Then in the second step, if the record is a header, just retain the ID and account name; they will automatically populate the transaction rows, which are the only ones output.

Tom

data have; /* Turn the report format into a SAS dataset */
length rowtype TransType $ 32 AccountName_in $ 64;
drop rowtype;
input; /* get a record, and hold it in internal variable _infile_ */
rowtype = _infile_ || repeat(" ", 32); /* Make sure we have at least 32 characters to work with */

if substr(rowtype, 1, 7) = "Loan #:" then
  do;
   ID_in = scan(_infile_, 2, '09'x, "O"); /* 09 is hexadecimal for a tab character */
   AccountName_in = scan(_infile_, 4, '09'x, "O");
   output;
  end;
else if substr(rowtype, 1, 6) = "Page #" |
  substr(rowtype, 1, 4) = "Date" |
  substr(rowtype, 1, 11) = "Loan Totals" then

  ; /* we don't need anything from these lines */
else
  do;
   TransDate = scan(_infile_, 1, '09'x, "O");
   TransType = scan(_infile_, 2, '09'x, "O");
   TransAmount = input(scan(_infile_, 3, '09'x, "O"), best32.);
   TransDisbursed = input(scan(_infile_, 4, '09'x, "O"), best32.);
   TransCapitalized = input(scan(_infile_, 5, '09'x, "O"), best32.);
   TransRecovery = input(scan(_infile_, 6, '09'x, "O"), best32.);
   TransBalance = input(scan(_infile_, 7, '09'x, "O"), best32.);
   output;
  end;

cards;
Loan #: 04-0026 Account Name:  Chew Dewey Barbeque 
Page #: 11     
Date Action Amount Disbursed Capitalized Recovery Balance
07/09/2004 Commitment 1000 0 0 0 0
07/09/2004 Advance: Principal 1000 1000 0 0 1000
Loan Totals 1000 0 0 0 
Loan #: 04-0037 Account Name:  Salmon River Plaza 
07/12/2004 Commitment 27500 0 0 0 0
07/12/2004 Advance: Principal 27500 27500 0 0 27500
08/11/2004 Payment: Customized 600.8 0 0 0 27059.62
09/22/2004 Payment: Customized 545.8 0 0 0 26734.81
run;

data want;
set have;
length AccountName $ 64;
retain ID AccountName;
drop ID_in AccountName_in;

if ID_in ^= "" then
  do; /* keep the id and account in our retained variables */
   ID = ID_in;
   AccountName = AccountName_in;
  end;
else output; /* this is a transaction row, and we've retained the ID and account */
run;

Occasional Contributor
Posts: 9

Re: copying observations without sorting data for data cleaning

Tom thanks for your response. Perhaps I am not asking for what I need correctly. I am having no difficulty in getting the data in SAS and creating a data set. I am able to get the column built and filled in only for the rows that have Loan # on them. What I can not figure out is how to get the correct loan number and account name for all the observations below them. I would put something on the system to show you but it does not like my copying to this email box. I did attach a excel file which I see does not show where I am stuck very well. I am going to place a new file called sample.xls and hopefully this will show you better what I am trying to accomplish.. Sorry but I am at home and using a MAC and things are not working correctly.

Thanks for your help.

Super User
Posts: 19,878

Re: copying observations without sorting data for data cleaning

Depending on how you read the data in you may just need a retain statement.

You're looking for LOCF or last observation carried forward....

See the examples in this paper:

http://www.wuss.org/proceedings10/coders/2944_3_COD-Barbalau.pdf

PROC Star
Posts: 1,167

Re: copying observations without sorting data for data cleaning

Hi Thalo

Try running the code that I posted above. At the end of the first step, you should have a dataset named "have" that has your data. It has two types of records; one with the id and account name, and one with the transaction details. They are still in the same order as on your report; so you have the ID and name for "Chew Dewey Barbeque", and then the two transactions, and then the ID and name for "Salmon River Plaza", and then the four transactions.

What the second step does is: i) figure out is this an ID line or a transaction line; ii) if it's an ID line, keep (retain) the ID and name; iii) if it's a transaction line, output the record (which will have the retained ID and name). This will be dataset "want".

Give it a try, and see how close it is to what you want.

Tom

Super User
Posts: 10,046

Re: copying observations without sorting data for data cleaning

That is easy using conditional input .

data want;
input @;
if left(_infile_) eq: 'Loan #:' then do;input  @'Account Name:' accountname & $40.;output; end;
 else if  left(_infile_) eq: 'Loan Totals' then do;input  @'Loan Totals' _dis _cap _rec _bal; output;end;
  else if left(_infile_) not in: ('Loan #:' 'Date' 'Page' 'Loan Totals')  then do;input  date : mmddyy10. Action $     Amount & $40.     Disbursed     Capitalized     Recovery     Balance ; output;end;

format date mmddyy10.;
cards;
Loan #:     04-0026     Account Name:     Chew Dewey Barbeque
Page #: 11
Date     Action     Amount     Disbursed     Capitalized     Recovery     Balance
07/09/2004     Commitment     1000     0     0     0     0
07/09/2004     Advance: Principal     1000     1000     0     0     1000
Loan Totals     1000     0     0     0
Loan #:     04-0037     Account Name:     Salmon River Plaza
07/12/2004     Commitment      27500     0     0     0     0
07/12/2004     Advance: Principal      27500     27500     0     0     27500
08/11/2004     Payment: Customized      600.8     0     0     0     27059.62
09/22/2004     Payment: Customized      545.8     0     0     0     26734.81
;
run;

Ksharp

Ask a Question
Discussion stats
  • 8 replies
  • 474 views
  • 0 likes
  • 5 in conversation