BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
MHines
Obsidian | Level 7

Hello, I am analyzing data for a project. Unfortunately, the data was provided in tables in PDF documents. We imported the PDF tables into Excel and some of the data split into multiple rows. We then read the data into SAS for analysis. Is there a way in SAS to combine data in different rows without a linking ID (the linking ID itself is split). I've attached a sample of fake data below that has the same structural issues I am looking to resolve. Each row should start with "Initiating Action:" - any data in a row that does not have this needs to be combined with the row above it. Any recommendations for how I should proceed with cleaning this are much appreciated.

 

What I have:

Event Type ID Case Type
Initiating Action: Pet X-000-CV-2024- Admin
  0123  
Initiating Action: X-000-CV-2024- Contract &
Complaint 4432 Debts
Initiating Action: Pet X-000-CV-2022- Admin
  0681  
Initiating Action: X-000-CV-2023 Miscellaneous
Complaint 0873 Issue

 

What it needs to look like:

Event Type ID Case Type
Initiating Action: Pet X-000-CV-2024-0123 Admin
Initiating Action: Complaint X-000-CV-2024-4432 Contract & Debts
Initiating Action: Pet X-000-CV-2022-0681 Admin
Initiating Action: Complaint X-000-CV-2023-0873 Miscellaneous issue
1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

This is a RENAME exercise which can be automated.  Uses a version of the UPDATE trick.  You can use SQL to generate the rename statement.  Notice the issue with spacing sometimes you want the space others not.  I update this UPDATE statement example using UPDATEMODE option. 

 

data have; 
   infile cards dsd dlm='09'x firstobs=2;
   length G_ 8;
   input  (Event ID Type) (:$64.);
   if event eq: 'Initiating' then G_ + 1;
   cards;
Event Type	ID	Case Type
Initiating Action: Pet	X-000-CV-2024-	Admin
 	0123	 
Initiating Action:	X-000-CV-2024-	Contract &
Complaint	4432	Debts
Initiating Action: Pet	X-000-CV-2022-	Admin
 	0681	 
Initiating Action:	X-000-CV-2023	Miscellaneous
Complaint	0873	Issue
;;;;
   run;
proc print;
   run; 
data want;
   update have(obs=0) have(rename=(Event=C_Event ID=C_ID Type=C_Type)) updatemode=nomissingcheck;
   by G_;
   array C_ C_:;
   array C Event ID Type;
   do over c_;
      c = catx(' ',of c C_);
      end;
   drop C_:;
   run;
proc print;
   run;

Capture.PNG

 

 

View solution in original post

2 REPLIES 2
ballardw
Super User

First, I feel your pain. I once received a bunch of "data" in the form of Fillable PDF forms and had multiple issues with getting values of of PDF into a usable form.

 

My first try would be to start over. How did you get the tables from PDF to Excel? Acrobat reader export tool?

Different PDF reader export? Copy and Paste?

Did you do anything in Excel like examine the data for hidden rows/columns? Merged cells? before attempting to read the data into SAS? How did you get the data from Excel to SAS?

 

With my project I had to manually fix a number of recurring hidden/merged cells and sometimes copy the data from one cell in Excel to another after getting rid of the merged cells and un-hiding all the hidden row/columns.

 

Clean up the Excel, then save as CSV and read that file and you may have more luck.

 

With LAG function you can get the values from a previous row of data but without an example of data in the from of a working data step I'm not going to try. For example, you are showing things that aren't normally valid variable names. 

However your example shows problems with just determining when and where a row was split. Which may mean you'll have better luck cleaning before reading into SAS if possible.

data_null__
Jade | Level 19

This is a RENAME exercise which can be automated.  Uses a version of the UPDATE trick.  You can use SQL to generate the rename statement.  Notice the issue with spacing sometimes you want the space others not.  I update this UPDATE statement example using UPDATEMODE option. 

 

data have; 
   infile cards dsd dlm='09'x firstobs=2;
   length G_ 8;
   input  (Event ID Type) (:$64.);
   if event eq: 'Initiating' then G_ + 1;
   cards;
Event Type	ID	Case Type
Initiating Action: Pet	X-000-CV-2024-	Admin
 	0123	 
Initiating Action:	X-000-CV-2024-	Contract &
Complaint	4432	Debts
Initiating Action: Pet	X-000-CV-2022-	Admin
 	0681	 
Initiating Action:	X-000-CV-2023	Miscellaneous
Complaint	0873	Issue
;;;;
   run;
proc print;
   run; 
data want;
   update have(obs=0) have(rename=(Event=C_Event ID=C_ID Type=C_Type)) updatemode=nomissingcheck;
   by G_;
   array C_ C_:;
   array C Event ID Type;
   do over c_;
      c = catx(' ',of c C_);
      end;
   drop C_:;
   run;
proc print;
   run;

Capture.PNG

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 345 views
  • 1 like
  • 3 in conversation