I am programming in SAS 9.4 for Windows and reading a PDF document which has variable labels with values. It looks something like:
Name: Sue
Today's Date: 04/25/2018
Age: 54
I have been able to read the PDF document and get a dataset with 3 records that looks like:
FieldName values labels
Name Sue Name
Todays_Date 04/25/2018 Today's Date
Age 54 Age
I would like to create a dataset with 1 record that looks like:
Name Todays_Date Age
Sue 04/25/2018 54
I am looking for ideas on how to do this.
I tried writing code that used macros but the macro is just putting in the line FieldName = values instead of the 3 lines
Name = values, Todays_Date = values and Age = 54:
data Have; length FieldName $ 20 values $ 20 labels $ 20; FieldName = 'Name'; values = 'Sue'; labels = 'Name'; output; FieldName = 'Todays_Date'; values = '04/25/2018'; labels = "Today's Date"; output; FieldName = 'Age'; values = '54'; labels = 'Age'; output; run; %macro Assign(FN, Value, Labels); &FN. = &Value.; %put &FN.; %put &Value.; %mend; data Want; set Have; %Assign(FieldName, values, labels); run; proc print data = Have; run; proc print data = Want; run;
This make a set that looks like your shown output:
proc transpose data=have out=want (drop=_name_); id fieldname; var values; run;
However due to the way Transpose uses ID variables this would not be very extendable.
Also this does not change any variable types, so the data and age still strings.
PDF is a poor data interchange medium. If you have access to Adobe Acrobat Pro you might try selecting the data in that program and then using the export to file feature to create something bit more amenable.
I'm assuming this is only sample data and you're after a generic solution which works for any number of columns and rows.
To fully automate this you would need additional information like data type (num or char), length and at least for dates also a format.
If the data is as it is then what you could do is to first create a text file (data _null_ with put statements) and then convert this text file to a SAS table using Proc Import (=use SAS to guess and assign the missing information like data type).
Writing out to file and reading back in was another good alternative.
This make a set that looks like your shown output:
proc transpose data=have out=want (drop=_name_); id fieldname; var values; run;
However due to the way Transpose uses ID variables this would not be very extendable.
Also this does not change any variable types, so the data and age still strings.
PDF is a poor data interchange medium. If you have access to Adobe Acrobat Pro you might try selecting the data in that program and then using the export to file feature to create something bit more amenable.
Complete agreement on PDF being very bad for data transfer, but it is what we have. We are working on reading the XML files that come with the PDFs in parallel. But, they have their own set of problems.
The Proc Transpose worked like a charm. I will need a second data step to structure all the data (data types, labels, etc.), but this gets the data roughly right.
I did go down a couple of other paths but they were very convoluted.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.