Dear Members,
My final goal is to merge two tables, one is from Excel, and the other one is from Word; however, I am having difficulty in importing a table from Word into SAS. In Word, you can have hardenters in the cells. For example, in the Address column of the tables below (Table 1), there is a hardenter after San Francisco. I tried to copied this table into Excel, it turned out there were multiple rows for ID 1. The whole table was messed up so that I cannot merge into the Excel table.
Are there any ways to read Word table into SAS directly without messing up each row? Or if there are any ways to convert Word table into Excel so that I can read them into SAS, and then merge them?
Table 1: Word Table
ID | Name | Address | dependents | Relationship |
1 | Mary | San Francisco CA - 94115 | Bob Jenny | Son Daughter |
2 | Robert | 234 Stress Oakland CA -94623 | JJ Christina | Daughter Daughter |
5 | Lucia | Los Angeles |
Table 2: Excel Table
ID | Hours | Pay rate |
2 | 40 | 40 |
3 | 40 | 35 |
4 | 35 | 32 |
5 | 40 | 50 |
Thank you all, Any supplementary information is highly appreciated.
Belle
You only missed one step. On the replace screen, under find, click on 'more' at the bottom left of the screen. Then click on 'special'. The click on 'Paragraph Mark' which, on my version of Word, is the first option.
Then, do like you did, indicating ~ as the replacement character, click on replace all, then when it finishes indicate that, NO, you don't want it to continue with the rest of the document.
Hi Belle,
Try converting your word document table to an excel sheet. Please go through the following link. It has enough information to convert word table to excel.
http://www.ehow.com/how_2096574_convert-microsoft-word-table-excel.html
I have tried converting a word table to excel, it worked for me.
Hope this helps...!
Thanks
Dhanasekaran R
Hi Dhanasekaran,
Thanks for the information; however it does not work for my case because it becomes mutiple rows as I mentioned. Table will be messed up after reading into SAS.
Here is what I got after I converted it into Excel.
Thanks again for the help.
Belle,
If the word file can be highlighted (i.e., select all) and copied to your clipboard (i.e., Ctrl-c), I may have a solution for you. It is the topic of a presentation that I'll be giving at next year's SGF, but we have never tested it on the particular kind of file you describe.
The powerpoint, paper and code are available at http://torsas.ca/page18.php which is the website for my local user group, namely the Toronto Area SAS Society.
If you don't mind, I'd really like to test the code on your particular file. If the word document can be shared, please send me a copy at art297 at rogers dot com.
Art
Hi Art,
I cannot share the original tables with you because of company's policy. I wish I could share it.
Actually the table I showed here were fake tables. They contain the information/problem that I am facing. I'll send you these two tables.
I really appreciate your help, hopefully this problem can be solved.
Belle
Belle,
First the good news: I was able to import your data. You may not agree with how I eventually structured the file, but that part should be relatively easy for you to change. Since SAS doesn't have merged cells, I split merged cells into multiple records.
Obs ID Name Address dependents Relationship
1 1 Mary San Francisco CA - 94115 Bob Son
2 1 Mary San Francisco CA - 94115 Jenny Daughter
3 2 Robert 234 Stress Oakland CA -94623 JJ Daughter
4 2 Robert 234 Stress Oakland CA -94623 Christina Daughter
5 5 Lucia Los Angeles
It definitely wasn't as straight forward as I would have liked, but only took seconds to accomplish once I figured out what the problem was. MS Word doesn't really have soft returns in its merged cells .. they copy and paste as non-merged cells with hard carriage returns. Thus the first task that had to be accomplished was to highlight the table in Word and replace all Paragraphs with tildes (i.e., ~). If you don't know how to do that let me know and I'll describe the steps That entire process literally took about 10 seconds.
Then I highlighted and copied the now modified table and ran the code I gave you the link for this afternoon. The following are the settings I used at the top of the code:
options NOQUOTELENMAX;
options datestyle=mdy;
filename clippy clipbrd;
filename revised temp;
%let transpose=NO;
%let columns=;
%let rows=;
%let look_for=;
%let skip_lines=;
%let hrows=1;
%let spaces=" ";
%let first_data_row=2;
%let var_renames=;
%let var_labels=;
%let var_share=;
%let var_prefix=;
%let var_suffix=;
%let var_drop=;
%let var_upcase=;
%let var_missing=;
%let var_formats=;
%let var_informats=;
%let var_units=;
%let guessingrows=;
%let outfile=have;
I.e., I didn't have to change anything or set anything.
Then I ran the following datastep:
data want (drop=_:);
length dependents $15;
length Relationship $15;
set have (rename=(dependents=_dep
Relationship=_rel));
address=translate(address," ","~");
do _i=1 to max(countw(_dep,"~"),countw(_Rel,"~"));;
dependents=scan(_dep,_i,"~");
Relationship=scan(_rel,_i,"~");
output;
end;
run;
I ran one extra datastep as I figured that you wanted the variables in the same order as you had them in the original file:
data want;
retain ID Name Address dependents Relationship;
set want;
run;
Art
Hi Art,
Thanks a lot for trying my tables.
I tried to follow your steps, but I did encounter a problem on replacing all Paragraphs with tildes (i.e., ~). Here is what i did: Highlight whole table -> click replace -> type "~" in REPLACE option; My question is what should I put in FIND option? Did I select wrong option?
Thanks again.
You only missed one step. On the replace screen, under find, click on 'more' at the bottom left of the screen. Then click on 'special'. The click on 'Paragraph Mark' which, on my version of Word, is the first option.
Then, do like you did, indicating ~ as the replacement character, click on replace all, then when it finishes indicate that, NO, you don't want it to continue with the rest of the document.
Thanks Art, it works well.
Belle,
You didn't mark the question as having been answered, so others will probably still spend time trying to come up with alternative solutions. If that is what you want, leave it as being unanswered, otherwise you really should click on the icon that indicates that the question has been answered.
Regardless, I'm still curious about what worked well. I presume that you meant the find/replace in Word. I am mostly interested in whether you were able to import the file and, if so, whether you used the code I proposed or proc import.
Hi Art,
Sorry for not marking the questions since I am very new to this discussion board. I have marked it.
Yes, I meant the find/replace in word worked well. Now, I have followed your instruction without changing anything; however, I got error message which said '
File WORK.HAVE.DATA does not exist'. I think I did not run the Marco correctly. How did you set up the values for the Macros (%flipfile; %expandr (type,string); %filarray (type,string);)? Below are the step that I tried after i replaced paragraph:
Step 1: Copy the Macro in to SAS
(begining of the macro)
options NOQUOTELENMAX;
options datestyle=mdy;
filename clippy clipbrd;
.
.
.
%macro filarray (type,string);
if scan("&string.",i," ") ne "" then
&type(scan(scan("&string.",i," "),1,"~"))=
scan(scan("&string.",i," "),2,"~");
%mend filarray;
(end of the marco)
Step 2: Highlight the word table -> Ctrl + C
Step 3: Run the Marco in Step 1
Step 4: Run your codes below
(begining of the codes)
data want (drop=_:);
length dependents $15;
length Relationship $15;
set have (rename=(dependents=_dep
Relationship=_rel));
address=translate(address," ","~");
do _i=1 to max(countw(_dep,"~"),countw(_Rel,"~"));;
dependents=scan(_dep,_i,"~");
Relationship=scan(_rel,_i,"~");
output;
end;
run;
(end of the codes)
Sorry for asking this Marco question, I am still learning the Marco.
Thanks
The code was designed so that one doesn't have to know or understand macros. The only critical parts of the code are defining the macro variables up front.
Specifically, the code was designed so that one simply copies the entire set of code (as posted on the TASS (i.e., www.torsas.ca) website, ensue that all of the macro variables are set correctly, copy the file you want to import, then press run.
The most critical aspect that might have to be changed, as it is system dependent, is the macro variable
%let spaces=" "; *number of consecutive spaces that should be translated
to represent a horizontal tab;
In on computer that is 4 spaces that have to be included between the pair of double quotes, but it may be 6 on some systems.
I would need to see the code you actually submitted, and the log, in order to answer your question.
However, at this point you don't really need my code to do the job: proc import should work quite nicely for you as well.
Hi Art,
I am confused. I ran the three macros (%flipfile; %expandr (type,string); %filarray (type,string);). The questions is if I need to set up the values for each macro. for example,
(1) When I run %flipfile; do I need to assign vaules?
(2) For %expandr (type,string); what do type and string equal to?
In my first ran, I did not assign anything for them. Here are the messages from log:
...... (run first two macros)
......
1227 %macro filarray (type,string);
1228 if scan("&string.",i," ") ne "" then
1229 &type(scan(scan("&string.",i," "),1,"~"))=
1230 scan(scan("&string.",i," "),2,"~");
1231 %mend filarray;
1232
1233
1234
1235 data want (drop=_:);
1236
1237 length dependents $15;
1238
1239 length Relationship $15;
1240
1241 set have (rename=(dependents=_dep
1242
1243 Relationship=_rel));
ERROR: File WORK.HAVE.DATA does not exist.
1244
1245 address=translate(address," ","~");
1246
1247 do _i=1 to max(countw(_dep,"~"),countw(_Rel,"~"));;
1248
1249 dependents=scan(_dep,_i,"~");
1250
1251 Relationship=scan(_rel,_i,"~");
1252
1253 output;
1254
1255 end;
1256
1257 run;
NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
1245:21 1247:25 1247:42 1249:21 1251:23
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
1245:11
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0
observations and 3 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
Thanks for the help.
Belle
Hi Art,
Another question, when you mentioned that I did not need to use your codes. Did you mean marcos, or you mean your codes start from data want (drop=_:);? Also how should I use proc import? Do I still import table from Word to SAS by using proc import? Sorry for all the confusion.
Thanks again
Belle
The proc import method I used included:
1. Replacing the paragraph marks with ~s in word
2. Copying the table and pasting it into Excel, then saving it as a.xls in Excel97 format
3. Running the following code:
proc import datafile='c:\art\a.xls'
out=have
dbms=xls
replace;
getnames=YES;
datarow=2;
run;
data want (drop=_:);
length dependents $15;
length Relationship $15;
set have (rename=(dependents=_dep
Relationship=_rel));
address=translate(address," ","~");
do _i=1 to max(countw(_dep,"~"),countw(_Rel,"~"));;
dependents=scan(_dep,_i,"~");
Relationship=scan(_rel,_i,"~");
output;
end;
run;
data want;
retain ID Name Address dependents Relationship;
set want;
run;
But please send me, off line (art297 at rogers dot com) a copy of the log you got when you ran the code that I initially proposed.
Art
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.