I have an Excel sheet that is essentially a report from an outside system. The header and the data starting points are inconsistent from month to month. I was hoping to see if there was a way to 'automate' this PROC IMPORT without having to open the file and see where the data begins each month.
The data looks like this:
FILTER | CHOICE(S) | |||||
Site | XXXX | (Site:1) | ||||
XXXX | (Site:1) | |||||
XXXX | (Site:1) | |||||
Visit Date | From 01/01/2016 To 06/30/2016 | |||||
IT MD_NO | 664 | |||||
2330 | ||||||
2631 | ||||||
3611 | ||||||
3778 | ||||||
4871 | ||||||
6661 | ||||||
10766 | ||||||
12238 | ||||||
12440 | ||||||
13404 | ||||||
17124 | ||||||
19968 | ||||||
20966 | ||||||
21245 | ||||||
31640 | ||||||
56340 | ||||||
70465 | ||||||
72580 | ||||||
73643 | ||||||
74561 | ||||||
75833 | ||||||
76428 | ||||||
76894 | ||||||
86711 | ||||||
90069 | ||||||
90360 | ||||||
95850 | ||||||
98760 | ||||||
FREQUENCY REPORT | ||||||
Site | IT MD_NO | Visit Date | Question | Very Good % | Very Good n | Total n |
'Hospital' | '00001' | '2/1/16 2/29/16' | Overall rating of care | 100 | 1 | 1 |
'Total' | '00001' | '2/1/16 2/29/16' | Overall rating of care | 100 | 1 | 1 |
'Hospital' | '00001' | '4/1/16 4/30/16' | Overall rating of care | 100 | 1 | 1 |
I don't need any of the data prior to the row that contains SITE|IT MD_NO|Visit Date...
Because the word 'SITE' is also found at the top of the report where I don't need the data, I didn't think it would be usable.
Any assistance would be greatly appreciated.
This is a very good example why excel should not be used as data source. Due to the things above the relevant data, sas will most likely fail to guess the correct data type and you will end up with more than one proc/datastep to fix issues.
You can find the data:
Hardly tested code:
proc import
datafile="yourfile.xlsx"
dbms=xlsx
out=work.crap
replace
;
getnames=no;
run;
data _null_;
set work.crap;
length rangeStart rangeEnd 8 range $ 20;
retain range:;
if A = "Site" and B = "IT MD_NO" /* ...*/ then do;
rangeStart = _n_;
end;
if not missing(rangeStart) and cmiss(A, B /* ... */) = 2 then do;
rangeEnd = _n_ - 1;
range = cats("Sheet1$A", rangeStart, ":G", rangeEnd);
call symputx("range", range);
put range=;
stop;
end;
run;
proc import
datafile="yourfile.xlsx"
dbms=xlsx
out=work.cool
replace
;
getnames = yes;
range= "&range";
run;
Edit: One thing to add: you need option validvarname set to any.
Not really, you have hit onoe of the main issues with using Excel. Excel is unstructured and hence a very bad choice for data transfer. I would go back to the source and ask them to provide a useable data in a proper data transfer file - CSV, XML etc.
Otherwise, you have three options:
1) Manually copy out the data you want, and save it as a decent file format.
2) Use Excel VBA to derive your data and save it to a decent file format
3) Import as is and then post process what is imported in a datastep
This is a very good example why excel should not be used as data source. Due to the things above the relevant data, sas will most likely fail to guess the correct data type and you will end up with more than one proc/datastep to fix issues.
You can find the data:
Hardly tested code:
proc import
datafile="yourfile.xlsx"
dbms=xlsx
out=work.crap
replace
;
getnames=no;
run;
data _null_;
set work.crap;
length rangeStart rangeEnd 8 range $ 20;
retain range:;
if A = "Site" and B = "IT MD_NO" /* ...*/ then do;
rangeStart = _n_;
end;
if not missing(rangeStart) and cmiss(A, B /* ... */) = 2 then do;
rangeEnd = _n_ - 1;
range = cats("Sheet1$A", rangeStart, ":G", rangeEnd);
call symputx("range", range);
put range=;
stop;
end;
run;
proc import
datafile="yourfile.xlsx"
dbms=xlsx
out=work.cool
replace
;
getnames = yes;
range= "&range";
run;
Edit: One thing to add: you need option validvarname set to any.
Thank you both for the responses. I completely agree that Excel is a crappy data soure and is my LEAST favorite vehicle for receiving data from clients. However, sometimes the client just can't get the data any other way and we have to do some sort of manipulation. Most of the tips and tricks I've learned using SAS are from loading and manipulating data like this.
Thank you again for your quick and easily consumable responses!
Brian
Hi Team
It is fruitless to complain about excel, SAS is a knat on the dehind of excel. The number of excel users dwarfs SAS.
Here is a solution that scans the entire excel sheet (all character columns) for a string.
This was posted a long time ago in SAS-L. Does not exactly answer the question, but may be usefull. I suspect you can replace the libname with 'proc import'. There are aslo mant R and Python solutuions.
It does require your excel configuarion to use row column addressing, which I find is easier to program.
Finding the excel cell reference that contains a string (ie R10C1)
works best in the old dm text editor?
This editor has the nice property of eliminating (UE, EE, EG and SAS Studio options).
http://goo.gl/kJVveA
Will post R solution later (RDCOM?)
HAVE Excel sheet
Up to 40 obs from xls.class$ total obs=19
Obs NAME SEX AGE HEIGHT WEIGHT
1 Alfred M 14 69.0 112.5
2 Alice F 13 56.5 84.0
3 Barbara F 13 65.3 98.0
4 Carol F 14 62.8 102.5
5 Henry M 14 63.5 102.5
6 James M 12 57.3 83.0
7 Jane F 12 59.8 84.5
8 Janet F 15 62.5 112.5
9 Jeffrey M 13 62.5 84.0
10 John M 12 59.0 99.5
WANT the cell reference for John from excel
without loading the entire workbook into SAS
Variable Value Excel Cell
Name John R10C1
SOLUTION
*create an excel sheet;
libname xls "d:/xls/class.xls";
data xls.class;
set sashelp.class;
;run;quit;
libname xls clear;
* Need position (column number) in PDV for
all char variables;mm
%put &=pdvchr;
/*
pdvchr= 1,2
*/
* it is a shame that we cannot use some form
of column number ro reference variables.
Note: R can do this and it is a major strength;
data celref;
set xls.class;
array chrvars[*] _character_;
array chrpos[2] _temporary_ (&pdvchr);
do varnums=1 to dim(chrvars);
if chrvars[varnums]='John' then do;
xlscel=cats('R',put(_n_,8.),'C',put(chrpos[varnums],8.));
nam=vname(chrvars[varnums]);
put nam chrvars[varnums] xlscel;
stop;
end;
end;
run;quit;
Name John R10C1
I'm not going to attempt code but if the file data of interest is in the same format then I would be tempted to save to CSV. Then I can write code that will find the line where IT MD_NO is the second "word" or even the "visit date".
An input @ and Scan(_Infile_) likely to be helpful within a Do Until loop.
I will also say that I wouldn't be terrible fond of "visit date" (name singular) containing what appears to be a range but that's easy to pull apart.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.