BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
djbateman
Lapis Lazuli | Level 10

This is more of an Excel question than SAS, but since I'm altering Excel so that I can read in an Excel file, I figured I'd try this wonderful group.

 

I am given an Excel file from a Data Manager that has a list of subjects and a column for each visit with the table populated with expected visit dates.  There are a lot of values that have been copied from a table with formulas to a raw table, but that means that even though these are all date values, many of the fields have formula results such as 'DOVerror or 'FLC=ND.

 

When I read this in, SAS looks at the first several rows to determine how a variable should be characterized.  There are many of the visits where SAS correctly recognizes it as a column of dates and makes it a numeric variable with a date format and automatically makes blank those unwanted character strings.  But when there are a bunch of these blank fields in a given column, it thinks they are character strings and then treats the SAS data like a character string.

 

I know that that apostrophe in front of the text is a way to force Excel to treat the value as a character string.  (So, if I entered "001" and hit enter, Excel would convert it to 1 and treat it as a number.  But if I entered '001, Excel would store it as a character string with 3 characters.)  My initial thought was to just do a find/replace and change all "DOVerror" and "FLC=ND" to missing values, but I noticed that it leaves a blank field, but when you click on a blank cell, it still shows a lonely apostrophe.  I tried to change 'DOVerror to a blank value, but it said it couldn't find any such string.

 

So, my question is this: How can I quickly replace 'DOVerror or 'FLC=ND as missing values and have it treated like a numeric field instead of a blank character field?

 

I have attached a sample of my data so you can see in person what I'm talking about.  Hopefully it all makes sense.

1 ACCEPTED SOLUTION

Accepted Solutions
djbateman
Lapis Lazuli | Level 10

I developed a macro that reads in all the tabs as they are.  But I loop through each expected date variable (all date variables start with "SCR", "DAY", or "WEEK"), and I determine if it was properly read in as a numeric variable or incorrectly as a character variable.  I then loop through the data and convert all character variables into numeric date variables.  I had to add a little extra work because not all tabs had a character date in the same format.  But below is a summary of what I ended up doing.  Hopefully it will be helpful to someone somewhere.

 

%macro convert (study=);
      %let table=VX%sysfunc(tranwrd(&study.,%str(-),%str(_)));

      proc import out=&table. (rename=(subject_number=SUBJECT)) datafile="S:\cdm\Programming\Adhoc\COVID19\COVID19_Cumulative DET_16MAR20.xlsx" dbms=excelcs replace;
          scantext=yes;
          usedate=yes;
          scantime=yes;
            sheet="&study.";
      run;

      %let scr_exist=0;
      %let day_exist=0;
      %let week_exist=0;
      proc sql noprint;
            select '1' into :scr_exist from sashelp.vcolumn where libname='WORK' & memname="&table." & index(upcase(name),'SCR')>0;
            select '1' into :day_exist from sashelp.vcolumn where libname='WORK' & memname="&table." & index(upcase(name),'DAY')>0;
            select '1' into :week_exist from sashelp.vcolumn where libname='WORK' & memname="&table." & index(upcase(name),'WEEK')>0;
      quit;

      proc sql noprint;
            select left(put(count(name),3.)) into :ncharvar from sashelp.vcolumn where libname='WORK' & memname="&table." & type='char' & (index(upcase(name),'SCR')>0 | index(upcase(name),'DAY')>0 | index(upcase(name),'WEEK')>0 | index(upcase(name),'ETT')>0);
            select name into :charvar1-:charvar&ncharvar. from sashelp.vcolumn where libname='WORK' & memname="&table." & type='char' & (index(upcase(name),'SCR')>0 | index(upcase(name),'DAY')>0 | index(upcase(name),'WEEK')>0 | index(upcase(name),'ETT')>0);
            select label into :charlab1-:charlab&ncharvar. from sashelp.vcolumn where libname='WORK' & memname="&table." & type='char' & (index(upcase(name),'SCR')>0 | index(upcase(name),'DAY')>0 | index(upcase(name),'WEEK')>0 | index(upcase(name),'ETT')>0);
      quit;
      data &table. (drop=ett  %do i=1 %to &ncharvar.; &&charvar&i.._temp      %end;);
            set &table. (rename=(   %do i=1 %to &ncharvar.;
                                        &&charvar&i.=&&charvar&i.._temp
                                    %end;));
            %do i=1 %to &ncharvar.;
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'JAN','1');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'FEB','2');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'MAR','3');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'APR','4');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'MAY','5');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'JUN','6');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'JUL','7');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'AUG','8');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'SEP','9');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'OCT','10');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'NOV','11');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'DEC','12');
                  &&charvar&i.=mdy(input(scan(&&charvar&i.._temp,2,'/'),??best.),input(scan(&&charvar&i.._temp,1,'/'),??best.),input(scan(&&charvar&i.._temp,3,'/'),??best.));
                  format &&charvar&i. date9.;
                  label &&charvar&i.="&&charlab&i.";
            %end;
            if ett^=. then delete;
      run;
%mend convert;

View solution in original post

4 REPLIES 4
ballardw
Super User

You may want to consider Saving the Excel file to CSV. Then if you have set the display in Excel for a date value then the CSV will have the date value appear.

 

Second, after you import one of these as a CSV then save the generated Data step code that Proc import creates to read the file. You can copy this from the log into the editor and modify as needed. Consider whether you may see longer character variables than were in the specific file and set the informat to match longest expected. Then the next time you need to read one of these files you only change the infile to reference the new source CSV file and the output data set name if desired.

 

This approach will make all of the variables the same in each data set not have to deal with changing lengths of character variables or the changing data type. Proc Import only examines 20 rows of data to set properties for variables and as you have discovered may not guess as desired with some data files. If reading data with a data step then that is all controlled by you.

 

Big caveat with data coming from Excel: If the source has manually entered data you may have cells with the Alt-Enter used to create vertical spaces in the cells. Which when saved to CSV means line breaks. SOMETIMES the Infile option TERMSTR=CRLF may allow reading that problem.

djbateman
Lapis Lazuli | Level 10
Thanks for your reply. The only problem with using CSV is that my file has about 18 tabs in it, so I'd have to manually save all 18 tabs as CSV in order to keep my macro running smoothly. We might just have to go to the DM and see if she can make an update on her end on how the data is provided to us.
ballardw
Super User

@djbateman wrote:
Thanks for your reply. The only problem with using CSV is that my file has about 18 tabs in it, so I'd have to manually save all 18 tabs as CSV in order to keep my macro running smoothly. We might just have to go to the DM and see if she can make an update on her end on how the data is provided to us.

 

You might be able to use either the LIBNAME Excel or LIBNAME PCFILES to access the data. But depending on actual content the approach can be easy to a PITA. If the data is relatively nice, with single row column headers and data falling below then you have a chance not to get into too much hassle. Your libname would allow you to access each tab as a separate data set as source data and create a new SAS data set from that. There are SAS functions that would let you test whether a column is being treated as text (this week) or numeric and then you could conditionally execute an input statement to create the desired variable type in the output data set.

 

VTYPE(variablename) should return the type. VLENGTH may help with an input statement.

I can't provide any example for Libname Excel because our shop refuses to change to 64-Bit Excel requiring PCFILES. And the PCFILES examples I have are not "nice" data generally as they have multiline headers so I have to read the data with a data set to get what I need. An example using PCFILES:

libname cchdin PCFILES path="<a file name even though the option name is path>" access=readonly;


data cchd.rawweek (label='Weekly Raw data');
   /*actual data in the spread sheet doesn't start
     until the 7th line. Note that due
   to presence of "hidden" columns and possibly alignment
   bits some of columns in the spreadsheet
   skip actual data values when read with PCFILES
   also because one column with all asterisks we get the lovely variable
   ____________________
*/
   set cchdin.'cchd$'n (firstobs=7) ;
   informat
   YRSFN         $12.
   BABYNAME      $50.
   DOB           mmddyy10.
   DADNAME       $50.
   MOMNAME       $50.
   MDOB          mmddyy10.
   Address       $50.
   City          $50.
   State         $2.
   ZIP           $5.
   Facility      $100.
   Certifier     $50.
   TRANS         $100.
   SCRNCODE      $40.
/* custom informat statements removed as not helpful*/ ; format DOB mmddyy10. MDOB mmddyy10. ; yrsfn = input(f1,$12.); babyname = input(f2,$50.); DOB = input(f5,mmddyy10.); dadname = input(f6,$50.); momname = input(f8,$50.); MDOB = input(f11,mmddyy10.); address = input(f12,$50.); city = input(f13,$25.); State = input(f14,$2.); zip = input(f15,$5.); Facility = input(f16,$100.); Certifier= input(f17,$50.); Trans = input(f18,$100.); Scrncode = input(f19,$40.); hand1 = f20; foot1 = f21; hour1 = f22; hand2 = f23; foot2 = f24; hour2 = f25; hand3 = f26; foot3 = f27; hour3 = f28; drop f1-f8 f10-f28 ____________________ ;

One of the headaches here is the high likelihood of tab names that aren't useable as SAS names. So the names you see in the library, CCHD in my example, may mean that you have to use name literals. The "set cchdin.'cchd$'n (firstobs=7) ;" statement.

 

One of the projects I also ran into a bunch of "sets" that were not visible tabs and had to spend time working around those.

 

If all of your tabs have the same names, but different content within each Excel file, from file to file and you needed to read them with a data step then you could use a data step with call execute and an input data set with the names to generate the code. Not pretty but I've done it. If the tabs all have the same structure you MIGHT be able to get away with a list of names on a SET statement and use the INDSNAME option to capture the tab name (assuming it is helpful) as a variable. May. If you are having issues with Proc Import getting different variable types between tabs then you would have the same issue with the SET statement data set list of names and have type mismatch errors.

djbateman
Lapis Lazuli | Level 10

I developed a macro that reads in all the tabs as they are.  But I loop through each expected date variable (all date variables start with "SCR", "DAY", or "WEEK"), and I determine if it was properly read in as a numeric variable or incorrectly as a character variable.  I then loop through the data and convert all character variables into numeric date variables.  I had to add a little extra work because not all tabs had a character date in the same format.  But below is a summary of what I ended up doing.  Hopefully it will be helpful to someone somewhere.

 

%macro convert (study=);
      %let table=VX%sysfunc(tranwrd(&study.,%str(-),%str(_)));

      proc import out=&table. (rename=(subject_number=SUBJECT)) datafile="S:\cdm\Programming\Adhoc\COVID19\COVID19_Cumulative DET_16MAR20.xlsx" dbms=excelcs replace;
          scantext=yes;
          usedate=yes;
          scantime=yes;
            sheet="&study.";
      run;

      %let scr_exist=0;
      %let day_exist=0;
      %let week_exist=0;
      proc sql noprint;
            select '1' into :scr_exist from sashelp.vcolumn where libname='WORK' & memname="&table." & index(upcase(name),'SCR')>0;
            select '1' into :day_exist from sashelp.vcolumn where libname='WORK' & memname="&table." & index(upcase(name),'DAY')>0;
            select '1' into :week_exist from sashelp.vcolumn where libname='WORK' & memname="&table." & index(upcase(name),'WEEK')>0;
      quit;

      proc sql noprint;
            select left(put(count(name),3.)) into :ncharvar from sashelp.vcolumn where libname='WORK' & memname="&table." & type='char' & (index(upcase(name),'SCR')>0 | index(upcase(name),'DAY')>0 | index(upcase(name),'WEEK')>0 | index(upcase(name),'ETT')>0);
            select name into :charvar1-:charvar&ncharvar. from sashelp.vcolumn where libname='WORK' & memname="&table." & type='char' & (index(upcase(name),'SCR')>0 | index(upcase(name),'DAY')>0 | index(upcase(name),'WEEK')>0 | index(upcase(name),'ETT')>0);
            select label into :charlab1-:charlab&ncharvar. from sashelp.vcolumn where libname='WORK' & memname="&table." & type='char' & (index(upcase(name),'SCR')>0 | index(upcase(name),'DAY')>0 | index(upcase(name),'WEEK')>0 | index(upcase(name),'ETT')>0);
      quit;
      data &table. (drop=ett  %do i=1 %to &ncharvar.; &&charvar&i.._temp      %end;);
            set &table. (rename=(   %do i=1 %to &ncharvar.;
                                        &&charvar&i.=&&charvar&i.._temp
                                    %end;));
            %do i=1 %to &ncharvar.;
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'JAN','1');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'FEB','2');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'MAR','3');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'APR','4');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'MAY','5');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'JUN','6');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'JUL','7');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'AUG','8');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'SEP','9');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'OCT','10');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'NOV','11');
                  &&charvar&i.._temp=tranwrd(upcase(&&charvar&i.._temp),'DEC','12');
                  &&charvar&i.=mdy(input(scan(&&charvar&i.._temp,2,'/'),??best.),input(scan(&&charvar&i.._temp,1,'/'),??best.),input(scan(&&charvar&i.._temp,3,'/'),??best.));
                  format &&charvar&i. date9.;
                  label &&charvar&i.="&&charlab&i.";
            %end;
            if ett^=. then delete;
      run;
%mend convert;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 2539 views
  • 0 likes
  • 2 in conversation