Import excel file and read specific column header

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 81
Accepted Solution

Import excel file and read specific column header

Hi,

I am a beginner to intermediate SAS user and need help.

 

Here is a sample excel file that I want to import in SAS:

This is a sample file  
   
This is a sample file  
   
This is a sample file  
This is a sample file  
School A  
ClassGradePass
1AYes
2BYes
3ENo
This is a sample file  
   
This is a sample file  
   
This is a sample file  
This is a sample file  
School B  
ClassGradePass
4AYes
5ENo
6ENo

 

I want SAS to read the data every time it hits the column header "Class" so the final data looks like a clean SAS data set with single column header and all the data and delete the rows in between:

 

ClassGradePass
1AYes
2BYes
3ENo
4AYes
5ENo
6ENo

 

Thank you


Accepted Solutions
Solution
‎04-06-2018 11:47 PM
Respected Advisor
Posts: 4,736

Re: Import excel file and read specific column header

@AZIQ1

I'm giving basically the same answer as @PGStats but also posting some code creating data as it would look like after a proc import.

 

/*proc import datafile="c:\temp\sample.xlsx"*/
/*  dbms=xlsx*/
/*  out=work.source replace;*/
/*  sheet='sheet1';*/
/*  getnames=no;*/
/*run;*/

DATA WORK.source;
    INFILE DATALINES4
        DLM='7F'x
        truncover
        DSD ;
    INPUT
        a               : $30.
        b               : $30.
        c               : $30. ;
DATALINES4;
This is a sample file   
   
This is a sample file   
   
This is a sample file   
This is a sample file   
School A   
ClassGradePass 
1AYes 
2BYes 
3ENo 
This is a sample file   
   
This is a sample file   
   
This is a sample file   
This is a sample file   
School B   
ClassGradePass
4AYes
5ENo
6ENo
;;;;
run;

data want;
  set source;
  retain headerFlg 0;
  if find(a,'This is a sample file','ti') then headerFlg=0;
  if headerFlg=1 then
    do;
      ClassID=input(a,best32.);
      Grade=input(b,$1.);
      Pass=input(c,$3.);
      output;
      keep ClassID Grade Pass;
    end;

  if find(a,'class','ti') then headerFlg=1;

run;

 

N.B: For your real data you will need to find text tokens where you're 100% sure that they are sufficient as indicators where your wanted data starts and where it ends.

You probably need also to implement some logic which verifies on which column your data starts (unless you can be certain it's always on column A).

View solution in original post


All Replies
Super User
Posts: 23,754

Re: Import excel file and read specific column header

Is your file Excel or CSV?

Are you doing this multiple times or once for a single file?

What have you tried so far that doesn't work, so we don't suggest things you've already tried.

Frequent Contributor
Posts: 81

Re: Import excel file and read specific column header

Thank you for your response.
It is in excel
It is for multiple files that are similar
I haven't tried anything, I know range specification but that wont work here since the row can be any number.
Super User
Posts: 23,754

Re: Import excel file and read specific column header

So there's no consistency to the data at all? They're not using the same ranges across the tables?

 

If that's the case, I would consider converting it to CSV and then using a manual data step to parse the code into what you need. Not particularly fun but it gets the job done.

Frequent Contributor
Posts: 81

Re: Import excel file and read specific column header

Thank you, but there are hundreds of files.
Super User
Posts: 23,754

Re: Import excel file and read specific column header


@AZIQ1 wrote:
Thank you, but there are hundreds of files.

Ok, isn't that why your using a programming language so you can automate/generalize your process?

 

There's a VBS script I posted here earlier this week that will convert all XLSX files in a folder into CSV

Then write the program once for a single file, test it. Once it's working, its trivial to run it for all by either modifying the program. 

 

We can't provide code for a problem where no data has been provided except a general idea for a single file. 

 

 

 

Solution
‎04-06-2018 11:47 PM
Respected Advisor
Posts: 4,736

Re: Import excel file and read specific column header

@AZIQ1

I'm giving basically the same answer as @PGStats but also posting some code creating data as it would look like after a proc import.

 

/*proc import datafile="c:\temp\sample.xlsx"*/
/*  dbms=xlsx*/
/*  out=work.source replace;*/
/*  sheet='sheet1';*/
/*  getnames=no;*/
/*run;*/

DATA WORK.source;
    INFILE DATALINES4
        DLM='7F'x
        truncover
        DSD ;
    INPUT
        a               : $30.
        b               : $30.
        c               : $30. ;
DATALINES4;
This is a sample file   
   
This is a sample file   
   
This is a sample file   
This is a sample file   
School A   
ClassGradePass 
1AYes 
2BYes 
3ENo 
This is a sample file   
   
This is a sample file   
   
This is a sample file   
This is a sample file   
School B   
ClassGradePass
4AYes
5ENo
6ENo
;;;;
run;

data want;
  set source;
  retain headerFlg 0;
  if find(a,'This is a sample file','ti') then headerFlg=0;
  if headerFlg=1 then
    do;
      ClassID=input(a,best32.);
      Grade=input(b,$1.);
      Pass=input(c,$3.);
      output;
      keep ClassID Grade Pass;
    end;

  if find(a,'class','ti') then headerFlg=1;

run;

 

N.B: For your real data you will need to find text tokens where you're 100% sure that they are sufficient as indicators where your wanted data starts and where it ends.

You probably need also to implement some logic which verifies on which column your data starts (unless you can be certain it's always on column A).

Valued Guide
Posts: 596

Re: Import excel file and read specific column header

Hi,

 

If this is your situation the best way is convert the file to CSV as others mentioned and then using DATA STEP you can read as you need. 

DATA WORK.SAS_Import;
INFILE '/SAS_Import.csv' dlm=',' dsd ;
    INPUT @;
	IF prxmatch("m/this|sample|School|class| /oi",_INFILE_) or missing(compress(_infile_,',')) then delete;
	else INPUT CLASS $ GRADE $ PASS $;
RUN;

There is another way, for this you need to understand how your data is imported and then convert the dataset (Cleansing data).

 

proc import datafile='/SAS_Import.xlsx'
dbms=xlsx out=Have replace;
Getnames=no;
run;

data want;
set have(rename=(A=CLASS B=GRADE C=PASS));
IF prxmatch("m/this|sample|School|class/oi",CLASS) or missing(CLASS) then delete;
run;

 

 

 

Thanks,
Suryakiran
Frequent Contributor
Posts: 81

Re: Import excel file and read specific column header

Posted in reply to SuryaKiran
Thank you so much
Esteemed Advisor
Posts: 5,535

Re: Import excel file and read specific column header

The fact that the data resides in an Excel spreadsheet is not important. You just need to read the three columns, something like

 

infile xl Excel "path\myFile.xlsx" header=no;

 

data test;

set xl.'Sheet1$A1:C1000';

 

columns will be named F1, F2 and F3.

 

then set a flag for when the subtables start

 

if F1="Class" and F2 = "Grade" and F3 = "Pass" then inTable = 1;

else do;

 

reset the flag when you read beyond the subtable

 

if inTable then if notdigit(trim(F1)) > 0 then inTable = 0;

 

Transfer the values if Inside the subtable

 

if inTable then do;

...

output;

end;

...

end;

 

 

 

 

PG
Frequent Contributor
Posts: 81

Re: Import excel file and read specific column header

Thank you so much this worked too
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 258 views
  • 5 likes
  • 5 in conversation