BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AZIQ1
Quartz | Level 8

Hi,

I am a beginner to intermediate SAS user and need help.

 

Here is a sample excel file that I want to import in SAS:

This is a sample file  
   
This is a sample file  
   
This is a sample file  
This is a sample file  
School A  
ClassGradePass
1AYes
2BYes
3ENo
This is a sample file  
   
This is a sample file  
   
This is a sample file  
This is a sample file  
School B  
ClassGradePass
4AYes
5ENo
6ENo

 

I want SAS to read the data every time it hits the column header "Class" so the final data looks like a clean SAS data set with single column header and all the data and delete the rows in between:

 

ClassGradePass
1AYes
2BYes
3ENo
4AYes
5ENo
6ENo

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@AZIQ1

I'm giving basically the same answer as @PGStats but also posting some code creating data as it would look like after a proc import.

 

/*proc import datafile="c:\temp\sample.xlsx"*/
/*  dbms=xlsx*/
/*  out=work.source replace;*/
/*  sheet='sheet1';*/
/*  getnames=no;*/
/*run;*/

DATA WORK.source;
    INFILE DATALINES4
        DLM='7F'x
        truncover
        DSD ;
    INPUT
        a               : $30.
        b               : $30.
        c               : $30. ;
DATALINES4;
This is a sample file   
   
This is a sample file   
   
This is a sample file   
This is a sample file   
School A   
ClassGradePass 
1AYes 
2BYes 
3ENo 
This is a sample file   
   
This is a sample file   
   
This is a sample file   
This is a sample file   
School B   
ClassGradePass
4AYes
5ENo
6ENo
;;;;
run;

data want;
  set source;
  retain headerFlg 0;
  if find(a,'This is a sample file','ti') then headerFlg=0;
  if headerFlg=1 then
    do;
      ClassID=input(a,best32.);
      Grade=input(b,$1.);
      Pass=input(c,$3.);
      output;
      keep ClassID Grade Pass;
    end;

  if find(a,'class','ti') then headerFlg=1;

run;

 

N.B: For your real data you will need to find text tokens where you're 100% sure that they are sufficient as indicators where your wanted data starts and where it ends.

You probably need also to implement some logic which verifies on which column your data starts (unless you can be certain it's always on column A).

View solution in original post

10 REPLIES 10
Reeza
Super User

Is your file Excel or CSV?

Are you doing this multiple times or once for a single file?

What have you tried so far that doesn't work, so we don't suggest things you've already tried.

AZIQ1
Quartz | Level 8
Thank you for your response.
It is in excel
It is for multiple files that are similar
I haven't tried anything, I know range specification but that wont work here since the row can be any number.
Reeza
Super User

So there's no consistency to the data at all? They're not using the same ranges across the tables?

 

If that's the case, I would consider converting it to CSV and then using a manual data step to parse the code into what you need. Not particularly fun but it gets the job done.

AZIQ1
Quartz | Level 8
Thank you, but there are hundreds of files.
Reeza
Super User

@AZIQ1 wrote:
Thank you, but there are hundreds of files.

Ok, isn't that why your using a programming language so you can automate/generalize your process?

 

There's a VBS script I posted here earlier this week that will convert all XLSX files in a folder into CSV

Then write the program once for a single file, test it. Once it's working, its trivial to run it for all by either modifying the program. 

 

We can't provide code for a problem where no data has been provided except a general idea for a single file. 

 

 

 

Patrick
Opal | Level 21

@AZIQ1

I'm giving basically the same answer as @PGStats but also posting some code creating data as it would look like after a proc import.

 

/*proc import datafile="c:\temp\sample.xlsx"*/
/*  dbms=xlsx*/
/*  out=work.source replace;*/
/*  sheet='sheet1';*/
/*  getnames=no;*/
/*run;*/

DATA WORK.source;
    INFILE DATALINES4
        DLM='7F'x
        truncover
        DSD ;
    INPUT
        a               : $30.
        b               : $30.
        c               : $30. ;
DATALINES4;
This is a sample file   
   
This is a sample file   
   
This is a sample file   
This is a sample file   
School A   
ClassGradePass 
1AYes 
2BYes 
3ENo 
This is a sample file   
   
This is a sample file   
   
This is a sample file   
This is a sample file   
School B   
ClassGradePass
4AYes
5ENo
6ENo
;;;;
run;

data want;
  set source;
  retain headerFlg 0;
  if find(a,'This is a sample file','ti') then headerFlg=0;
  if headerFlg=1 then
    do;
      ClassID=input(a,best32.);
      Grade=input(b,$1.);
      Pass=input(c,$3.);
      output;
      keep ClassID Grade Pass;
    end;

  if find(a,'class','ti') then headerFlg=1;

run;

 

N.B: For your real data you will need to find text tokens where you're 100% sure that they are sufficient as indicators where your wanted data starts and where it ends.

You probably need also to implement some logic which verifies on which column your data starts (unless you can be certain it's always on column A).

SuryaKiran
Meteorite | Level 14

Hi,

 

If this is your situation the best way is convert the file to CSV as others mentioned and then using DATA STEP you can read as you need. 

DATA WORK.SAS_Import;
INFILE '/SAS_Import.csv' dlm=',' dsd ;
    INPUT @;
	IF prxmatch("m/this|sample|School|class| /oi",_INFILE_) or missing(compress(_infile_,',')) then delete;
	else INPUT CLASS $ GRADE $ PASS $;
RUN;

There is another way, for this you need to understand how your data is imported and then convert the dataset (Cleansing data).

 

proc import datafile='/SAS_Import.xlsx'
dbms=xlsx out=Have replace;
Getnames=no;
run;

data want;
set have(rename=(A=CLASS B=GRADE C=PASS));
IF prxmatch("m/this|sample|School|class/oi",CLASS) or missing(CLASS) then delete;
run;

 

 

 

Thanks,
Suryakiran
AZIQ1
Quartz | Level 8
Thank you so much
PGStats
Opal | Level 21

The fact that the data resides in an Excel spreadsheet is not important. You just need to read the three columns, something like

 

infile xl Excel "path\myFile.xlsx" header=no;

 

data test;

set xl.'Sheet1$A1:C1000';

 

columns will be named F1, F2 and F3.

 

then set a flag for when the subtables start

 

if F1="Class" and F2 = "Grade" and F3 = "Pass" then inTable = 1;

else do;

 

reset the flag when you read beyond the subtable

 

if inTable then if notdigit(trim(F1)) > 0 then inTable = 0;

 

Transfer the values if Inside the subtable

 

if inTable then do;

...

output;

end;

...

end;

 

 

 

 

PG
AZIQ1
Quartz | Level 8
Thank you so much this worked too

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 4340 views
  • 5 likes
  • 5 in conversation