Hi,
I am a beginner to intermediate SAS user and need help.
Here is a sample excel file that I want to import in SAS:
This is a sample file | ||
This is a sample file | ||
This is a sample file | ||
This is a sample file | ||
School A | ||
Class | Grade | Pass |
1 | A | Yes |
2 | B | Yes |
3 | E | No |
This is a sample file | ||
This is a sample file | ||
This is a sample file | ||
This is a sample file | ||
School B | ||
Class | Grade | Pass |
4 | A | Yes |
5 | E | No |
6 | E | No |
I want SAS to read the data every time it hits the column header "Class" so the final data looks like a clean SAS data set with single column header and all the data and delete the rows in between:
Class | Grade | Pass |
1 | A | Yes |
2 | B | Yes |
3 | E | No |
4 | A | Yes |
5 | E | No |
6 | E | No |
Thank you
I'm giving basically the same answer as @PGStats but also posting some code creating data as it would look like after a proc import.
/*proc import datafile="c:\temp\sample.xlsx"*/
/* dbms=xlsx*/
/* out=work.source replace;*/
/* sheet='sheet1';*/
/* getnames=no;*/
/*run;*/
DATA WORK.source;
INFILE DATALINES4
DLM='7F'x
truncover
DSD ;
INPUT
a : $30.
b : $30.
c : $30. ;
DATALINES4;
This is a sample file
This is a sample file
This is a sample file
This is a sample file
School A
ClassGradePass
1AYes
2BYes
3ENo
This is a sample file
This is a sample file
This is a sample file
This is a sample file
School B
ClassGradePass
4AYes
5ENo
6ENo
;;;;
run;
data want;
set source;
retain headerFlg 0;
if find(a,'This is a sample file','ti') then headerFlg=0;
if headerFlg=1 then
do;
ClassID=input(a,best32.);
Grade=input(b,$1.);
Pass=input(c,$3.);
output;
keep ClassID Grade Pass;
end;
if find(a,'class','ti') then headerFlg=1;
run;
N.B: For your real data you will need to find text tokens where you're 100% sure that they are sufficient as indicators where your wanted data starts and where it ends.
You probably need also to implement some logic which verifies on which column your data starts (unless you can be certain it's always on column A).
Is your file Excel or CSV?
Are you doing this multiple times or once for a single file?
What have you tried so far that doesn't work, so we don't suggest things you've already tried.
So there's no consistency to the data at all? They're not using the same ranges across the tables?
If that's the case, I would consider converting it to CSV and then using a manual data step to parse the code into what you need. Not particularly fun but it gets the job done.
@AZIQ1 wrote:
Thank you, but there are hundreds of files.
Ok, isn't that why your using a programming language so you can automate/generalize your process?
There's a VBS script I posted here earlier this week that will convert all XLSX files in a folder into CSV
Then write the program once for a single file, test it. Once it's working, its trivial to run it for all by either modifying the program.
We can't provide code for a problem where no data has been provided except a general idea for a single file.
I'm giving basically the same answer as @PGStats but also posting some code creating data as it would look like after a proc import.
/*proc import datafile="c:\temp\sample.xlsx"*/
/* dbms=xlsx*/
/* out=work.source replace;*/
/* sheet='sheet1';*/
/* getnames=no;*/
/*run;*/
DATA WORK.source;
INFILE DATALINES4
DLM='7F'x
truncover
DSD ;
INPUT
a : $30.
b : $30.
c : $30. ;
DATALINES4;
This is a sample file
This is a sample file
This is a sample file
This is a sample file
School A
ClassGradePass
1AYes
2BYes
3ENo
This is a sample file
This is a sample file
This is a sample file
This is a sample file
School B
ClassGradePass
4AYes
5ENo
6ENo
;;;;
run;
data want;
set source;
retain headerFlg 0;
if find(a,'This is a sample file','ti') then headerFlg=0;
if headerFlg=1 then
do;
ClassID=input(a,best32.);
Grade=input(b,$1.);
Pass=input(c,$3.);
output;
keep ClassID Grade Pass;
end;
if find(a,'class','ti') then headerFlg=1;
run;
N.B: For your real data you will need to find text tokens where you're 100% sure that they are sufficient as indicators where your wanted data starts and where it ends.
You probably need also to implement some logic which verifies on which column your data starts (unless you can be certain it's always on column A).
Hi,
If this is your situation the best way is convert the file to CSV as others mentioned and then using DATA STEP you can read as you need.
DATA WORK.SAS_Import;
INFILE '/SAS_Import.csv' dlm=',' dsd ;
INPUT @;
IF prxmatch("m/this|sample|School|class| /oi",_INFILE_) or missing(compress(_infile_,',')) then delete;
else INPUT CLASS $ GRADE $ PASS $;
RUN;
There is another way, for this you need to understand how your data is imported and then convert the dataset (Cleansing data).
proc import datafile='/SAS_Import.xlsx'
dbms=xlsx out=Have replace;
Getnames=no;
run;
data want;
set have(rename=(A=CLASS B=GRADE C=PASS));
IF prxmatch("m/this|sample|School|class/oi",CLASS) or missing(CLASS) then delete;
run;
The fact that the data resides in an Excel spreadsheet is not important. You just need to read the three columns, something like
infile xl Excel "path\myFile.xlsx" header=no;
data test;
set xl.'Sheet1$A1:C1000';
columns will be named F1, F2 and F3.
then set a flag for when the subtables start
if F1="Class" and F2 = "Grade" and F3 = "Pass" then inTable = 1;
else do;
reset the flag when you read beyond the subtable
if inTable then if notdigit(trim(F1)) > 0 then inTable = 0;
Transfer the values if Inside the subtable
if inTable then do;
...
output;
end;
...
end;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.