Dear,
I posted this question yesterday. Some one suggested me a few ideas. But I am still not able to get output I need. Please suggest if possible. Thank you very much.
I am running the code on the excel attachment. I am getting two OBS with one with DTERM=blank value. I need to delete the variable DTERM=blank valuefrom the data. Please help
proc import datafile="C:\\Book1.xlsx" DBMS=xlsx out=DV; run; data two; set dv; n+1; do i=1 to countw(term,''); temp=scan(term,i,' ');len=length(temp)+1;output; end; run; data three; set two; by n; retain sum; if first.n then sum=0; if last.n then len=len-1; sum+len; if sum gt 200 then do;group+1;sum=len;end; run; data four; length dterm $ 200; do until(last.group); set three; by n group; dterm=catx(' ',dterm,temp); end; if dterm='' then delete; one=length(dterm); run;
@knveraraju91 You should know by now to include your data as a datas step, not an Excel file. You'll get responses faster so it's in your best interest to do so.
You don't want any observations where DTERM is blank? Why not just exclude them when you read it from Excel?
proc import datafile="C:\\Book1.xlsx" DBMS=xlsx
replace out=DV (where=(not missing(dterm)))
;
run;
@Tom it's a calculated value, not in original file.
Subseting and excel table before converting to a SAS dataset
Removing rows where name is blank
inspired by
https://goo.gl/VkLWHa
https://communities.sas.com/t5/Base-SAS-Programming/How-to-exclude-a-blank-OBS-that-appears-when-extracting-a-data/m-p/342275
HAVE
====
Up to 40 obs from xel.class total obs=10
Obs DTERM SEX AGE HEIGHT WEIGHT
1 Alfred M 14 69.0 112.5
2 Alice F 13 56.5 84.0
3 Barbara F 13 65.3 98.0
4 F 14 62.8 102.5 * remove this row;
5 Henry M 14 63.5 102.5
6 James M 12 57.3 83.0
7 Jane F 12 59.8 84.5
8 Janet F 15 62.5 112.5
9 Jeffrey M 13 62.5 84.0
10 John M 12 59.0 99.5
WANT
====
Up to 40 obs WORK.DV total obs=9
Obs DTERM SEX AGE HEIGHT WEIGHT
1 Alfred M 14 69.0 112.5
2 Alice F 13 56.5 84.0
3 Barbara F 13 65.3 98.0
4 Henry M 14 63.5 102.5
5 James M 12 57.3 83.0
6 Jane F 12 59.8 84.5
7 Janet F 15 62.5 112.5
8 Jeffrey M 13 62.5 84.0
9 John M 12 59.0 99.5
FULL SOLUTION
=============
* create some data;
%utlfkil(d:/xls/class.xlsx); * delete if exists;
libname xel "d:/xls/class.xlsx";
data xel.class(rename=name=dterm);
set sashelp.class(obs=10);
if name='Carol' then do;
name='';
end;
run;quit;
libname xel clear;
proc sql dquote=ansi;
connect to excel (Path="d:/xls/class.xlsx");
create
table dv as
select
*
from connection to Excel
(
Select
*
from
class
where
dterm <> ' '
);
disconnect from Excel;
Quit;
If you don't have a named range the use [sheet1$].
1222 proc sql dquote=ansi;
1223 connect to excel (Path="d:/xls/class.xlsx");
1224 create
1225 table dv as
1226 select
1227 *
1228 from connection to Excel
1229 (
1230 Select
1231 *
1232 from
1233 class
1234 where
1235 dterm <> ' '
1236 );
NOTE: Table WORK."DV" created, with 9 rows and 5 columns.
1237 disconnect from Excel;
1238 Quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
Thank you for the support. There are no blank value in excel file. Some of the values in excel file are longer than 200 characters. So, I am running my code to split the values into multiple variables. For this value I am getting two obs after running my code, one obs containing the value from excel file and other with blank .The blank obs which I am not able to delete from sas dataset. Any suggestions.
A lot of people in this forum don't want to download Excel files for security reasons.
Here a data step creating the data you would get when using Proc Import on your sample Excel.
/* import data */
/*proc import */
/* datafile="C:\\Book1.xlsx" */
/* DBMS=xlsx */
/* replace*/
/* out=DV*/
/* ;*/
/* sheet='Sheet1';*/
/* getnames=yes;*/
/*run;*/
/* create data as created by Proc Import step */
data dv;
infile datalines truncover;
input term $1000.;
datalines;
Subject 00000000 baseline labs were collected on Day -1, 00SEP0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening.
;
run;
Some of the values in excel file are longer than 200 characters.
Why is that relevant for you. A SAS character variable can have a length of up to 32767 characters (32KB).
So, I am running my code to split the values into multiple variables
Looking at the code you've posted, it appears that you're rather trying to split up the string into words creating an new observation per word (and not a single observation with multiple variables). If that's what you're after then tweaking the code would be quite simple.
Question is: What are you actually trying to achieve? May be if you explain us your end goal we can give you some better advice how you need to massage your source data.
I still do not understand the question, but perhaps if I try to restate the information you have posted we can figure out what the question is?
You posted an EXCEL file, but when I browsed it there is just one value. So here it is in a more usable form.
data dv ;
term = 'Subject 00000000 baseline labs were collected on Day -1,'
|| ' 00SEP0000, instead of on Day -0 per protocol. The baseline'
|| ' labs should not have been drawn as the baseline visit was'
|| ' < 00 days from screening.'
;
run;
It this point it would have been helpful to post what you expected the answer to be. Say a new data step that creates the expected output dataset.
So looking at the code you posted I see three steps. (1) split TERM into words using space as the delimiter. (Note this would eliminate any extra spaces that might have been between the words.) (2) Add up how many characters the words would take. It seems to goal was to force the combined length to be less than 200 characters. (3) Put the words back together into longer strings.
So now I am getting the feeling that the question is
How can I split a long string into shorter strings of 200 characters or less?
But since the string that was in the Excel file was only 199 characters I am not even sure if I have the right question because there is nothing to do to shorten it to strings less than 200 characters long. So let's remake the problem data to have something longer than 200.
data dv ;
term = 'Subject 00000000 baseline labs were collected on Day -1,'
|| ' 00SEP0000, instead of on Day -0 per protocol. The baseline'
|| ' labs should not have been drawn as the baseline visit was'
|| ' < 00 days from screening.'
|| ' And more so that it is longer than 200 characters.'
;
len = lengthn(term);
put len=;
run;
Now let's split it into two string. The last sentence should be in the second string.
data want ;
set dv ;
length dterm $200 word $200 ;
group=1;
do i=1 by 1 until (word=' ');
word = scan(term,i,' ');
if length(catx(' ',dterm,word)) > 200 then do;
output;
group+1;
dterm=word;
end;
else dterm=catx(' ',dterm,word);
end;
if dterm ne ' ' or group=1 then output;
run;
Most likely the issue in your original program stems from always adding 1 to the length of the word. You should only add the one when you are appending it to an existing word since you then need to account for the space that you are adding. For example the first two words have length 6 and 8. Together they have length 15 since there needs to be a space between them, but that does not mean the first word has length 7 or that the two combined have length 16.
Thank you all for your valuable time. The issue is that I am having blank OBS which I should be able to delete with simple code such as "if variable ='' then delete".
But I am not able to delete the blank OBS. How to delete blank obs with length =2. The length assigned because of my code.
The blank value is the result of my code used for importing data from an excel file and split the values into multiple variables.
The some of the values in the file are longer than 200.
1. Compress(), make sure not printable characters are included.
2. Missing()
A character variable that has a length of 2 has at least one non-blank (hex 20) character in it.
@knveraraju91 wrote:
Thank you all for your valuable time. The issue is that I am having blank OBS which I should be able to delete with simple code such as "if variable ='' then delete".
But I am not able to delete the blank OBS. How to delete blank obs with length =2. The length assigned because of my code.
The blank value is the result of my code used for importing data from an excel file and split the values into multiple variables.
The some of the values in the file are longer than 200.
It doesn't help if you simply repeat your feedback.
What would help:
- Provide what you have (a SAS data step creating sample data; you can use the one I've posted earlier if that reflects your "have")
- Show us what you need, the desired result (ideally also as a SAS data step creating such a result table)
- Post the code you've already developed and explain where you get stuck (this bit you've done already)
A string with length of 2 has a non-blank character in the second position. Could be a CR ('0D'x) or LF ('0A'x) or perhaps one of those strange blanks that Microsoft treats as a non-breaking space ('A0'x).
You can use the $HEX format to see the value.
put dterm $hex4. ;
Then you could replace them with spaces at the beginning of your program.
term = translate(term,' ','0D0AA0'x);
Thank you very much. This works .
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.