BookmarkSubscribeRSS Feed
knveraraju91
Barite | Level 11

Dear,

 

I posted this question yesterday. Some one suggested me a few ideas. But I am still not able to get output I need. Please suggest if possible. Thank you very much.

 

I am running the code on the excel attachment. I am getting two OBS with one with DTERM=blank value.  I need to delete the variable DTERM=blank  valuefrom the data. Please help

proc import datafile="C:\\Book1.xlsx" DBMS=xlsx out=DV;
                
run;
data two;
set dv;
	n+1;
 do i=1 to countw(term,'');
  temp=scan(term,i,' ');len=length(temp)+1;output;
 end;
	run;

	data three;
 set two;
 by n;
 retain sum;
 if first.n then sum=0;
 if last.n then len=len-1;
 sum+len;
 if sum gt 200 then do;group+1;sum=len;end;
run;

data four;
length dterm $ 200;
do until(last.group);
 set three;
 by n group;
 dterm=catx(' ',dterm,temp);
end;
if dterm='' then delete;
one=length(dterm);
run; 

 

13 REPLIES 13
Reeza
Super User

@knveraraju91 You should know by now to include your data as a datas step, not an Excel file. You'll get responses faster so it's in your best interest to do so.

Tom
Super User Tom
Super User

You don't want any observations where DTERM is blank? Why not just exclude them when you read it from Excel?

proc import datafile="C:\\Book1.xlsx" DBMS=xlsx
  replace  out=DV (where=(not missing(dterm)))
;
run;
Reeza
Super User

@Tom it's a calculated value, not in original file. 

rogerjdeangelis
Barite | Level 11
Subseting and excel table before converting to a SAS dataset

Removing rows where name is blank

inspired by
https://goo.gl/VkLWHa
https://communities.sas.com/t5/Base-SAS-Programming/How-to-exclude-a-blank-OBS-that-appears-when-extracting-a-data/m-p/342275

HAVE
====

Up to 40 obs from xel.class total obs=10

     Obs    DTERM      SEX    AGE    HEIGHT    WEIGHT

       1    Alfred      M      14     69.0      112.5
       2    Alice       F      13     56.5       84.0
       3    Barbara     F      13     65.3       98.0
       4                F      14     62.8      102.5  * remove this row;
       5    Henry       M      14     63.5      102.5
       6    James       M      12     57.3       83.0
       7    Jane        F      12     59.8       84.5
       8    Janet       F      15     62.5      112.5
       9    Jeffrey     M      13     62.5       84.0
      10    John        M      12     59.0       99.5

WANT
====

Up to 40 obs WORK.DV total obs=9

Obs    DTERM      SEX    AGE    HEIGHT    WEIGHT

 1     Alfred      M      14     69.0      112.5
 2     Alice       F      13     56.5       84.0
 3     Barbara     F      13     65.3       98.0
 4     Henry       M      14     63.5      102.5
 5     James       M      12     57.3       83.0
 6     Jane        F      12     59.8       84.5
 7     Janet       F      15     62.5      112.5
 8     Jeffrey     M      13     62.5       84.0
 9     John        M      12     59.0       99.5

FULL SOLUTION
=============

* create some data;
%utlfkil(d:/xls/class.xlsx);  * delete if exists;
libname xel "d:/xls/class.xlsx";

data xel.class(rename=name=dterm);
  set sashelp.class(obs=10);
  if name='Carol' then do;
     name='';
  end;
run;quit;

libname xel clear;


proc sql dquote=ansi;
 connect to excel (Path="d:/xls/class.xlsx");
    create
        table dv as
    select
         *
        from connection to Excel
        (
         Select
            *
         from
           class
         where
           dterm <> ' '
        );
    disconnect from Excel;
Quit;

If you don't have a named range the use [sheet1$].

1222  proc sql dquote=ansi;
1223   connect to excel (Path="d:/xls/class.xlsx");

1224      create
1225          table dv as
1226      select
1227           *
1228          from connection to Excel
1229          (
1230           Select
1231              *
1232           from
1233             class
1234           where
1235             dterm <> ' '
1236          );
NOTE: Table WORK."DV" created, with 9 rows and 5 columns.

1237      disconnect from Excel;
1238  Quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.03 seconds


knveraraju91
Barite | Level 11

Thank you for the support. There are no blank value in excel file. Some of the values in excel file are longer than 200 characters. So, I am running my code to split the values into multiple  variables. For this value I am getting two obs after running my code, one obs containing the value from excel file and other with blank .The blank obs which I am not able to delete from sas dataset. Any suggestions.

Patrick
Opal | Level 21

A lot of people in this forum don't want to download Excel files for security reasons. 

 

Here a data step creating the data you would get when using Proc Import on your sample Excel.

/* import data */
/*proc import */
/* datafile="C:\\Book1.xlsx" */
/* DBMS=xlsx */
/* replace*/
/* out=DV*/
/* ;*/
/* sheet='Sheet1';*/
/* getnames=yes;*/
/*run;*/


/* create data as created by Proc Import step */
data dv;
infile datalines truncover;
input term $1000.;
datalines;
Subject 00000000 baseline labs were collected on Day -1, 00SEP0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening.
;
run;

Some of the values in excel file are longer than 200 characters.

Why is that relevant for you. A SAS character variable can have a length of up to 32767 characters (32KB).

 

So, I am running my code to split the values into multiple  variables

Looking at the code you've posted, it appears that you're rather trying to split up the string into words creating an new observation per word (and not a single observation with multiple variables). If that's what you're after then tweaking the code would be quite simple.

 

Question is: What are you actually trying to achieve? May be if you explain us your end goal we can give you some better advice how you need to massage your source data.

Tom
Super User Tom
Super User

I still do not understand the question, but perhaps if I try to restate the information you have posted we can figure out what the question is?  

You posted an EXCEL file, but when I browsed it there is just one value. So here it is in a more usable form.

data dv ;
  term = 'Subject 00000000 baseline labs were collected on Day -1,'
      || ' 00SEP0000, instead of on Day -0 per protocol. The baseline'
      || ' labs should not have been drawn as the baseline visit was'
      || ' < 00 days from screening.'
  ;
run;

It this point it would have been helpful to post what you expected the answer to be.  Say a new data step that creates the expected output dataset.

 

So looking at the code you posted I see three steps. (1) split TERM into words using space as the delimiter. (Note this would eliminate any extra spaces that might have been between the words.)  (2) Add up how many characters the words would take.  It seems to goal was to force the combined length to be less than 200 characters. (3) Put the words back together into longer strings.

 

So now I am getting the feeling that the question is 

How can I split a long string into shorter strings of 200 characters or less?

But since the string that was in the Excel file was only 199 characters I am not even sure if I have the right question because there is nothing to do to shorten it to strings less than 200 characters long.  So let's remake the problem data to have something longer than 200.

data dv ;
  term = 'Subject 00000000 baseline labs were collected on Day -1,'
      || ' 00SEP0000, instead of on Day -0 per protocol. The baseline'
      || ' labs should not have been drawn as the baseline visit was'
      || ' < 00 days from screening.'
      || ' And more so that it is longer than 200 characters.'
  ;
  len = lengthn(term);
  put len=;
run;

Now let's split it into two string.  The last sentence should be in the second string.

data want ;
  set dv ;
  length dterm $200 word $200 ;
  group=1;
  do i=1 by 1 until (word=' ');
    word = scan(term,i,' ');
    if length(catx(' ',dterm,word)) > 200 then do;
      output;
      group+1;
      dterm=word;
    end;
    else dterm=catx(' ',dterm,word);
  end;
  if dterm ne ' ' or group=1 then output;
run;

Most likely the issue in your original program stems from always adding 1 to the length of the word.  You should only add the one when you are appending it to an existing word since you then need to account for the space that you are adding.  For example the first two words have length 6 and 8.  Together they have length 15 since there needs to be a space between them, but that does not mean the first word has length 7 or that the two combined have length 16.

 

 

 

 

 

knveraraju91
Barite | Level 11

Thank you all for your valuable time. The issue is that I am having blank OBS which I should be able to delete with simple code such   as   "if variable ='' then delete". 

 

But I am not able to delete the blank OBS. How to delete blank obs with length =2. The length assigned because of my code.

 

The blank value is the result of my code used for importing  data from an excel file and split the values into multiple variables.

The some of the values in the file are longer than 200.

 

 

Reeza
Super User

1. Compress(), make sure not printable characters are included. 

2. Missing()

 

Kurt_Bremser
Super User

A character variable that has a length of 2 has at least one non-blank (hex 20) character in it.


@knveraraju91 wrote:

Thank you all for your valuable time. The issue is that I am having blank OBS which I should be able to delete with simple code such   as   "if variable ='' then delete". 

 

But I am not able to delete the blank OBS. How to delete blank obs with length =2. The length assigned because of my code.

 

The blank value is the result of my code used for importing  data from an excel file and split the values into multiple variables.

The some of the values in the file are longer than 200.

 

 


 

Patrick
Opal | Level 21

@knveraraju91

It doesn't help if you simply repeat your feedback.

What would help:

- Provide what you have (a SAS data step creating sample data; you can use the one I've posted earlier if that reflects your "have") 

- Show us what you need, the desired result (ideally also as a SAS data step creating such a result table)

- Post the code you've already developed and explain where you get stuck (this bit you've done already)

 

 

Tom
Super User Tom
Super User

A string with length of 2 has a non-blank character in the second position. Could be a CR ('0D'x) or LF ('0A'x) or perhaps one of those strange blanks that Microsoft treats as a non-breaking space ('A0'x).

You can use the $HEX format to see the value.  

put dterm $hex4. ;

Then you could replace them with spaces at the beginning of your program.

term = translate(term,'   ','0D0AA0'x);
knveraraju91
Barite | Level 11

Thank you very much. This works .

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2544 views
  • 4 likes
  • 6 in conversation