- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I've done quite a bit of searching online around this and can't find anything. I'm using Enterprise Guide version 7.15 on a Windows PC.
I'm trying to create a flag variable in a proc sql program to identify whether a specific character variable contains the word 'unsold'. Here is my code: -
proc sql;
create table work.test as
select
*
,case when findw(stop_category_description,'unsold','','i') then 1 else 0 end as unsold
from work.t1;
quit;
When I check the output dataset, there are no observations populated with 1 for the unsold variable. If however I view the output dataset in EG and click 'where' and paste the 'findw(stop_category_description,'unsold','','i')' code in, it identifies 31 observations, all of which I can see do contain the work 'unsold' in them but that are populated with a 0 for the unsold variable.
I did see a previous post from a couple of years ago that someone had put about the findw function and the user was recommended to contact SAS Technical Services. I have a feeling this may also need their attention but thought I'd ask the community first in case there's something I'm missing.
As always, any help will be gratefully received.
Thanks,
Rob
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your problem comes from the third parameter in findw. '' , with no space, means there are NO characters in the list. If you want default delimiters don't put the string at all, if you want a blank as the only delimiter you must provide a blank in the parameter.
Otherwise NO characters are used for delimiters so your whole string is the only word.
Here is a minimum sort of test case:
data example; infile datalines missover; input var $25.; datalines; contains unsold does not have word ; proc sql; create table work.test as select * ,case when findw(var,'unsold',' ','i') then 1 else 0 end as unsold from work.example; quit;
that does set the flag.
Also, you need to remember that this forum reformats the text in the main message window and so code pasted there may not actually be what was submitted when white space characters are removed or the other reformatting applied.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please provide a small portion (that illustrates the problem) of the data set used (WORK.T1) following these instructions (and not via any other method)
https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm going to struggle to do that I'm afraid as the dataset contains sensitive customer data (apologies, I had meant to include that information in my original post).
I had hoped that I had provided enough information to demonstrate that the function worked within the where facility in the EG viewer but not in the code itself. I've also just tried doing the same in the source dataset (work.t1) and it has correctly identified the correct observation there as well.
Not to worry, I can easily work around it for the time being anyway, I just thought it might be something that someone had come across before. If I can think of a way to provide the data in the required format, I will do.
Thanks anyway
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just to add, I've also coded this up in a data step: -
data work.t3;
set work.t1;
unsold = 0;
if findw(stop_category_description,'unsold','','i') then unsold = 1;
run;
and this has worked perfectly so it seems to be something to do with how proc sql is implementing the function.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No need to supply the real data. Just some example data that exhibits the issue.
For example you could subset to just the variables needed for the FINDW() call and one or two observations where it fails (and if there are any then one or two where it works). If the values of that variable are sensitive then change some of the values to make them sharable.
You might find that you can can find the solution yourself in the process of creating that test data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't think you can get much help on this question without data that illustrates the problem. Why not make some synthetic data that shows the issue?
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the suggestions. I've not been able to download the data2datastep zip file as my work pc will not allow it but have looked to recreate the issue using some datalines code (hope that's ok): -
data l1;
input stop_category_description $30.;
datalines;
Covid Customer Care
Post Covid Arrangement Agreed
Covid Customer Care
New VT Unsold
Covid Customer Care
Covid Customer Care
Post Covid
Covid Customer Care
Post Covid Arrangement Agreed
Post Covid
Covid Customer Care
Covid Customer Care
New Repossession Unsold
New Repossession Unsold
New VT Unsold
Post Covid Arrangement Agreed
Covid Customer Care
Post Covid
Covid Customer Care
Covid Customer Care
Covid Customer Care
Covid Customer Care
Covid Customer Care
Covid Customer Care
Covid Customer Care
Covid Customer Care
Post Covid Arrangement Agreed
New VT Unsold
Post Covid Arrangement Agreed
New IVA
;
run;
proc sql;
create table l2 as
select
stop_category_description
,case when findw(stop_category_description,'unsold','','i') then 1 else 0 end as unsold
from l1;
quit;
data l3;
set l1;
if findw(stop_category_description,'unsold','','i') then unsold = 1;
else unsold = 0;
run;
when I run these three programs, the proc sql one does not populate the 'unsold' variable with any 1 values whereas when I do the same in the data step, it correctly identifies them.
As I say, I can work around it anyway but I like to understand why things don't work as for the most part, it's because I'm doing something incorrectly and want to try to learn the right way of doing things but in this case, I'm not sure that is what's happening.
Thanks again
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your problem comes from the third parameter in findw. '' , with no space, means there are NO characters in the list. If you want default delimiters don't put the string at all, if you want a blank as the only delimiter you must provide a blank in the parameter.
Otherwise NO characters are used for delimiters so your whole string is the only word.
Here is a minimum sort of test case:
data example; infile datalines missover; input var $25.; datalines; contains unsold does not have word ; proc sql; create table work.test as select * ,case when findw(var,'unsold',' ','i') then 1 else 0 end as unsold from work.example; quit;
that does set the flag.
Also, you need to remember that this forum reformats the text in the main message window and so code pasted there may not actually be what was submitted when white space characters are removed or the other reformatting applied.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That's brilliant, thank you.
I was only including the third parameter as my understanding is if you are applying a modifier, you also have to apply a delimiter parameter so that SAS knows you are intending to use the 'i' as a modifier and not a delimiter, however I hadn't appreciated that I would also need to include a space.
I should have known it was me doing the wrong thing (again) but that's been really helpful.
Thanks ballardw and thanks to everyone for looking.
Rob
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Having said that, I'm still a bit confused as to why I don't need a space when doing it in a data step but I do in proc sql but I'll worry about that another time!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There is a difference between data step and SQL that are impacting this.
In a data step there is no way to express a literal string of length zero. If you code two quote character next to each other the result is a string of length 1 with a single space character. You can generate one using the TRIMN() function.
Test:
data example;
input var $25.;
datastep1=findw(var,'unsold',' ','i') ;
datastep2=findw(var,'unsold','','i') ;
datastep3=findw(var,'unsold',trimn(' '),'i') ;
datalines;
unsold
contains unsold
does not have word
;
proc sql;
create table work.test as
select *
, findw(var,'unsold',' ','i') as sql1
, findw(var,'unsold','','i') as sql2
, findw(var,'unsold',trimn(' '),'i') as sql3
from work.example
;
quit;
Results:
Obs var datastep1 datastep2 datastep3 sql1 sql2 sql3 1 unsold 1 1 0 1 0 0 2 contains unsold 10 10 0 10 0 0 3 does not have word 0 0 0 0 0 0
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Interestingly, with PROC SQL's FEEDBACK option the log incorrectly states that a single blank was interpreted as a null string, whereas the log of a DATA step correctly describes the opposite interpretation of a null string in a WHERE condition:
648 proc sql feedback; 649 create table selsql as 650 select var from example 651 where findw(var,'unsold',' ','i'); NOTE: Statement transforms to: select EXAMPLE.var from WORK.EXAMPLE where FINDW(EXAMPLE.var, 'unsold', '', 'i'); NOTE: Table WORK.SELSQL created, with 2 rows and 1 columns. 652 quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.13 seconds cpu time 0.12 seconds 653 654 data selds; 655 set example; 656 where findw(var,'unsold','','i'); 657 run; NOTE: There were 2 observations read from the data set WORK.EXAMPLE. WHERE FINDW(var, 'unsold', ' ', 'i'); NOTE: The data set WORK.SELDS has 2 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.04 seconds cpu time 0.04 seconds