DATA Step, Macro, Functions and more

Need SAS data step logic

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Need SAS data step logic

Hi All,

 

I have data as below and need to find out which CUSTOMER has given a "5" score and then immediately followed with a missing score. can you please suggest how to achive this in Data step.

cSCORECOMMENT
A4I like the food
A2The service is very bad
A3The is no parking
A5The food is expensive
B2The food is cold
B5It tastes good
B. 
B3I like the drink
C4The dessert is tasty
C2I don't like the services
C3It is too expensive

 

Thanks in advance for your help!

 

 


Accepted Solutions
Solution
‎01-25-2017 09:20 PM
Respected Advisor
Posts: 3,902

Re: Need SAS data step logic

One way to go.

data have;
  infile datalines truncover dlm='|';
  input customer :$8.  SCORE :best32.  COMMENT $40.;
  datalines;
A|4|I like the food
A|2|The service is very bad
A|3|The is no parking
A|5|The food is expensive
B|2|The food is cold
B|5|It tastes good
B|.| 
B|3|I like the drink
C|4|The dessert is tasty
C|2|I don't like the services
C|3|It is too expensive
;
run;

data want;
  set have;
  if lag(customer)=customer and lag(score)=5 and missing(score) then output;
run;

View solution in original post


All Replies
Solution
‎01-25-2017 09:20 PM
Respected Advisor
Posts: 3,902

Re: Need SAS data step logic

One way to go.

data have;
  infile datalines truncover dlm='|';
  input customer :$8.  SCORE :best32.  COMMENT $40.;
  datalines;
A|4|I like the food
A|2|The service is very bad
A|3|The is no parking
A|5|The food is expensive
B|2|The food is cold
B|5|It tastes good
B|.| 
B|3|I like the drink
C|4|The dessert is tasty
C|2|I don't like the services
C|3|It is too expensive
;
run;

data want;
  set have;
  if lag(customer)=customer and lag(score)=5 and missing(score) then output;
run;
New Contributor
Posts: 4

Re: Need SAS data step logic

Hi Pat,

 

Thanks a lot for your help!.

New Contributor
Posts: 4

Re: Need SAS data step logic

Hi Pat,

 

Also can you please help how to find out which CUSTOMER has comment on "parking" AND "expensive" on same dataset.

Respected Advisor
Posts: 3,902

Re: Need SAS data step logic

Here you go.

data have;
  infile datalines truncover dlm='|';
  input customer :$8.  SCORE :best32.  COMMENT $40.;
  datalines;
A|4|I like the food
A|2|The service is very bad
A|5|The is no parking
A|.|The food is expensive
B|3|The food is cold
B|5|It tastes good
B|.| 
B|3|I like the drink
C|4|The dessert is tasty
C|2|I don't like the services
C|3|It is too expensive
;
run;

data want;
  set have;
  by customer notsorted;
  _lag_score=lag(score);
  retain score_flg expensive_flg parking_flg;
  if not first.customer and _lag_score=5 and missing(score) then score_flg=1;
  if findw(comment,'expensive',' ','i') then expensive_flg=1;
  if findw(comment,'parking',' ','i') then parking_flg=1;
  if last.customer then
    do;
      output;
      call missing(score_flg, expensive_flg, parking_flg);
    end;
run;

For the future: Try to have a single question per thread. Start a new thread with new or additional questions; reference an existing thread if the question is related to it.

Super Contributor
Posts: 251

Re: Need SAS data step logic

[ Edited ]

I'm not a big fan of the lag function, but this is a situation where it works well.

 

data source;
infile cards dsd dlm='09'x;   /* Separated by tabs, as in the posted data */
attrib c length=$ 1;
attrib score length=3;
attrib comment length=$ 30;
input c
      score
      comment;
cards;
A	4	I like the food
A	2	The service is very bad
A	3	The is no parking
A	5	The food is expensive
B	2	The food is cold
B	5	It tastes good
B	.	 
B	3	I like the drink
C	4	The dessert is tasty
C	2	I don't like the services
C	3	It is too expensive
;
run;

data score_plus_null;
set source;
if c = lag(c)                 /* If it's still the same customer */
 & missing(score)             /* if the current score is missing */
 & lag(score) = 5;            /* if the previous score is 5 */
run;
Super Contributor
Posts: 251

Re: Need SAS data step logic

Here's one way of doing it.

 

data target;
set source;
by c;
if first.c then
   count = 0;
if indexw(comment, 'expensive')
 | indexw(comment, 'parking') then
   count + 1;
if last.c;
if count = 2;
keep c;
run;
New Contributor
Posts: 4

Re: Need SAS data step logic

Hi Laurie,

It works. thanks a lot for your shift response.

Respected Advisor
Posts: 3,902

Re: Need SAS data step logic

@LaurieF

What happens if expensive and parking are in the same comment, or if there are two comments with parking but no expensive?

Super Contributor
Posts: 251

Re: Need SAS data step logic

[ Edited ]

There's really no way of getting around that with (apparently) free-form text fields: it would require natural language parsing. If the responses however were from a finite unmodifiable list, then it would be quite straightforward but we'd need to know what the finite list contained. I always prefer to cope with codes, rather than text strings, for such a problem. But we don't always have that luxury, of course.

 

Your first problem: it'd work unmodified, but the first paragraph is important; the second would be a case of adding up the strings individually. Again, refer to first paragraph!

 

My first rule of data analysis: know thy data…

Respected Advisor
Posts: 3,902

Re: Need SAS data step logic

Requirement

which CUSTOMER has comment on "parking" AND "expensive" on same dataset.

 

@LaurieF

What I've tried to point out to you is, that the code you've posted doesn't meet the requirements as posted by the OP. 

You're just counting per record if there is at least one occurrence of a word and though you would return a "requirement met" if there are only two occurrences of word parking. On the other hand you would miss the cases where parking and expensive both exists in a single record (your "count" would be 1 instead of 2).

Super Contributor
Posts: 251

Re: Need SAS data step logic

Yes. I almost completely agree. But this brings up one of the main problem with the puzzles we are set here, and one of the main problems I have in my career (touched on by "Know Thy Data"). Without a solid amount of data to test problems on, it's really easy to make assumptions.

 

In a perfect world, with lots of data, I would check the cardinality of the text field to check whether it was free-form text or not. If I could confirm that it was the case (very high cardinality), I would tailor my code to fit. With a small set of data, I made an assumption based on my understanding of the spec, and I would argue that I was correct, even if I wasn't right <grin>.

 

But that doesn't matter. I have (mostly silently) disagreed with some of the answers that people have given in the past, but the question setter has been happy so it is of no import. I think it is less likely that a single slanted interpretation of a specification is going to be accepted if there are extensive source data from which to formulate a solution. I blanch when I'm given test data which has been put together by a business analyst which may only have a hundred or a thousand observations in it. If it's a banking system, give me all the customers, all the accounts and all their transactions. It'll take me a wee bit longer to analyse it, but I will eventually come up with a solution which is more likely to last.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 262 views
  • 1 like
  • 3 in conversation