BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
neelima
Calcite | Level 5

Hi All,

 

I have data as below and need to find out which CUSTOMER has given a "5" score and then immediately followed with a missing score. can you please suggest how to achive this in Data step.

cSCORECOMMENT
A4I like the food
A2The service is very bad
A3The is no parking
A5The food is expensive
B2The food is cold
B5It tastes good
B. 
B3I like the drink
C4The dessert is tasty
C2I don't like the services
C3It is too expensive

 

Thanks in advance for your help!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

One way to go.

data have;
  infile datalines truncover dlm='|';
  input customer :$8.  SCORE :best32.  COMMENT $40.;
  datalines;
A|4|I like the food
A|2|The service is very bad
A|3|The is no parking
A|5|The food is expensive
B|2|The food is cold
B|5|It tastes good
B|.| 
B|3|I like the drink
C|4|The dessert is tasty
C|2|I don't like the services
C|3|It is too expensive
;
run;

data want;
  set have;
  if lag(customer)=customer and lag(score)=5 and missing(score) then output;
run;

View solution in original post

11 REPLIES 11
Patrick
Opal | Level 21

One way to go.

data have;
  infile datalines truncover dlm='|';
  input customer :$8.  SCORE :best32.  COMMENT $40.;
  datalines;
A|4|I like the food
A|2|The service is very bad
A|3|The is no parking
A|5|The food is expensive
B|2|The food is cold
B|5|It tastes good
B|.| 
B|3|I like the drink
C|4|The dessert is tasty
C|2|I don't like the services
C|3|It is too expensive
;
run;

data want;
  set have;
  if lag(customer)=customer and lag(score)=5 and missing(score) then output;
run;
neelima
Calcite | Level 5

Hi Pat,

 

Thanks a lot for your help!.

neelima
Calcite | Level 5

Hi Pat,

 

Also can you please help how to find out which CUSTOMER has comment on "parking" AND "expensive" on same dataset.

Patrick
Opal | Level 21

Here you go.

data have;
  infile datalines truncover dlm='|';
  input customer :$8.  SCORE :best32.  COMMENT $40.;
  datalines;
A|4|I like the food
A|2|The service is very bad
A|5|The is no parking
A|.|The food is expensive
B|3|The food is cold
B|5|It tastes good
B|.| 
B|3|I like the drink
C|4|The dessert is tasty
C|2|I don't like the services
C|3|It is too expensive
;
run;

data want;
  set have;
  by customer notsorted;
  _lag_score=lag(score);
  retain score_flg expensive_flg parking_flg;
  if not first.customer and _lag_score=5 and missing(score) then score_flg=1;
  if findw(comment,'expensive',' ','i') then expensive_flg=1;
  if findw(comment,'parking',' ','i') then parking_flg=1;
  if last.customer then
    do;
      output;
      call missing(score_flg, expensive_flg, parking_flg);
    end;
run;

For the future: Try to have a single question per thread. Start a new thread with new or additional questions; reference an existing thread if the question is related to it.

LaurieF
Barite | Level 11

I'm not a big fan of the lag function, but this is a situation where it works well.

 

data source;
infile cards dsd dlm='09'x;   /* Separated by tabs, as in the posted data */
attrib c length=$ 1;
attrib score length=3;
attrib comment length=$ 30;
input c
      score
      comment;
cards;
A	4	I like the food
A	2	The service is very bad
A	3	The is no parking
A	5	The food is expensive
B	2	The food is cold
B	5	It tastes good
B	.	 
B	3	I like the drink
C	4	The dessert is tasty
C	2	I don't like the services
C	3	It is too expensive
;
run;

data score_plus_null;
set source;
if c = lag(c)                 /* If it's still the same customer */
 & missing(score)             /* if the current score is missing */
 & lag(score) = 5;            /* if the previous score is 5 */
run;
LaurieF
Barite | Level 11

Here's one way of doing it.

 

data target;
set source;
by c;
if first.c then
   count = 0;
if indexw(comment, 'expensive')
 | indexw(comment, 'parking') then
   count + 1;
if last.c;
if count = 2;
keep c;
run;
neelima
Calcite | Level 5

Hi Laurie,

It works. thanks a lot for your shift response.

Patrick
Opal | Level 21

@LaurieF

What happens if expensive and parking are in the same comment, or if there are two comments with parking but no expensive?

LaurieF
Barite | Level 11

There's really no way of getting around that with (apparently) free-form text fields: it would require natural language parsing. If the responses however were from a finite unmodifiable list, then it would be quite straightforward but we'd need to know what the finite list contained. I always prefer to cope with codes, rather than text strings, for such a problem. But we don't always have that luxury, of course.

 

Your first problem: it'd work unmodified, but the first paragraph is important; the second would be a case of adding up the strings individually. Again, refer to first paragraph!

 

My first rule of data analysis: know thy data…

Patrick
Opal | Level 21

Requirement

which CUSTOMER has comment on "parking" AND "expensive" on same dataset.

 

@LaurieF

What I've tried to point out to you is, that the code you've posted doesn't meet the requirements as posted by the OP. 

You're just counting per record if there is at least one occurrence of a word and though you would return a "requirement met" if there are only two occurrences of word parking. On the other hand you would miss the cases where parking and expensive both exists in a single record (your "count" would be 1 instead of 2).

LaurieF
Barite | Level 11

Yes. I almost completely agree. But this brings up one of the main problem with the puzzles we are set here, and one of the main problems I have in my career (touched on by "Know Thy Data"). Without a solid amount of data to test problems on, it's really easy to make assumptions.

 

In a perfect world, with lots of data, I would check the cardinality of the text field to check whether it was free-form text or not. If I could confirm that it was the case (very high cardinality), I would tailor my code to fit. With a small set of data, I made an assumption based on my understanding of the spec, and I would argue that I was correct, even if I wasn't right <grin>.

 

But that doesn't matter. I have (mostly silently) disagreed with some of the answers that people have given in the past, but the question setter has been happy so it is of no import. I think it is less likely that a single slanted interpretation of a specification is going to be accepted if there are extensive source data from which to formulate a solution. I blanch when I'm given test data which has been put together by a business analyst which may only have a hundred or a thousand observations in it. If it's a banking system, give me all the customers, all the accounts and all their transactions. It'll take me a wee bit longer to analyse it, but I will eventually come up with a solution which is more likely to last.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1589 views
  • 1 like
  • 3 in conversation