BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AndyZ
Calcite | Level 5

hi guys,

I have a large dataset which contains reject reasons. rejection reasons are order, kind like the waterfall.

 

I want to count how many ppl are rejected by each rejection code. For example, customer b is rejected by "low+bro", because in the order, bro is the first reason, so b will be ONLY counted to bro, not low. 

raw   what I am looking for is
       
namerejection rejection_by_order rejection_reasoncount
alow bro bro3
blow+bro eas eas1
cbro low low1
deas+bro     
eeas, low     
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Sounds like you have data like this:

data have;
  input name :$10. rejection $30. ;
cards;
a low
b low+bro
c bro
d eas+bro          
e eas, low 
;

And metadata like this:

data reasons;
  order+1;
  input reason $10.;
cards; 
bro
eas
low
;

So first combine them and sort by NAME and ORDER.  Then take the first one per name.

proc sql;
create table step1 as
select a.*,b.*
from have a 
left join reasons b
on indexw(a.rejection,b.reason,' ,+')
order by name, order 
;
quit;

data want;
  set step1;
  by name;
  if first.name;
run;

proc freq data=want;
  tables reason;
run;

Results:

The FREQ Procedure

                                   Cumulative    Cumulative
reason    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
bro              3       60.00             3        60.00
eas              1       20.00             4        80.00
low              1       20.00             5       100.00

 

PS That is an extremely strange way to type in data.  Did you enter it into a spreadsheet?  Why not just enter it as plain text?

View solution in original post

7 REPLIES 7
Reeza
Super User
Since bro is before eas shouldn't it be bro=4, eas=1, low=0?
AndyZ
Calcite | Level 5

bro contains b, c , d

Reeza
Super User
data have;
	infile cards dlm='|' truncover;
	input name	$ rejection	 $50.;
	cards;
a |low	
b |low+bro 	
c |bro	
d |eas+bro	 	 	 	 	 
e |eas,low
;
run;


*create informat for ordering rejections;
proc format;
	invalue $ rej_infmt 'bro'=1 'eas'=2 'low'=3;
run;

*split data out for analysis;
data long;
	set have;
	*count number of terms in rejection;
	nWords=countw(compress(rejection, , 's'));

	do rej=1 to nwords;
	    *capture one rejection reason at a time;
		rejection_reason=compress(scan(rejection, rej, "+,"), , 's');
		*convert rejection_reseason to a number for ordering;
		rejection_code=input(rejection_reason, $rej_infmt.);
		*output to data set;
		output;
	end;
run;

*sort so that lowest is first;
proc sort data=long;
     by name rejection_code;
run;

*keep only first using nodupkey;
proc sort data=long out=lowest nodupkey;
by name;
run;

*summarize results;
proc freq data=lowest;
	table rejection_code * rejection_reason / list;
run;

Longer solution but dynamic, easy to fix and modify if necessary. 

Tom
Super User Tom
Super User

Sounds like you have data like this:

data have;
  input name :$10. rejection $30. ;
cards;
a low
b low+bro
c bro
d eas+bro          
e eas, low 
;

And metadata like this:

data reasons;
  order+1;
  input reason $10.;
cards; 
bro
eas
low
;

So first combine them and sort by NAME and ORDER.  Then take the first one per name.

proc sql;
create table step1 as
select a.*,b.*
from have a 
left join reasons b
on indexw(a.rejection,b.reason,' ,+')
order by name, order 
;
quit;

data want;
  set step1;
  by name;
  if first.name;
run;

proc freq data=want;
  tables reason;
run;

Results:

The FREQ Procedure

                                   Cumulative    Cumulative
reason    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
bro              3       60.00             3        60.00
eas              1       20.00             4        80.00
low              1       20.00             5       100.00

 

PS That is an extremely strange way to type in data.  Did you enter it into a spreadsheet?  Why not just enter it as plain text?

AndyZ
Calcite | Level 5

is it possible to use scan or find function?

Tom
Super User Tom
Super User

@AndyZ wrote:

is it possible to use scan or find function?


SCAN() would not work very well as it will just pick a single reason from the list.  You could possible do some complex looping to try to get it to help but you would need to need to keep track of which is maximum order of the reasons found.  The simplest way would be to use SCAN() to convert the HAVE dataset into one observation per NAME/REASON instead of having multiple reasons stuffed into a single variable.  Then joining it with the metadata that sets the order would be easier.  You might even use a INFORMAT to convert the text of the reason pulled out into an order number to make sorting easier.

 

FIND() has the problem that in the general case some text of a reason might be a subset of another reason.  That is why INDEXW() or FINDW() is better since it only searches for full words, not just substrings.

 

 

andreas_lds
Jade | Level 19

Why? The code will be much more complicated than the already posted solution.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 429 views
  • 2 likes
  • 4 in conversation