Solved: pick up the top 40% company of each CTRY group

Alexxxxxxx · Posted 10-09-2019 08:22 PM

Hello,

I would like to pick up the top 40% innovative company of each 'PERSON_CTRY_CODE'.

By using the example,

1. If the data obtained has a decimal point, then add one. For example, the AD country includes 14 companies, so the top40% is 5.6 companies. in this case, I would like to includes 6 companies rather than 5.

2. For ZA country, which includes 10 companies. The 4th company has 8116 innovations which is as same as the number of innovation as the 5th company. so I expect to obtain both of them.

3. I expect to have a Flag for picked company.

the result is expected to look like following.

PERSON_CTRY_CODE	company	sum_innovation	Seq	Top40%	Flag_Top40	note
AD	L'OREAL	7925	1		1
AD	VOLKSWAGEN	4909	2		1
AD	BAKER HUGHES	4546	3		1
AD	ABB (ASEA BROWN BOVERI)	2822	4		1
AD	CARL FREUDENBERG	1481	5		1
AD	ASTRA	587	6		0
AD	GRUENENTHAL	435	7		0
AD	KOLON INDUSTRIES	305	8		0
AD	BRITISH AMERICAN TOBACCO (INVESTMENTS)	233	9		0
AD	EVONIK ROEHM	221	10		0
AD	SEIKAGAKU CORPORATION	106	11		0
AD	BAYER ANIMAL HEALTH	23	12		0
AD	M-U-T MASCHINEN-UMWELTTECHNIK-TRANSPORTANLAGEN	15	13		0
AD	CONTROLLED THERAPEUTICS (SCOTLAND)	6	14	5.6	0	5.6=14*0.4, I would like to have 6 rather than 5
ZA	IBM (INTERNATIONAL BUSINESS MACHINES CORPORATION)	143820	1		1
ZA	SIEMENS	34929	2		1
ZA	TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)	14629	3		1
ZA	3M COMPANY (MINNESOTA MINING AND MANUFACTURING COMPANY)	8116	4		1
ZA	BAYER	8116	5		1	I would like to keep both of them if the 5th company has the same number of sum_innovation as the 4 th company.
ZA	YAZAKI CORPORATION	5307	6		0
ZA	ROHM & HAAS COMPANY	3816	7		0
ZA	UNILEVER	3380	8		0
ZA	L'AIR LIQUIDE SOCIETE ANONYME POUR L'ETUDE ET L'EXPLOITATION DES PROCEDES GEORGES CLAUDE	2403	9		0
ZA	COLGATE-PALMOLIVE COMPANY	1934	10	4	0

Could you please give me some suggestion about this?

Thanks in advance.

hashman · Posted 10-09-2019 10:47 PM

@Alexxxxxxx:

Since your input is already sorted by CODE desc SUM_INN, you can use a nested DoW-loop to count SUM_INN dupes as one:

data have ;                                                                                                                                                                                                                                                     
  input code $ comp $ sum_inn ;                                                                                                                                                                                                                                 
  cards ;                                                                                                                                                                                                                                                       
AD   LOREAL        7925                                                                                                                                                                                                                                         
AD   VW            4909                                                                                                                                                                                                                                         
AD   BAKER         4546                                                                                                                                                                                                                                         
AD   ABB           2822                                                                                                                                                                                                                                         
AD   CARL          1481                                                                                                                                                                                                                                         
AD   ASTRA          587                                                                                                                                                                                                                                         
AD   GRUEN          435                                                                                                                                                                                                                                         
AD   KOLON          305                                                                                                                                                                                                                                         
AD   BAT            233                                                                                                                                                                                                                                         
AD   EVONIK         221                                                                                                                                                                                                                                         
AD   SEIKA          106                                                                                                                                                                                                                                         
AD   BAYER           23                                                                                                                                                                                                                                         
AD   M-U-T           15                                                                                                                                                                                                                                         
AD   CTHERA           6                                                                                                                                                                                                                                         
ZA   IBM         143820                                                                                                                                                                                                                                         
ZA   SIEMENS      34929                                                                                                                                                                                                                                         
ZA   TELEFON      14629                                                                                                                                                                                                                                         
ZA   3MCOMP        8116                                                                                                                                                                                                                                         
ZA   BAYER         8116                                                                                                                                                                                                                                         
ZA   YAZAKI        5307                                                                                                                                                                                                                                         
ZA   ROHMHAAS      3816                                                                                                                                                                                                                                         
ZA   UNILEVER      3380                                                                                                                                                                                                                                         
ZA   LAIR          2403                                                                                                                                                                                                                                         
ZA   COLGATE       1934                                                                                                                                                                                                                                         
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (drop = _:) ;                                                                                                                                                                                                                                         
  do _q = 1 by 1 until (last.code) ;                                                                                                                                                                                                                            
    do until (last.sum_inn) ;                                                                                                                                                                                                                                   
      set have ;                                                                                                                                                                                                                                                
      by code descending sum_inn ;                                                                                                                                                                                                                              
    end ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
  top40 = ceil (_q * 0.4) ;                                                                                                                                                                                                                                     
  do _n_ = 1 to _q ;                                                                                                                                                                                                                                            
    flag = _n_ <= top40 ;                                                                                                                                                                                                                                       
    do until (last.sum_inn) ;                                                                                                                                                                                                                                   
      set have ;                                                                                                                                                                                                                                                
      by code descending sum_inn ;                                                                                                                                                                                                                              
      output ;                                                                                                                                                                                                                                                  
    end ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
run ;

Kind regards

Paul D.

View solution in original post

hashman · Posted 10-09-2019 10:47 PM

@Alexxxxxxx:

Since your input is already sorted by CODE desc SUM_INN, you can use a nested DoW-loop to count SUM_INN dupes as one:

data have ;                                                                                                                                                                                                                                                     
  input code $ comp $ sum_inn ;                                                                                                                                                                                                                                 
  cards ;                                                                                                                                                                                                                                                       
AD   LOREAL        7925                                                                                                                                                                                                                                         
AD   VW            4909                                                                                                                                                                                                                                         
AD   BAKER         4546                                                                                                                                                                                                                                         
AD   ABB           2822                                                                                                                                                                                                                                         
AD   CARL          1481                                                                                                                                                                                                                                         
AD   ASTRA          587                                                                                                                                                                                                                                         
AD   GRUEN          435                                                                                                                                                                                                                                         
AD   KOLON          305                                                                                                                                                                                                                                         
AD   BAT            233                                                                                                                                                                                                                                         
AD   EVONIK         221                                                                                                                                                                                                                                         
AD   SEIKA          106                                                                                                                                                                                                                                         
AD   BAYER           23                                                                                                                                                                                                                                         
AD   M-U-T           15                                                                                                                                                                                                                                         
AD   CTHERA           6                                                                                                                                                                                                                                         
ZA   IBM         143820                                                                                                                                                                                                                                         
ZA   SIEMENS      34929                                                                                                                                                                                                                                         
ZA   TELEFON      14629                                                                                                                                                                                                                                         
ZA   3MCOMP        8116                                                                                                                                                                                                                                         
ZA   BAYER         8116                                                                                                                                                                                                                                         
ZA   YAZAKI        5307                                                                                                                                                                                                                                         
ZA   ROHMHAAS      3816                                                                                                                                                                                                                                         
ZA   UNILEVER      3380                                                                                                                                                                                                                                         
ZA   LAIR          2403                                                                                                                                                                                                                                         
ZA   COLGATE       1934                                                                                                                                                                                                                                         
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (drop = _:) ;                                                                                                                                                                                                                                         
  do _q = 1 by 1 until (last.code) ;                                                                                                                                                                                                                            
    do until (last.sum_inn) ;                                                                                                                                                                                                                                   
      set have ;                                                                                                                                                                                                                                                
      by code descending sum_inn ;                                                                                                                                                                                                                              
    end ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
  top40 = ceil (_q * 0.4) ;                                                                                                                                                                                                                                     
  do _n_ = 1 to _q ;                                                                                                                                                                                                                                            
    flag = _n_ <= top40 ;                                                                                                                                                                                                                                       
    do until (last.sum_inn) ;                                                                                                                                                                                                                                   
      set have ;                                                                                                                                                                                                                                                
      by code descending sum_inn ;                                                                                                                                                                                                                              
      output ;                                                                                                                                                                                                                                                  
    end ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
run ;

Kind regards

Paul D.

FreelanceReinh · Posted 10-10-2019 07:44 AM

@hashman:

Hi Paul,

Interesting technique. I may be wrong, but my understanding was that the fifth record in code 'ZA' should be flagged only because its SUM_INN value is tied with the last observation satisfying the top-40 criterion. For example, if the first three records were tied as well, say

ZA   IBM          34929
ZA   SIEMENS      34929
ZA   TELEFON      34929

your code would count only _q=7 distinct SUM_INN values, compute top40=3 (rather than 4) and flag the first 6 observations (rather than 5) from code 'ZA', including 'YAZAKI' (SUM_INN=5307). So, YAZAKI's top-40 flag would be earned on the basis of accidental coincidences of numbers related to other companies.

Without nested loops this could be avoided:

data want (drop = _:) ;
  do _q = 1 by 1 until (last.code) ;
    set have ;
    by code descending sum_inn ;
  end ;
  _top40 = ceil (_q * 0.4) ;
  flag_top40 = 1 ;
  do _n_ = 1 to _q ;
    set have ;
    by code descending sum_inn ;
    output ;
    if _n_ >= _top40 & last.sum_inn then flag_top40 = 0 ;
  end ;
run ;

hashman · Posted 10-10-2019 01:30 PM

Hi @FreelanceReinh,

You may very well be right, in which case your variant does the real trick.

My understanding was that the OP wanted the ties to be counted as one but I can be wrong ... if the OP would like to clarify, I hope we'll see a note to this effect here.

Kind regards

Paul D.

pick up the top 40% company of each CTRY group

Re: pick up the top 40% company of each CTRY group

Re: pick up the top 40% company of each CTRY group

Re: pick up the top 40% company of each CTRY group

Re: pick up the top 40% company of each CTRY group

Catch up on SAS Innovate 2026

pick up the top 40% company of each CTRY group

Re: pick up the top 40% company of each CTRY group

Re: pick up the top 40% company of each CTRY group

Re: pick up the top 40% company of each CTRY group

Re: pick up the top 40% company of each CTRY group

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away