Hello,
I would like to pick up the top 40% innovative company of each 'PERSON_CTRY_CODE'.
By using the example,
1. If the data obtained has a decimal point, then add one. For example, the AD country includes 14 companies, so the top40% is 5.6 companies. in this case, I would like to includes 6 companies rather than 5.
2. For ZA country, which includes 10 companies. The 4th company has 8116 innovations which is as same as the number of innovation as the 5th company. so I expect to obtain both of them.
3. I expect to have a Flag for picked company.
the result is expected to look like following.
PERSON_CTRY_CODE | company | sum_innovation | Seq | Top40% | Flag_Top40 | note |
AD | L'OREAL | 7925 | 1 | 1 | ||
AD | VOLKSWAGEN | 4909 | 2 | 1 | ||
AD | BAKER HUGHES | 4546 | 3 | 1 | ||
AD | ABB (ASEA BROWN BOVERI) | 2822 | 4 | 1 | ||
AD | CARL FREUDENBERG | 1481 | 5 | 1 | ||
AD | ASTRA | 587 | 6 | 0 | ||
AD | GRUENENTHAL | 435 | 7 | 0 | ||
AD | KOLON INDUSTRIES | 305 | 8 | 0 | ||
AD | BRITISH AMERICAN TOBACCO (INVESTMENTS) | 233 | 9 | 0 | ||
AD | EVONIK ROEHM | 221 | 10 | 0 | ||
AD | SEIKAGAKU CORPORATION | 106 | 11 | 0 | ||
AD | BAYER ANIMAL HEALTH | 23 | 12 | 0 | ||
AD | M-U-T MASCHINEN-UMWELTTECHNIK-TRANSPORTANLAGEN | 15 | 13 | 0 | ||
AD | CONTROLLED THERAPEUTICS (SCOTLAND) | 6 | 14 | 5.6 | 0 | 5.6=14*0.4, I would like to have 6 rather than 5 |
ZA | IBM (INTERNATIONAL BUSINESS MACHINES CORPORATION) | 143820 | 1 | 1 | ||
ZA | SIEMENS | 34929 | 2 | 1 | ||
ZA | TELEFONAKTIEBOLAGET LM ERICSSON (PUBL) | 14629 | 3 | 1 | ||
ZA | 3M COMPANY (MINNESOTA MINING AND MANUFACTURING COMPANY) | 8116 | 4 | 1 | ||
ZA | BAYER | 8116 | 5 | 1 | I would like to keep both of them if the 5th company has the same number of sum_innovation as the 4 th company. | |
ZA | YAZAKI CORPORATION | 5307 | 6 | 0 | ||
ZA | ROHM & HAAS COMPANY | 3816 | 7 | 0 | ||
ZA | UNILEVER | 3380 | 8 | 0 | ||
ZA | L'AIR LIQUIDE SOCIETE ANONYME POUR L'ETUDE ET L'EXPLOITATION DES PROCEDES GEORGES CLAUDE | 2403 | 9 | 0 | ||
ZA | COLGATE-PALMOLIVE COMPANY | 1934 | 10 | 4 | 0 |
Could you please give me some suggestion about this?
Thanks in advance.
Since your input is already sorted by CODE desc SUM_INN, you can use a nested DoW-loop to count SUM_INN dupes as one:
data have ;
input code $ comp $ sum_inn ;
cards ;
AD LOREAL 7925
AD VW 4909
AD BAKER 4546
AD ABB 2822
AD CARL 1481
AD ASTRA 587
AD GRUEN 435
AD KOLON 305
AD BAT 233
AD EVONIK 221
AD SEIKA 106
AD BAYER 23
AD M-U-T 15
AD CTHERA 6
ZA IBM 143820
ZA SIEMENS 34929
ZA TELEFON 14629
ZA 3MCOMP 8116
ZA BAYER 8116
ZA YAZAKI 5307
ZA ROHMHAAS 3816
ZA UNILEVER 3380
ZA LAIR 2403
ZA COLGATE 1934
;
run ;
data want (drop = _:) ;
do _q = 1 by 1 until (last.code) ;
do until (last.sum_inn) ;
set have ;
by code descending sum_inn ;
end ;
end ;
top40 = ceil (_q * 0.4) ;
do _n_ = 1 to _q ;
flag = _n_ <= top40 ;
do until (last.sum_inn) ;
set have ;
by code descending sum_inn ;
output ;
end ;
end ;
run ;
Kind regards
Paul D.
Since your input is already sorted by CODE desc SUM_INN, you can use a nested DoW-loop to count SUM_INN dupes as one:
data have ;
input code $ comp $ sum_inn ;
cards ;
AD LOREAL 7925
AD VW 4909
AD BAKER 4546
AD ABB 2822
AD CARL 1481
AD ASTRA 587
AD GRUEN 435
AD KOLON 305
AD BAT 233
AD EVONIK 221
AD SEIKA 106
AD BAYER 23
AD M-U-T 15
AD CTHERA 6
ZA IBM 143820
ZA SIEMENS 34929
ZA TELEFON 14629
ZA 3MCOMP 8116
ZA BAYER 8116
ZA YAZAKI 5307
ZA ROHMHAAS 3816
ZA UNILEVER 3380
ZA LAIR 2403
ZA COLGATE 1934
;
run ;
data want (drop = _:) ;
do _q = 1 by 1 until (last.code) ;
do until (last.sum_inn) ;
set have ;
by code descending sum_inn ;
end ;
end ;
top40 = ceil (_q * 0.4) ;
do _n_ = 1 to _q ;
flag = _n_ <= top40 ;
do until (last.sum_inn) ;
set have ;
by code descending sum_inn ;
output ;
end ;
end ;
run ;
Kind regards
Paul D.
Hi Paul,
Interesting technique. I may be wrong, but my understanding was that the fifth record in code 'ZA' should be flagged only because its SUM_INN value is tied with the last observation satisfying the top-40 criterion. For example, if the first three records were tied as well, say
ZA IBM 34929 ZA SIEMENS 34929 ZA TELEFON 34929
your code would count only _q=7 distinct SUM_INN values, compute top40=3 (rather than 4) and flag the first 6 observations (rather than 5) from code 'ZA', including 'YAZAKI' (SUM_INN=5307). So, YAZAKI's top-40 flag would be earned on the basis of accidental coincidences of numbers related to other companies.
Without nested loops this could be avoided:
data want (drop = _:) ;
do _q = 1 by 1 until (last.code) ;
set have ;
by code descending sum_inn ;
end ;
_top40 = ceil (_q * 0.4) ;
flag_top40 = 1 ;
do _n_ = 1 to _q ;
set have ;
by code descending sum_inn ;
output ;
if _n_ >= _top40 & last.sum_inn then flag_top40 = 0 ;
end ;
run ;
Hi @FreelanceReinh,
You may very well be right, in which case your variant does the real trick.
My understanding was that the OP wanted the ties to be counted as one but I can be wrong ... if the OP would like to clarify, I hope we'll see a note to this effect here.
Kind regards
Paul D.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.