Calcite | Level 5

## Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Hello,

My table1 looks like this:

 LenderID Year Ind1 Ind2 Ind3 Ind4 Ind5 Ind6 Ind7 Ind8 Ind9 Ind10 1 1995 0,2 0,3 0,1 0,05 0,05 0,025 0,025 0,07 0,08 0,1 2 1995 0,025 0,025 0,025 0,025 0,025 0,030 0,5 0,15 0,05 0,145 3 1995 0,07 0,08 0,09 0,1 0,11 0,12 0,13 0,14 0,15 0,01 4 1995 0,08 0,24 0,29 0,13 0,05 0,04 0,06 0,07 0,02 0,02 … … … … … … … … … … … …

The summation of Indi, (i=1, 2, ..., 10) egals 1 (100%)

I want that the five industries with the most weights be selected in the table2.

Table2  looks like this:

 PackageID LenderID year Ind1 Ind2 Ind3 Ind4 Ind5 Ind6 Ind7 Ind8 Ind9 Ind10 1 56 1995 0 1 0 0 0 0 0 0 0 0 1 1 1995 0 1 0 0 0 0 0 0 0 0 1 2 1995 0 1 0 0 0 0 0 0 0 0 2 4 1995 0 0 0 0 0 0 0 1 0 0 2 6 1995 0 0 0 0 0 0 0 1 0 0 2 3 1995 0 0 0 0 0 0 0 1 0 0 2 108 1995 0 0 0 0 0 0 0 1 0 0 … … … … … … … … … … … … …

I have in this table ten binary variables for each industry. For example, if the borrower's industry is Ind1 then Ind1 is 1, otherwise 0.

I have approximately 125 000 packageIDs in which one to fifty lenders (LenderID is the identification number) can be present. In total, 70 000 lenders are active (i.e. they participate frequently).

Table1 gives me the information about  weights in each industry. I want keep lenderIDs in table2 that corresponds to their five industries with the most important industries' weights.in table1.

I also have 18 years so the years must match between the two tables.

For example, if we look at the packageID 1 with the information we see in table1, LenderID 1 can keep his place in table2, LenderID 2 must be eliminated and so on.

That seems complex to code. Thanks for your help.

8 REPLIES 8
Super User

## Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

You need to explain your question more clear. What is your output table ?

How to correspond to five inds from table1 to table2 ?

What is "their five industries with the most important industries' weights.in table1." ?

Calcite | Level 5

## Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Thank for your response. I was not sure if it was clear enough.

My output would look like this table:

 PackageID LenderID year Ind1 Ind2 Ind3 Ind4 Ind5 Ind6 Ind7 Ind8 Ind9 Ind10 … 1 56 1995 0 1 0 0 0 0 0 0 0 0 … 1 1 1995 0 1 0 0 0 0 0 0 0 0 … 2 3 1995 0 0 0 0 0 0 0 1 0 0 … 2 4 1995 0 0 0 0 0 0 0 1 0 0 … … … … … … … … … … … … … … …

N.B. Hypothesis 1: The five most weignted industries for LenderID 56 contains Ind2.

Hypothesis 2: The ind8 is not present in the five most weighted industries for LenderIDs 6 and 108.

I specify because we don't see information about these LenderIDs in the table1.

In other words, the table 1 helps me to select rows (i.e. LenderID)  in table 2. The output is the table2 with restrictions on which LenderIDs are still present.

In the case where, for example, a packageID includes three lenders (i. lenderIDs) in the fifth industry (i.e. Ind5) and none of these lenders have the industry 5 in their five most weighted (important) industries, the packageID wouldn't be present at all in the output.

Calcite | Level 5

## Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Question: What is the five industries with the most important industries weights in table1?

Response: It can vary from one lender to another. We must maximise the summation of indi=(1, 2, ..., 10) when we pick only five of ten industries.

I guess we must procede in two steps.

Step 1: Create a variable that results in the summation of five industries, the five industries which maximise the sum

Step 2: In a way, create five variables (I'm trying at this moment to figure out the least complicated way to procede. My thinking can be flawed.) which each identifies the five industries retain to maximize the summation.

I'm also thinking about other manners to modify my data for minimizing complicated manipulations. I have a huge database so it's not simple.

## Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

```windyboo wrote:

Question: What is the five industries with the most important industries weights in table1?

```

I think this address the first question what are the top five industries per lenderid per year.  I quess that's the question.

data t1;
infile cards dsd dlm='09'x;

input LenderID   Year @;
do ind=1 to 10;

input w :commax. @;
output;

end;

cards;
1  1995  0,2   0,3   0,1   0,05  0,05  0,025 0,025 0,07  0,08  0,1
2  1995  0,025 0,025 0,025 0,025 0,025 0,030 0,5   0,15  0,05  0,145
3  1995  0,07  0,08  0,09  0,1   0,11  0,12  0,13  0,14  0,15  0,01
4  1995  0,08  0,24  0,29  0,13  0,05  0,04  0,06  0,07  0,02  0,02
;;;;
run;
proc print;

run;
proc sort;

by lenderid year descending w;
run;
proc rank out=top5(where=(rw le 5)) descending;

by lenderid year;
var w;
ranks rw;
run;
proc print;

run;
Calcite | Level 5

## Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

I forget to mention that I use an external file. Is the code following well adjusted for an external file?

data name_output;
infile 'external-file';   /* I don't need these options (dsd and dlm='09'x) if I use an external file (?)*/

input LenderID  Year @;
do ind=1to10;

input w :commax. @;
output;

end; /* I don't need also the part with cards, I think*/

run;

There is an error because it creates an empty table (i.e. year, lenderid and w contain '' . '' everywhere).

## Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

You need to show some of the records from the external file.  Use

data _null_;

infile 'external-file' obs=20;

input;

list;

run;

and post the lines from the log with fixed font.

Calcite | Level 5

## Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Ok. I have tried your code:

data _null_;

infile "C:\Users\eggl2000\Desktop\weights.csv" obs=20;

input;

list;
run;

Note: 20 records were read from the infile: "C:\Users\eggl2000\Desktop\weights.csv"

The minimum record length was 27.

The maximum record length was 74.

proc print; run;

(It seems to work. The code ''proc print'' gave me this output. It is my external final csv entirely. ) You must note that I want to choose the top 5 industries ( with the next codes) but many lenders lend to two or three industries maximum. In this case, the code must choose all industries when we count less than five.

LenderID

yearind1ind2ind3ind4ind5ind6ind7ind8ind9ind10
3620060000001000
3620080,50000000,500
3620090000000001
3620110,5000000000,5
362012000,333333330000,66666667000
30919970,25000,25000000,5
30919980000000001

How the other codes are adapted to this change? We didn't have ''w'' anymore.

Thank you.

Calcite | Level 5

## Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Question: How to correspond to five inds from table1 to table2?

Response: It's a very relevant question. It's my main problem, I guess.

The fact that my table1 expresses industries in terms of weights and my table2 expresses industries in terms of binary variables create a challenge in the process of coding. I must have an intermediate change to make with the data (for example, create other variables ...).

Discussion stats
• 8 replies
• 900 views
• 3 likes
• 3 in conversation