Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Reply
Contributor
Posts: 62

Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Hello,

My table1 looks like this:

LenderIDYearInd1Ind2Ind3Ind4Ind5Ind6Ind7Ind8Ind9Ind10
119950,20,30,10,050,050,0250,0250,070,080,1
219950,0250,0250,0250,0250,0250,0300,50,150,050,145
319950,070,080,090,10,110,120,130,140,150,01
419950,080,240,290,130,050,040,060,070,020,02

The summation of Indi, (i=1, 2, ..., 10) egals 1 (100%)

I want that the five industries with the most weights be selected in the table2.

Table2  looks like this:

PackageIDLenderIDyearInd1Ind2Ind3Ind4Ind5Ind6Ind7Ind8Ind9Ind10
15619950100000000
1119950100000000
1219950100000000
2419950000000100
2619950000000100
2319950000000100
210819950000000100

I have in this table ten binary variables for each industry. For example, if the borrower's industry is Ind1 then Ind1 is 1, otherwise 0.

I have approximately 125 000 packageIDs in which one to fifty lenders (LenderID is the identification number) can be present. In total, 70 000 lenders are active (i.e. they participate frequently).

Table1 gives me the information about  weights in each industry. I want keep lenderIDs in table2 that corresponds to their five industries with the most important industries' weights.in table1.

I also have 18 years so the years must match between the two tables.

For example, if we look at the packageID 1 with the information we see in table1, LenderID 1 can keep his place in table2, LenderID 2 must be eliminated and so on.

That seems complex to code. Thanks for your help.

Super User
Posts: 9,682

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

You need to explain your question more clear. What is your output table ?

How to correspond to five inds from table1 to table2 ?

What is "their five industries with the most important industries' weights.in table1." ?

Contributor
Posts: 62

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Thank for your response. I was not sure if it was clear enough.

My output would look like this table:

 
PackageIDLenderIDyearInd1Ind2Ind3Ind4Ind5Ind6Ind7Ind8Ind9Ind10
15619950100000000
1119950100000000
2319950000000100
2419950000000100

N.B. Hypothesis 1: The five most weignted industries for LenderID 56 contains Ind2.

Hypothesis 2: The ind8 is not present in the five most weighted industries for LenderIDs 6 and 108.

I specify because we don't see information about these LenderIDs in the table1.

In other words, the table 1 helps me to select rows (i.e. LenderID)  in table 2. The output is the table2 with restrictions on which LenderIDs are still present.

In the case where, for example, a packageID includes three lenders (i. lenderIDs) in the fifth industry (i.e. Ind5) and none of these lenders have the industry 5 in their five most weighted (important) industries, the packageID wouldn't be present at all in the output.

Thank you in advance.

Contributor
Posts: 62

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Question: What is the five industries with the most important industries weights in table1?

Response: It can vary from one lender to another. We must maximise the summation of indi=(1, 2, ..., 10) when we pick only five of ten industries.

I guess we must procede in two steps.

Step 1: Create a variable that results in the summation of five industries, the five industries which maximise the sum

Step 2: In a way, create five variables (I'm trying at this moment to figure out the least complicated way to procede. My thinking can be flawed.) which each identifies the five industries retain to maximize the summation.

I'm also thinking about other manners to modify my data for minimizing complicated manipulations. I have a huge database so it's not simple.

Respected Advisor
Posts: 3,777

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

windyboo wrote:

Question: What is the five industries with the most important industries weights in table1?

I think this address the first question what are the top five industries per lenderid per year.  I quess that's the question. Smiley Happy

data t1;
   infile cards dsd dlm='09'x;
  
input LenderID   Year @;
   do ind=1 to 10;
     
input w :commax. @;
      output;
     
end;
  
cards;
1  1995  0,2   0,3   0,1   0,05  0,05  0,025 0,025 0,07  0,08  0,1
2  1995  0,025 0,025 0,025 0,025 0,025 0,030 0,5   0,15  0,05  0,145
3  1995  0,07  0,08  0,09  0,1   0,11  0,12  0,13  0,14  0,15  0,01
4  1995  0,08  0,24  0,29  0,13  0,05  0,04  0,06  0,07  0,02  0,02
;;;;
   run;
proc print;
  
run;
proc sort;
  
by lenderid year descending w;
   run;
proc rank out=top5(where=(rw le 5)) descending;
  
by lenderid year;
   var w;
   ranks rw;
   run;
proc print;
  
run;
Contributor
Posts: 62

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

I forget to mention that I use an external file. Is the code following well adjusted for an external file?

data name_output;
   infile 'external-file';   /* I don't need these options (dsd and dlm='09'x) if I use an external file (?)*/
  
input LenderID  Year @;
   do ind=1to10;
     
input w :commax. @;
      output;
     
end; /* I don't need also the part with cards, I think*/

   run;

There is an error because it creates an empty table (i.e. year, lenderid and w contain '' . '' everywhere).

Thank you for your help.

Respected Advisor
Posts: 3,777

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

You need to show some of the records from the external file.  Use

data _null_;

  infile 'external-file' obs=20;

  input;

  list;

  run;

and post the lines from the log with fixed font.

Contributor
Posts: 62

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Ok. I have tried your code:

data _null_;

infile "C:\Users\eggl2000\Desktop\weights.csv" obs=20;

input;

list;
run;

Note: 20 records were read from the infile: "C:\Users\eggl2000\Desktop\weights.csv"

The minimum record length was 27.

The maximum record length was 74.

proc print; run;

(It seems to work. The code ''proc print'' gave me this output. It is my external final csv entirely. ) You must note that I want to choose the top 5 industries ( with the next codes) but many lenders lend to two or three industries maximum. In this case, the code must choose all industries when we count less than five. 

LenderID

yearind1ind2ind3ind4ind5ind6ind7ind8ind9ind10
3620060000001000
3620080,50000000,500
3620090000000001
3620110,5000000000,5
362012000,333333330000,66666667000
30919970,25000,25000000,5
30919980000000001

How the other codes are adapted to this change? We didn't have ''w'' anymore.

Thank you.

Contributor
Posts: 62

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Question: How to correspond to five inds from table1 to table2?

Response: It's a very relevant question. It's my main problem, I guess.

The fact that my table1 expresses industries in terms of weights and my table2 expresses industries in terms of binary variables create a challenge in the process of coding. I must have an intermediate change to make with the data (for example, create other variables ...).

Ask a Question
Discussion stats
  • 8 replies
  • 332 views
  • 3 likes
  • 3 in conversation