BookmarkSubscribeRSS Feed
windyboo
Calcite | Level 5

Hello,

My table1 looks like this:

LenderIDYearInd1Ind2Ind3Ind4Ind5Ind6Ind7Ind8Ind9Ind10
119950,20,30,10,050,050,0250,0250,070,080,1
219950,0250,0250,0250,0250,0250,0300,50,150,050,145
319950,070,080,090,10,110,120,130,140,150,01
419950,080,240,290,130,050,040,060,070,020,02

The summation of Indi, (i=1, 2, ..., 10) egals 1 (100%)

I want that the five industries with the most weights be selected in the table2.

Table2  looks like this:

PackageIDLenderIDyearInd1Ind2Ind3Ind4Ind5Ind6Ind7Ind8Ind9Ind10
15619950100000000
1119950100000000
1219950100000000
2419950000000100
2619950000000100
2319950000000100
210819950000000100

I have in this table ten binary variables for each industry. For example, if the borrower's industry is Ind1 then Ind1 is 1, otherwise 0.

I have approximately 125 000 packageIDs in which one to fifty lenders (LenderID is the identification number) can be present. In total, 70 000 lenders are active (i.e. they participate frequently).

Table1 gives me the information about  weights in each industry. I want keep lenderIDs in table2 that corresponds to their five industries with the most important industries' weights.in table1.

I also have 18 years so the years must match between the two tables.

For example, if we look at the packageID 1 with the information we see in table1, LenderID 1 can keep his place in table2, LenderID 2 must be eliminated and so on.

That seems complex to code. Thanks for your help.

8 REPLIES 8
Ksharp
Super User

You need to explain your question more clear. What is your output table ?

How to correspond to five inds from table1 to table2 ?

What is "their five industries with the most important industries' weights.in table1." ?

windyboo
Calcite | Level 5

Thank for your response. I was not sure if it was clear enough.

My output would look like this table:

 
PackageIDLenderIDyearInd1Ind2Ind3Ind4Ind5Ind6Ind7Ind8Ind9Ind10
15619950100000000
1119950100000000
2319950000000100
2419950000000100

N.B. Hypothesis 1: The five most weignted industries for LenderID 56 contains Ind2.

Hypothesis 2: The ind8 is not present in the five most weighted industries for LenderIDs 6 and 108.

I specify because we don't see information about these LenderIDs in the table1.

In other words, the table 1 helps me to select rows (i.e. LenderID)  in table 2. The output is the table2 with restrictions on which LenderIDs are still present.

In the case where, for example, a packageID includes three lenders (i. lenderIDs) in the fifth industry (i.e. Ind5) and none of these lenders have the industry 5 in their five most weighted (important) industries, the packageID wouldn't be present at all in the output.

Thank you in advance.

windyboo
Calcite | Level 5

Question: What is the five industries with the most important industries weights in table1?

Response: It can vary from one lender to another. We must maximise the summation of indi=(1, 2, ..., 10) when we pick only five of ten industries.

I guess we must procede in two steps.

Step 1: Create a variable that results in the summation of five industries, the five industries which maximise the sum

Step 2: In a way, create five variables (I'm trying at this moment to figure out the least complicated way to procede. My thinking can be flawed.) which each identifies the five industries retain to maximize the summation.

I'm also thinking about other manners to modify my data for minimizing complicated manipulations. I have a huge database so it's not simple.

data_null__
Jade | Level 19

windyboo wrote:

Question: What is the five industries with the most important industries weights in table1?

I think this address the first question what are the top five industries per lenderid per year.  I quess that's the question. Smiley Happy

data t1;
   infile cards dsd dlm='09'x;
  
input LenderID   Year @;
   do ind=1 to 10;
     
input w :commax. @;
      output;
     
end;
  
cards;
1  1995  0,2   0,3   0,1   0,05  0,05  0,025 0,025 0,07  0,08  0,1
2  1995  0,025 0,025 0,025 0,025 0,025 0,030 0,5   0,15  0,05  0,145
3  1995  0,07  0,08  0,09  0,1   0,11  0,12  0,13  0,14  0,15  0,01
4  1995  0,08  0,24  0,29  0,13  0,05  0,04  0,06  0,07  0,02  0,02
;;;;
   run;
proc print;
  
run;
proc sort;
  
by lenderid year descending w;
   run;
proc rank out=top5(where=(rw le 5)) descending;
  
by lenderid year;
   var w;
   ranks rw;
   run;
proc print;
  
run;
windyboo
Calcite | Level 5

I forget to mention that I use an external file. Is the code following well adjusted for an external file?

data name_output;
   infile 'external-file';   /* I don't need these options (dsd and dlm='09'x) if I use an external file (?)*/
  
input LenderID  Year @;
   do ind=1to10;
     
input w :commax. @;
      output;
     
end; /* I don't need also the part with cards, I think*/

   run;

There is an error because it creates an empty table (i.e. year, lenderid and w contain '' . '' everywhere).

Thank you for your help.

data_null__
Jade | Level 19

You need to show some of the records from the external file.  Use

data _null_;

  infile 'external-file' obs=20;

  input;

  list;

  run;

and post the lines from the log with fixed font.

windyboo
Calcite | Level 5

Ok. I have tried your code:

data _null_;

infile "C:\Users\eggl2000\Desktop\weights.csv" obs=20;

input;

list;
run;

Note: 20 records were read from the infile: "C:\Users\eggl2000\Desktop\weights.csv"

The minimum record length was 27.

The maximum record length was 74.

proc print; run;

(It seems to work. The code ''proc print'' gave me this output. It is my external final csv entirely. ) You must note that I want to choose the top 5 industries ( with the next codes) but many lenders lend to two or three industries maximum. In this case, the code must choose all industries when we count less than five. 

LenderID

yearind1ind2ind3ind4ind5ind6ind7ind8ind9ind10
3620060000001000
3620080,50000000,500
3620090000000001
3620110,5000000000,5
362012000,333333330000,66666667000
30919970,25000,25000000,5
30919980000000001

How the other codes are adapted to this change? We didn't have ''w'' anymore.

Thank you.

windyboo
Calcite | Level 5

Question: How to correspond to five inds from table1 to table2?

Response: It's a very relevant question. It's my main problem, I guess.

The fact that my table1 expresses industries in terms of weights and my table2 expresses industries in terms of binary variables create a challenge in the process of coding. I must have an intermediate change to make with the data (for example, create other variables ...).

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 650 views
  • 3 likes
  • 3 in conversation