Hello,
My table1 looks like this:
LenderID | Year | Ind1 | Ind2 | Ind3 | Ind4 | Ind5 | Ind6 | Ind7 | Ind8 | Ind9 | Ind10 |
1 | 1995 | 0,2 | 0,3 | 0,1 | 0,05 | 0,05 | 0,025 | 0,025 | 0,07 | 0,08 | 0,1 |
2 | 1995 | 0,025 | 0,025 | 0,025 | 0,025 | 0,025 | 0,030 | 0,5 | 0,15 | 0,05 | 0,145 |
3 | 1995 | 0,07 | 0,08 | 0,09 | 0,1 | 0,11 | 0,12 | 0,13 | 0,14 | 0,15 | 0,01 |
4 | 1995 | 0,08 | 0,24 | 0,29 | 0,13 | 0,05 | 0,04 | 0,06 | 0,07 | 0,02 | 0,02 |
… | … | … | … | … | … | … | … | … | … | … | … |
The summation of Indi, (i=1, 2, ..., 10) egals 1 (100%)
I want that the five industries with the most weights be selected in the table2.
Table2 looks like this:
PackageID | LenderID | year | Ind1 | Ind2 | Ind3 | Ind4 | Ind5 | Ind6 | Ind7 | Ind8 | Ind9 | Ind10 |
1 | 56 | 1995 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 1995 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 2 | 1995 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 4 | 1995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2 | 6 | 1995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2 | 3 | 1995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2 | 108 | 1995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
… | … | … | … | … | … | … | … | … | … | … | … | … |
I have in this table ten binary variables for each industry. For example, if the borrower's industry is Ind1 then Ind1 is 1, otherwise 0.
I have approximately 125 000 packageIDs in which one to fifty lenders (LenderID is the identification number) can be present. In total, 70 000 lenders are active (i.e. they participate frequently).
Table1 gives me the information about weights in each industry. I want keep lenderIDs in table2 that corresponds to their five industries with the most important industries' weights.in table1.
I also have 18 years so the years must match between the two tables.
For example, if we look at the packageID 1 with the information we see in table1, LenderID 1 can keep his place in table2, LenderID 2 must be eliminated and so on.
That seems complex to code. Thanks for your help.
You need to explain your question more clear. What is your output table ?
How to correspond to five inds from table1 to table2 ?
What is "their five industries with the most important industries' weights.in table1." ?
Thank for your response. I was not sure if it was clear enough.
My output would look like this table:
PackageID | LenderID | year | Ind1 | Ind2 | Ind3 | Ind4 | Ind5 | Ind6 | Ind7 | Ind8 | Ind9 | Ind10 | … |
1 | 56 | 1995 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … |
1 | 1 | 1995 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … |
2 | 3 | 1995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | … |
2 | 4 | 1995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | … |
… | … | … | … | … | … | … | … | … | … | … | … | … | … |
N.B. Hypothesis 1: The five most weignted industries for LenderID 56 contains Ind2.
Hypothesis 2: The ind8 is not present in the five most weighted industries for LenderIDs 6 and 108.
I specify because we don't see information about these LenderIDs in the table1.
In other words, the table 1 helps me to select rows (i.e. LenderID) in table 2. The output is the table2 with restrictions on which LenderIDs are still present.
In the case where, for example, a packageID includes three lenders (i. lenderIDs) in the fifth industry (i.e. Ind5) and none of these lenders have the industry 5 in their five most weighted (important) industries, the packageID wouldn't be present at all in the output.
Thank you in advance.
Question: What is the five industries with the most important industries weights in table1?
Response: It can vary from one lender to another. We must maximise the summation of indi=(1, 2, ..., 10) when we pick only five of ten industries.
I guess we must procede in two steps.
Step 1: Create a variable that results in the summation of five industries, the five industries which maximise the sum
Step 2: In a way, create five variables (I'm trying at this moment to figure out the least complicated way to procede. My thinking can be flawed.) which each identifies the five industries retain to maximize the summation.
I'm also thinking about other manners to modify my data for minimizing complicated manipulations. I have a huge database so it's not simple.
windyboo wrote:
Question: What is the five industries with the most important industries weights in table1?
I think this address the first question what are the top five industries per lenderid per year. I quess that's the question.
I forget to mention that I use an external file. Is the code following well adjusted for an external file?
data name_output;
infile 'external-file'; /* I don't need these options (dsd and dlm='09'x) if I use an external file (?)*/
input LenderID Year @;
do ind=1to10;
input w :commax. @;
output;
end; /* I don't need also the part with cards, I think*/
run;
There is an error because it creates an empty table (i.e. year, lenderid and w contain '' . '' everywhere).
Thank you for your help.
You need to show some of the records from the external file. Use
data _null_;
infile 'external-file' obs=20;
input;
list;
run;
and post the lines from the log with fixed font.
Ok. I have tried your code:
data _null_;
infile "C:\Users\eggl2000\Desktop\weights.csv" obs=20;
input;
list;
run;
Note: 20 records were read from the infile: "C:\Users\eggl2000\Desktop\weights.csv"
The minimum record length was 27.
The maximum record length was 74.
proc print; run;
LenderID | year | ind1 | ind2 | ind3 | ind4 | ind5 | ind6 | ind7 | ind8 | ind9 | ind10 | … |
36 | 2006 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | … |
36 | 2008 | 0,5 | 0 | 0 | 0 | 0 | 0 | 0 | 0,5 | 0 | 0 | … |
36 | 2009 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | … |
36 | 2011 | 0,5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0,5 | … |
36 | 2012 | 0 | 0 | 0,33333333 | 0 | 0 | 0 | 0,66666667 | 0 | 0 | 0 | … |
309 | 1997 | 0,25 | 0 | 0 | 0,25 | 0 | 0 | 0 | 0 | 0 | 0,5 | … |
309 | 1998 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | … |
… | … | … | … | … | … | … | … | … | … | … | … | … |
How the other codes are adapted to this change? We didn't have ''w'' anymore.
Thank you.
Question: How to correspond to five inds from table1 to table2?
Response: It's a very relevant question. It's my main problem, I guess.
The fact that my table1 expresses industries in terms of weights and my table2 expresses industries in terms of binary variables create a challenge in the process of coding. I must have an intermediate change to make with the data (for example, create other variables ...).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.