Re: Select five of ten variables in the table1 that correspond to the ...

windyboo · Posted 05-17-2014 08:06 PM

Hello,

My table1 looks like this:

LenderID	Year	Ind1	Ind2	Ind3	Ind4	Ind5	Ind6	Ind7	Ind8	Ind9	Ind10
1	1995	0,2	0,3	0,1	0,05	0,05	0,025	0,025	0,07	0,08	0,1
2	1995	0,025	0,025	0,025	0,025	0,025	0,030	0,5	0,15	0,05	0,145
3	1995	0,07	0,08	0,09	0,1	0,11	0,12	0,13	0,14	0,15	0,01
4	1995	0,08	0,24	0,29	0,13	0,05	0,04	0,06	0,07	0,02	0,02
…	…	…	…	…	…	…	…	…	…	…	…

The summation of Indi, (i=1, 2, ..., 10) egals 1 (100%)

I want that the five industries with the most weights be selected in the table2.

Table2 looks like this:

PackageID	LenderID	year	Ind1	Ind2	Ind3	Ind4	Ind5	Ind6	Ind7	Ind8	Ind9	Ind10
1	56	1995	0	1	0	0	0	0	0	0	0	0
1	1	1995	0	1	0	0	0	0	0	0	0	0
1	2	1995	0	1	0	0	0	0	0	0	0	0
2	4	1995	0	0	0	0	0	0	0	1	0	0
2	6	1995	0	0	0	0	0	0	0	1	0	0
2	3	1995	0	0	0	0	0	0	0	1	0	0
2	108	1995	0	0	0	0	0	0	0	1	0	0
…	…	…	…	…	…	…	…	…	…	…	…	…

I have in this table ten binary variables for each industry. For example, if the borrower's industry is Ind1 then Ind1 is 1, otherwise 0.

I have approximately 125 000 packageIDs in which one to fifty lenders (LenderID is the identification number) can be present. In total, 70 000 lenders are active (i.e. they participate frequently).

Table1 gives me the information about weights in each industry. I want keep lenderIDs in table2 that corresponds to their five industries with the most important industries' weights.in table1.

I also have 18 years so the years must match between the two tables.

For example, if we look at the packageID 1 with the information we see in table1, LenderID 1 can keep his place in table2, LenderID 2 must be eliminated and so on.

That seems complex to code. Thanks for your help.

Ksharp · Posted 05-18-2014 03:25 AM

You need to explain your question more clear. What is your output table ?

How to correspond to five inds from table1 to table2 ?

What is "their five industries with the most important industries' weights.in table1." ?

windyboo · Posted 05-19-2014 10:14 AM

Thank for your response. I was not sure if it was clear enough.

My output would look like this table:

PackageID	LenderID	year	Ind1	Ind2	Ind3	Ind4	Ind5	Ind6	Ind7	Ind8	Ind9	Ind10	…
1	56	1995	0	1	0	0	0	0	0	0	0	0	…
1	1	1995	0	1	0	0	0	0	0	0	0	0	…
2	3	1995	0	0	0	0	0	0	0	1	0	0	…
2	4	1995	0	0	0	0	0	0	0	1	0	0	…
…	…	…	…	…	…	…	…	…	…	…	…	…	…

N.B. Hypothesis 1: The five most weignted industries for LenderID 56 contains Ind2.

Hypothesis 2: The ind8 is not present in the five most weighted industries for LenderIDs 6 and 108.

I specify because we don't see information about these LenderIDs in the table1.

In other words, the table 1 helps me to select rows (i.e. LenderID) in table 2. The output is the table2 with restrictions on which LenderIDs are still present.

In the case where, for example, a packageID includes three lenders (i. lenderIDs) in the fifth industry (i.e. Ind5) and none of these lenders have the industry 5 in their five most weighted (important) industries, the packageID wouldn't be present at all in the output.

Thank you in advance.

windyboo · Posted 05-19-2014 10:31 AM

Question: What is the five industries with the most important industries weights in table1?

Response: It can vary from one lender to another. We must maximise the summation of indi=(1, 2, ..., 10) when we pick only five of ten industries.

I guess we must procede in two steps.

Step 1: Create a variable that results in the summation of five industries, the five industries which maximise the sum

Step 2: In a way, create five variables (I'm trying at this moment to figure out the least complicated way to procede. My thinking can be flawed.) which each identifies the five industries retain to maximize the summation.

I'm also thinking about other manners to modify my data for minimizing complicated manipulations. I have a huge database so it's not simple.

data_null__ · Posted 05-19-2014 10:45 AM

windyboo wrote:

Question: What is the five industries with the most important industries weights in table1?

I think this address the first question what are the top five industries per lenderid per year. I quess that's the question.

data t1;
   infile cards dsd dlm='09'x; 
   input LenderID   Year @;
   do ind=1 to 10; 
      input w :commax. @;
      output; 
      end; 
   cards; 
1  1995  0,2   0,3   0,1   0,05  0,05  0,025 0,025 0,07  0,08  0,1
2  1995  0,025 0,025 0,025 0,025 0,025 0,030 0,5   0,15  0,05  0,145
3  1995  0,07  0,08  0,09  0,1   0,11  0,12  0,13  0,14  0,15  0,01
4  1995  0,08  0,24  0,29  0,13  0,05  0,04  0,06  0,07  0,02  0,02
;;;;
   run; 
proc print; 
   run; 
proc sort; 
   by lenderid year descending w;
   run; 
proc rank out=top5(where=(rw le 5)) descending; 
   by lenderid year;
   var w;
   ranks rw;
   run; 
proc print; 
   run;

windyboo · Posted 05-19-2014 01:31 PM

I forget to mention that I use an external file. Is the code following well adjusted for an external file?

data name_output;
   infile 'external-file';   /* I don't need these options (dsd and dlm='09'x) if I use an external file (?)*/
   input LenderID Year @;
   do ind=1to10;
      input w :commax. @;
      output;
      end; /* I don't need also the part with cards, I think*/

   run;

There is an error because it creates an empty table (i.e. year, lenderid and w contain '' . '' everywhere).

Thank you for your help.

data_null__ · Posted 05-19-2014 02:50 PM

You need to show some of the records from the external file. Use

data _null_;

infile 'external-file' obs=20;

input;

list;

run;

and post the lines from the log with fixed font.

windyboo · Posted 05-23-2014 10:59 AM

Ok. I have tried your code:

data _null_;

infile "C:\Users\eggl2000\Desktop\weights.csv" obs=20;

input;

list;
run;

Note: 20 records were read from the infile: "C:\Users\eggl2000\Desktop\weights.csv"

The minimum record length was 27.

The maximum record length was 74.

proc print; run;

(It seems to work. The code ''proc print'' gave me this output. It is my external final csv entirely. ) You must note that I want to choose the top 5 industries ( with the next codes) but many lenders lend to two or three industries maximum. In this case, the code must choose all industries when we count less than five.

LenderID	year	ind1	ind2	ind3	ind4	ind5	ind6	ind7	ind8	ind9	ind10	…
36	2006	0	0	0	0	0	0	1	0	0	0	…
36	2008	0,5	0	0	0	0	0	0	0,5	0	0	…
36	2009	0	0	0	0	0	0	0	0	0	1	…
36	2011	0,5	0	0	0	0	0	0	0	0	0,5	…
36	2012	0	0	0,33333333	0	0	0	0,66666667	0	0	0	…
309	1997	0,25	0	0	0,25	0	0	0	0	0	0,5	…
309	1998	0	0	0	0	0	0	0	0	0	1	…
…	…	…	…	…	…	…	…	…	…	…	…	…

How the other codes are adapted to this change? We didn't have ''w'' anymore.

Thank you.

windyboo · Posted 05-19-2014 10:37 AM

Question: How to correspond to five inds from table1 to table2?

Response: It's a very relevant question. It's my main problem, I guess.

The fact that my table1 expresses industries in terms of weights and my table2 expresses industries in terms of binary variables create a challenge in the process of coding. I must have an intermediate change to make with the data (for example, create other variables ...).

Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Re: Select five of ten variables in the table1 that correspond to the appropriate binary variables in table2

Registration is open

SAS Training: Just a Click Away