Ramiro_iese
Calcite | Level 5

Hi! I have a problem transposing a matrix (the data below is just a subsample). I need the cpc values in the columns, the wku values in the rows, and count as the data points. I am trying the code below, but it takes a lot of time.

 

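data testwkucpc;
/* cpc needs an explicit length: with plain "cpc $" SAS keeps only
   8 characters, so codes such as Y10T16/3819 are truncated */
input wku count cpc :$20.;
cards;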
1 1 Y10T16/3819
1 1 B61C11/04
2 1 C02F1/00
3 1 G01B7/107
5 1 B23B41/04
5 1 B27F5/10
5 1 Y10T408/356
6 1 B27C5/06
7 1 Y10T408/488
7 1 E02B7/42
7 1 Y10S209/933
7 1 E04H17/263
8 1 E04H17/263
8 1 B25D1/16
8 1 B27C5/003
8 1 B27C7/005
;
run;

 

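/* One row per wku, one column per distinct cpc value (the data must be
   sorted by wku); cells with no matching record are set to missing */
proc transpose data=testwkucpc out=testwkucpc1;
by wku;
id cpc;
var count;
run;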

5 REPLIES
PaigeMiller
Diamond | Level 26

How many rows are in the full data set? How many distinct values of CPC are there in the full data set?

Why do you need the variable COUNT transposed? What can you do with it in a transposed data set that you can't do in the untransposed data set?

--
Paige Miller
Ramiro_iese
Calcite | Level 5

33 million rows in the entire data set

9 million unique cpc values

Some wku values have several cpc codes, so I need a matrix in which I can compare the cpc codes of different wku row by row, for example:

wku/cpc   A1   B1   C1
1         1    0    0
2         1    0    1
3         0    1    0
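One detail about the 0s in this matrix: PROC TRANSPOSE leaves a cell missing, not 0, when a wku has no record for a given cpc. A minimal sketch of the extra step is below. It uses a few rows copied from the sample above in a hypothetical data set SAMPLE (the names SAMPLE, WIDE and WIDE01 are illustrative), transposes them, and then replaces the missing cells with 0. With millions of distinct cpc values this wide layout is unlikely to be practical, so treat it as a sketch for small subsets only.

```sas
/* Hypothetical data set SAMPLE: a few rows copied from the sample above */
data sample;
   input wku count cpc :$20.;
   cards;
1 1 Y10T16/3819
1 1 B61C11/04
2 1 C02F1/00
3 1 G01B7/107
;
run;

/* One row per wku, one column per cpc; cells with no record are missing */
proc transpose data=sample out=wide;
   by wku;
   id cpc;
   var count;
run;

/* Replace the missing cells with 0 to get a 0/1 indicator matrix */
data wide01;
   set wide;
   array cells {*} _numeric_;   /* all numeric columns, including wku (never missing) */
   do i = 1 to dim(cells);
      if missing(cells{i}) then cells{i} = 0;
   end;
   drop i _name_;               /* _NAME_ only holds the text "count" */
run;
```

The column names are derived from the cpc values, so how characters such as "/" are handled depends on the VALIDVARNAME system option in your session.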

 

PaigeMiller
Diamond | Level 26

@Ramiro_iese wrote:

33 million rows in the entire data set

9 million unique cpc values


Yes, I suspect that will take a very long time, and your computer might run out of resources.


Some wku values have several cpc codes, so I need a matrix in which I can compare the cpc codes of different wku row by row, for example:

wku/cpc   A1   B1   C1
1         1    0    0
2         1    0    1
3         0    1    0

 

 

Please explain this further: what would you do with this matrix once you have it, and what is the final goal of this analysis? I suspect the whole thing can be programmed without transposing, if only I knew where you are going.

Ramiro_iese
Calcite | Level 5

Dear Paige, thank you for your response.

Okay. I am trying to look at the overlap between patent classifications, by firm.

wku is the patent number and cpc is the patent classification. Most of the patents (more than 50%) have more than one cpc code assigned. So, in order not to leave out any information, I need to compute some similarity measures between patents and then group the patents by firm.

Is that clear enough? Thank you again!
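In line with Paige's point that this can be done without transposing, here is a minimal sketch of one possible similarity measure, the Jaccard index: shared cpc codes divided by the cpc codes held by either patent. Jaccard is an assumption here, since the measure has not been specified. The sketch works directly on the long (wku, cpc) layout with a PROC SQL self-join, so only pairs of patents that share at least one cpc code are ever created; a pair missing from the result has similarity 0. PAT, N_CODES, SHARED and JACCARD are hypothetical names, and PAT holds a few rows copied from the sample data; in practice it would be the full TESTWKUCPC data set.

```sas
/* Hypothetical data set PAT: rows for wku 5, 7 and 8 from the sample data */
data pat;
   input wku count cpc :$20.;
   cards;
5 1 B23B41/04
5 1 B27F5/10
5 1 Y10T408/356
7 1 Y10T408/488
7 1 E02B7/42
7 1 Y10S209/933
7 1 E04H17/263
8 1 E04H17/263
8 1 B25D1/16
8 1 B27C5/003
8 1 B27C7/005
;
run;

proc sql;
   /* Number of distinct cpc codes per patent */
   create table n_codes as
   select wku, count(distinct cpc) as n_cpc
   from pat
   group by wku;

   /* Shared cpc codes for each pair of patents with at least one in common;
      a.wku < b.wku keeps each pair once and skips self-pairs */
   create table shared as
   select a.wku as wku1, b.wku as wku2,
          count(distinct a.cpc) as n_shared
   from pat as a, pat as b
   where a.cpc = b.cpc and a.wku < b.wku
   group by a.wku, b.wku;

   /* Jaccard = shared / (codes in wku1 + codes in wku2 - shared) */
   create table jaccard as
   select s.wku1, s.wku2, s.n_shared,
          s.n_shared / (c1.n_cpc + c2.n_cpc - s.n_shared) as sim_jaccard
   from shared as s, n_codes as c1, n_codes as c2
   where s.wku1 = c1.wku and s.wku2 = c2.wku;
quit;
```

On 33 million rows, very common cpc codes can still generate a large number of pairs in the self-join, so it may be worth checking the size of SHARED on a subset first.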

PaigeMiller
Diamond | Level 26

@Ramiro_iese wrote:

Dear Paige, thank you for your response.

Okay. I am trying to look at the overlap between patent classifications, by firm.

wku is the patent number and cpc is the patent classification. Most of the patents (more than 50%) have more than one cpc code assigned. So, in order not to leave out any information, I need to compute some similarity measures between patents and then group the patents by firm.

Is that clear enough? Thank you again!


Similarity measures? Such as ...

 

Firm?

