BookmarkSubscribeRSS Feed
jbighitter
Fluorite | Level 6

I need to transpose a very large dataset by three variables.  Here's the code I'm using

 

data want; set mstar.bd_membership; if Companyid="0C000006SV" or CompanyID="0C000006SP";

proc print; run;

proc sort;
by personid fiscalyear committeeid;
run;

proc transpose data=want out=need;
by personid fiscalyear committeeid;
var value;
id data_bd;
run;

Here's a snapshot of the data

 

Report Fiscal Committee Data

                            Fiscal
Obs CompanyID Year RYear ID        PersonID          ID     Value       data_bd

1 0C0000AL9W 2015 2014 AUDIT PS000000RW 60101 2015-04-27 60101
2 0C0000AL9W 2015 2014 AUDIT PS000000RW 60102 DEF 14A 60102
3 0C0000AL9W 2015 2014 AUDIT PS000000RW 60117 USD 60117
4 0C0000AL9W 2015 2014 AUDIT PS000000RW 60167 0 60167
5 0C0000AL9W 2015 2014 AUDIT PS000000RW 60170 6000.00 60170
6 0C0000AL9W 2015 2014 AUDIT PS000000RW 60171 21000.00 60171
7 0C0000AL9W 2015 2014 BOARD PS000000RW 60101 2015-04-27 60101
8 0C0000AL9W 2015 2014 BOARD PS000000RW 60102 DEF 14A 60102
9 0C0000AL9W 2015 2014 BOARD PS000000RW 60117 USD 60117
10 0C0000AL9W 2015 2014 BOARD PS000000RW 60167 0 60167
11 0C0000AL9W 2015 2014 BOARD PS000000RW 60170 30000.00 60170
12 0C0000AL9W 2015 2014 BOARD PS000000RW 60171 40000.00 60171
13 0C0000AL9W 2015 2014 BOARD PS000000RW 60176 4 60176
14 0C0000AL9W 2015 2014 COMPE PS000000RW 60101 2015-04-27 60101
15 0C0000AL9W 2015 2014 COMPE PS000000RW 60102 DEF 14A 60102
16 0C0000AL9W 2015 2014 COMPE PS000000RW 60117 USD 60117
17 0C0000AL9W 2015 2014 COMPE PS000000RW 60167 1 60167
18 0C0000AL9W 2015 2014 COMPE PS000000RW 60170 4000.00 60170
19 0C0000AL9W 2015 2014 COMPE PS000000RW 60171 14000.00 60171
20 0C0000AL9W 2016 2015 AUDIT PS000000RW 60101 2016-04-19 60101
21 0C0000AL9W 2016 2015 AUDIT PS000000RW 60102 DEF 14A 60102
22 0C0000AL9W 2016 2015 AUDIT PS000000RW 60117 USD 60117
23 0C0000AL9W 2016 2015 AUDIT PS000000RW 60167 0 60167
24 0C0000AL9W 2016 2015 AUDIT PS000000RW 60170 6000.00 60170
25 0C0000AL9W 2016 2015 AUDIT PS000000RW 60171 15000.00 60171

 

When I run this for two companies it works fine.  However, when I run it for 9000+ companies I get an error message in the log that reads 

 

ERROR: The ID value "_60101" occurs twice in the same BY group.
ERROR: The ID value "_60102" occurs twice in the same BY group.
ERROR: The ID value "_60117" occurs twice in the same BY group.
ERROR: The ID value "_60167" occurs twice in the same BY group.
ERROR: The ID value "_60170" occurs twice in the same BY group.
ERROR: The ID value "_60171" occurs twice in the same BY group.
ERROR: The ID value "_60101" occurs twice in the same BY group.
ERROR: The ID value "_60102" occurs twice in the same BY group.
ERROR: The ID value "_60117" occurs twice in the same BY group.
ERROR: The ID value "_60167" occurs twice in the same BY group.
ERROR: The ID value "_60170" occurs twice in the same BY group.
ERROR: The ID value "_60171" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PersonID=PS000000RW FiscalYear=2014

 

Is there a size limit on how large a file SAS can transpose?

 

I've attached a printout of the sample output.

 

Thanks for your help

6 REPLIES 6
Reeza
Super User

Is there a size limit on how large a file SAS can transpose?

 

Read the error. 

 

ERROR: The ID value "_60101" occurs twice in the same BY group.

 

Check your data to ensure that you do not have duplicates. Otherwise, if you have duplicates by, person, year and committee what would you expect to see if there are multiple records, like this:

 

Obs CompanyID Year RYear ID        PersonID          ID     Value       data_bd

1 0C0000AL9W 2015 2014 AUDIT PS000000RW 60101 2015-04-27 60101
2 0C0000AL9W 2015 2014 AUDIT PS000000RW 60101 DEF 14A 60101

 

This is what SAS is seeing and saying, it cannot create two columns called _60101 within the group Person, FiscalYear, Committee. 

If you need to pre-summarize the data you need to explain how. If not, you need to explain how to deal with the above case. One method is to add another variable that can act as a counter and include both those variables in your ID statement. So that your ID becomes data_BD + counter so it would be _60101_1, _60101_2. 

 

You may also want to add the PREFIX option since those numbers are not valid SAS names and maybe you want some letters in front to make it easier to reference them in the future. 

 

PS your variable names in the code don't match with your sample data so it's hard to follow. 

 


@jbighitter wrote:

I need to transpose a very large dataset by three variables.  Here's the code I'm using

 

data want; set mstar.bd_membership; if Companyid="0C000006SV" or CompanyID="0C000006SP";

proc print; run;

proc sort;
by personid fiscalyear committeeid;
run;

proc transpose data=want out=need;
by personid fiscalyear committeeid;
var value;
id data_bd;
run;

Here's a snapshot of the data

 

Report Fiscal Committee Data

                            Fiscal
Obs CompanyID Year RYear ID        PersonID          ID     Value       data_bd

1 0C0000AL9W 2015 2014 AUDIT PS000000RW 60101 2015-04-27 60101
2 0C0000AL9W 2015 2014 AUDIT PS000000RW 60102 DEF 14A 60102
3 0C0000AL9W 2015 2014 AUDIT PS000000RW 60117 USD 60117
4 0C0000AL9W 2015 2014 AUDIT PS000000RW 60167 0 60167
5 0C0000AL9W 2015 2014 AUDIT PS000000RW 60170 6000.00 60170
6 0C0000AL9W 2015 2014 AUDIT PS000000RW 60171 21000.00 60171
7 0C0000AL9W 2015 2014 BOARD PS000000RW 60101 2015-04-27 60101
8 0C0000AL9W 2015 2014 BOARD PS000000RW 60102 DEF 14A 60102
9 0C0000AL9W 2015 2014 BOARD PS000000RW 60117 USD 60117
10 0C0000AL9W 2015 2014 BOARD PS000000RW 60167 0 60167
11 0C0000AL9W 2015 2014 BOARD PS000000RW 60170 30000.00 60170
12 0C0000AL9W 2015 2014 BOARD PS000000RW 60171 40000.00 60171
13 0C0000AL9W 2015 2014 BOARD PS000000RW 60176 4 60176
14 0C0000AL9W 2015 2014 COMPE PS000000RW 60101 2015-04-27 60101
15 0C0000AL9W 2015 2014 COMPE PS000000RW 60102 DEF 14A 60102
16 0C0000AL9W 2015 2014 COMPE PS000000RW 60117 USD 60117
17 0C0000AL9W 2015 2014 COMPE PS000000RW 60167 1 60167
18 0C0000AL9W 2015 2014 COMPE PS000000RW 60170 4000.00 60170
19 0C0000AL9W 2015 2014 COMPE PS000000RW 60171 14000.00 60171
20 0C0000AL9W 2016 2015 AUDIT PS000000RW 60101 2016-04-19 60101
21 0C0000AL9W 2016 2015 AUDIT PS000000RW 60102 DEF 14A 60102
22 0C0000AL9W 2016 2015 AUDIT PS000000RW 60117 USD 60117
23 0C0000AL9W 2016 2015 AUDIT PS000000RW 60167 0 60167
24 0C0000AL9W 2016 2015 AUDIT PS000000RW 60170 6000.00 60170
25 0C0000AL9W 2016 2015 AUDIT PS000000RW 60171 15000.00 60171

 

When I run this for two companies it works fine.  However, when I run it for 9000+ companies I get an error message in the log that reads 

 

ERROR: The ID value "_60101" occurs twice in the same BY group.
ERROR: The ID value "_60102" occurs twice in the same BY group.
ERROR: The ID value "_60117" occurs twice in the same BY group.
ERROR: The ID value "_60167" occurs twice in the same BY group.
ERROR: The ID value "_60170" occurs twice in the same BY group.
ERROR: The ID value "_60171" occurs twice in the same BY group.
ERROR: The ID value "_60101" occurs twice in the same BY group.
ERROR: The ID value "_60102" occurs twice in the same BY group.
ERROR: The ID value "_60117" occurs twice in the same BY group.
ERROR: The ID value "_60167" occurs twice in the same BY group.
ERROR: The ID value "_60170" occurs twice in the same BY group.
ERROR: The ID value "_60171" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PersonID=PS000000RW FiscalYear=2014

 

Is there a size limit on how large a file SAS can transpose?

 

I've attached a printout of the sample output.

 

Thanks for your help


 

jbighitter
Fluorite | Level 6

Thanks for responding.  It was a data problem.  I had to re-import the data fields as they had truncated at 5 characters causing instances with "BOARD" and "BOARD COMPENSATION COMMITTEE" to both be classified as "BOARD" creating the duplication problem.

 

I do have another issue with the data if you have a moment.

 

I'm using PROC IMPORT to bring in 15000+ .dat files.  I'm sure there is an easier way, but brute force often works for me.

The problem is the field length for VAR6 is set to $808, When I print and/or combine the files I get the WARNING

 


WARNING: Data too long for column "Value"; truncated to 145 characters to fit.
NOTE: There were 4785 observations read from the data set WORK.TEST787.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds

 

Is there a way to modify PROC IMPORT so that the data is not truncated.

 

I've attached a copy of the log file and an example of the data I'm importing.

 

Thanks.  

 

 

 

 

Tom
Super User Tom
Super User

If you want to print long text then use PROC REPORT, it is smarter than PROC PRINT and can wrap long strings for you. 

 

You can use PROC PRINT, but NOT to the listing output destination. Instead send it to another ODS destination, like PDF, RTF, HTML or EXCEL.

Reeza
Super User

Set the GUESSINGROWS=MAX option.

 


@jbighitter wrote:

Thanks for responding.  It was a data problem.  I had to re-import the data fields as they had truncated at 5 characters causing instances with "BOARD" and "BOARD COMPENSATION COMMITTEE" to both be classified as "BOARD" creating the duplication problem.

 

I do have another issue with the data if you have a moment.

 

I'm using PROC IMPORT to bring in 15000+ .dat files.  I'm sure there is an easier way, but brute force often works for me.

The problem is the field length for VAR6 is set to $808, When I print and/or combine the files I get the WARNING

 


WARNING: Data too long for column "Value"; truncated to 145 characters to fit.
NOTE: There were 4785 observations read from the data set WORK.TEST787.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds

 

Is there a way to modify PROC IMPORT so that the data is not truncated.

 

I've attached a copy of the log file and an example of the data I'm importing.

 

Thanks.  

 

 

 

 


 

Tom
Super User Tom
Super User

Do all of your DAT files have the same structure?  If so then don't use PROC IMPORT.  Just write your own DATA step to read the file(s).

You could even read them all in at once.  Although 15,000 seems like a lot of files.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1588 views
  • 0 likes
  • 3 in conversation