BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
dcortell
Pyrite | Level 9

Hi all

 

I performed the following:

 

214        proc transpose data=rank_score out=rank_score_tr
215        (drop=_name_ _suffix_ _label_);
216        by url;
217        id _term_;
218        var N;
219        run;

ERROR: The ID value "_a" occurs twice in the same BY group.
ERROR: The ID value "_s" occurs twice in the same BY group.
ERROR: The ID value "u_" occurs twice in the same BY group.
ERROR: The ID value "y_" occurs twice in the same BY group.

Getting the error related to the above "supposed" duplicates

 

I checked, no duplicate records for any of the above IDs

 

Also, performing the above in Viya does not provide error. Running it in sas 9.4 batch environment trigger the error.

 

Any idea about what it could be related?

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Looks like some of the values of the variable you listed in the ID statement are NOT valid SAS names.  So PROC TRANSPOSE had to convert them into something that would be a valid SAS name.

 

Also consider the case where you have lowercase and uppercase versions of the same string.  That will look like two different values to PROC SORT or a simple BY processing, but would both be the same variable name since SAS variable names are NOT case sensitive.

 

You could try running it with the option VALIDVARNAME set to ANY, but then you will end up with a dataset with ridiculous variable names.

 

What is the purpose of the PROC TRANSPOSE step anyway? 

If you are just trying to make a tabular report then use PROC REPORT or PROC TABULATE.

proc report data=rank_score ;
  column url 'N'n,_term_;
  define url / group;
  define _term_ / across ' ';
  define 'N'n / ' ';
run;

View solution in original post

12 REPLIES 12
dcortell
Pyrite | Level 9

View of sample data already sorted  before the proc transpose

0acdcb8f-c188-4c55-a28c-f15dd986c43b.png

 

 

Tom
Super User Tom
Super User

Looks like some of the values of the variable you listed in the ID statement are NOT valid SAS names.  So PROC TRANSPOSE had to convert them into something that would be a valid SAS name.

 

Also consider the case where you have lowercase and uppercase versions of the same string.  That will look like two different values to PROC SORT or a simple BY processing, but would both be the same variable name since SAS variable names are NOT case sensitive.

 

You could try running it with the option VALIDVARNAME set to ANY, but then you will end up with a dataset with ridiculous variable names.

 

What is the purpose of the PROC TRANSPOSE step anyway? 

If you are just trying to make a tabular report then use PROC REPORT or PROC TABULATE.

proc report data=rank_score ;
  column url 'N'n,_term_;
  define url / group;
  define _term_ / across ' ';
  define 'N'n / ' ';
run;
dcortell
Pyrite | Level 9

The transpose is an intermediate step needed to have all the unique column "ID" values transposed horizontally, which later will be stored in a macro var, which will be called iteratively later in the process.

 

I checked, and no, example: "_a" and "_A" values were stored in the ID column

 

And if I'm not wrong, the fact that a variable name is starting with an underscore is accepted by SAS. So not sure what is causing this in sas 9.4, while in Viya no problem rise for the same procedure 

Tom
Super User Tom
Super User

@dcortell wrote:

The transpose is an intermediate step needed to have all the unique column "ID" values transposed horizontally, which later will be stored in a macro var, which will be called iteratively later in the process.

 

I checked, and no, example: "_a" and "_A" values were stored in the ID column

 

And if I'm not wrong, the fact that a variable name is starting with an underscore is accepted by SAS. So not sure what is causing this in sas 9.4, while in Viya no problem rise for the same procedure 


It is not the underscores. It is the DUPLICATES.  You might have created duplicates you didn't think you had by giving it two distinct values that map to the same actual variable name.  Here are some examples.

data have;
  input id name $char40. ;
  dummy=1;
cards;
1 _a
1 _A
2 /B
2 _B
3 /c
3 *C
4 VAR45678901234567890123456789012aaa
4 VAR45678901234567890123456789012xxx
;
180  proc transpose data=have out=want;
181    by id;
182    id name;
183    var dummy;
184  run;

ERROR: The ID value "_B" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
      id=2
ERROR: The ID value "VAR45678901234567890123456789012" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
      id=4
WARNING: 2 BY groups omitted due to earlier errors.
WARNING: Variable _A already exists on file WORK.WANT.
WARNING: Variable _C already exists on file WORK.WANT.
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 2 observations and 6 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Notice that you get two different types of notes in the log depending on whether you have exact name matches or just situations where the case of the names are different.  But really both are a problem.

 

Why would you need to TRANSPOSE the data to get a list of the VALUES into macro varaibles?

proc sql noprint;
  select distinct nliteral(upcase(name)) into :mvar separated by '|' from have;
quit;

 

dcortell
Pyrite | Level 9

I'm getting confused. Why the system should consider duplicates situations like "

VAR45678901234567890123456789012aaa

and

VAR45678901234567890123456789012xxx

if they are two different IDs, if only in the last three characters?

 

The same for "/B" or "_B" ?

Tom
Super User Tom
Super User

A valid SAS name can be a maximum of 32 bytes long.

It can only contain letters, digits and underscores.  

It cannot start with a digit.

 

In both cases two different values of the name string are mapped to the same SAS variable name.

In the case of the long name strings the first 32 bytes of the two different strings are the same.

In the case of the name strings with the invalid characters in them when the invalid characters are replaced with underscores to create valid names they end up with the same name.

dcortell
Pyrite | Level 9

Terrific, all clear now. In debt Tom

dcortell
Pyrite | Level 9

Ok I investigated a little more and I still need to transpose the columns cause the transposed "bigrams" will be needed as column for modelling purposes. Each column will represent a variable to be included as input variable in a model which need the vars in column. Any chance to still be able to transpose the bigrams, also we with initial characters like " ' " ?

Tom
Super User Tom
Super User

@dcortell wrote:

Ok I investigated a little more and I still need to transpose the columns cause the transposed "bigrams" will be needed as column for modelling purposes. Each column will represent a variable to be included as input variable in a model which need the vars in column. Any chance to still be able to transpose the bigrams, also we with initial characters like " ' " ?


Why do you need the data in that form to do the analysis?  What is the analysis you are doing?

 

Why do you need to use bigram values as the variable names for the analysis?  Just use generic names for the variables.

For example give each bigram a distinct number and distinct valid SAS variable name.

proc sort data=have;
  by bigram ;
run;
data make_names;
  set have;
  by bigram ;
  bigram_num + first.bigram;
  length name $32 ;
  name = cats('bigram',bigram_num);
run;

If you want you could use the IDLABEL statement in PROC TRANSPOSE to use the BIGRAM value as the LABEL of the generated variables instead of trying to use it as the NAME of the variable.

dcortell
Pyrite | Level 9

Just adding the validvarname=any option solved the topic in sas 9.4. I think the error was related to the values like " 'a" or " 'b", despite the log was pointing to the "_a" or "_b" values of the ID variable.Thanks.

Tom
Super User Tom
Super User

@dcortell wrote:

Just adding the validvarname=any option solved the topic in sas 9.4. I think the error was related to the values like " 'a" or " 'b", despite the log was pointing to the "_a" or "_b" values of the ID variable.Thanks.


So the reason it worked in VIYA is someone had accidentally set VALIDVARNAME to ANY in your VIYA session.

Ksharp
Super User
Try LET option if you know that is right.

proc transpose data=rank_score out=rank_score_tr LET .............;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 721 views
  • 1 like
  • 3 in conversation