Hi all
I performed the following:
214 proc transpose data=rank_score out=rank_score_tr
215 (drop=_name_ _suffix_ _label_);
216 by url;
217 id _term_;
218 var N;
219 run;
ERROR: The ID value "_a" occurs twice in the same BY group.
ERROR: The ID value "_s" occurs twice in the same BY group.
ERROR: The ID value "u_" occurs twice in the same BY group.
ERROR: The ID value "y_" occurs twice in the same BY group.
Getting the error related to the above "supposed" duplicates
I checked, no duplicate records for any of the above IDs
Also, performing the above in Viya does not provide error. Running it in sas 9.4 batch environment trigger the error.
Any idea about what it could be related?
Looks like some of the values of the variable you listed in the ID statement are NOT valid SAS names. So PROC TRANSPOSE had to convert them into something that would be a valid SAS name.
Also consider the case where you have lowercase and uppercase versions of the same string. That will look like two different values to PROC SORT or a simple BY processing, but would both be the same variable name since SAS variable names are NOT case sensitive.
You could try running it with the option VALIDVARNAME set to ANY, but then you will end up with a dataset with ridiculous variable names.
What is the purpose of the PROC TRANSPOSE step anyway?
If you are just trying to make a tabular report then use PROC REPORT or PROC TABULATE.
proc report data=rank_score ;
column url 'N'n,_term_;
define url / group;
define _term_ / across ' ';
define 'N'n / ' ';
run;
View of sample data already sorted before the proc transpose
Looks like some of the values of the variable you listed in the ID statement are NOT valid SAS names. So PROC TRANSPOSE had to convert them into something that would be a valid SAS name.
Also consider the case where you have lowercase and uppercase versions of the same string. That will look like two different values to PROC SORT or a simple BY processing, but would both be the same variable name since SAS variable names are NOT case sensitive.
You could try running it with the option VALIDVARNAME set to ANY, but then you will end up with a dataset with ridiculous variable names.
What is the purpose of the PROC TRANSPOSE step anyway?
If you are just trying to make a tabular report then use PROC REPORT or PROC TABULATE.
proc report data=rank_score ;
column url 'N'n,_term_;
define url / group;
define _term_ / across ' ';
define 'N'n / ' ';
run;
The transpose is an intermediate step needed to have all the unique column "ID" values transposed horizontally, which later will be stored in a macro var, which will be called iteratively later in the process.
I checked, and no, example: "_a" and "_A" values were stored in the ID column
And if I'm not wrong, the fact that a variable name is starting with an underscore is accepted by SAS. So not sure what is causing this in sas 9.4, while in Viya no problem rise for the same procedure
@dcortell wrote:
The transpose is an intermediate step needed to have all the unique column "ID" values transposed horizontally, which later will be stored in a macro var, which will be called iteratively later in the process.
I checked, and no, example: "_a" and "_A" values were stored in the ID column
And if I'm not wrong, the fact that a variable name is starting with an underscore is accepted by SAS. So not sure what is causing this in sas 9.4, while in Viya no problem rise for the same procedure
It is not the underscores. It is the DUPLICATES. You might have created duplicates you didn't think you had by giving it two distinct values that map to the same actual variable name. Here are some examples.
data have;
input id name $char40. ;
dummy=1;
cards;
1 _a
1 _A
2 /B
2 _B
3 /c
3 *C
4 VAR45678901234567890123456789012aaa
4 VAR45678901234567890123456789012xxx
;
180 proc transpose data=have out=want; 181 by id; 182 id name; 183 var dummy; 184 run; ERROR: The ID value "_B" occurs twice in the same BY group. NOTE: The above message was for the following BY group: id=2 ERROR: The ID value "VAR45678901234567890123456789012" occurs twice in the same BY group. NOTE: The above message was for the following BY group: id=4 WARNING: 2 BY groups omitted due to earlier errors. WARNING: Variable _A already exists on file WORK.WANT. WARNING: Variable _C already exists on file WORK.WANT. NOTE: There were 8 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 2 observations and 6 variables. NOTE: PROCEDURE TRANSPOSE used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
Notice that you get two different types of notes in the log depending on whether you have exact name matches or just situations where the case of the names are different. But really both are a problem.
Why would you need to TRANSPOSE the data to get a list of the VALUES into macro varaibles?
proc sql noprint;
select distinct nliteral(upcase(name)) into :mvar separated by '|' from have;
quit;
I'm getting confused. Why the system should consider duplicates situations like "
VAR45678901234567890123456789012aaa
and
VAR45678901234567890123456789012xxx
if they are two different IDs, if only in the last three characters?
The same for "/B" or "_B" ?
A valid SAS name can be a maximum of 32 bytes long.
It can only contain letters, digits and underscores.
It cannot start with a digit.
In both cases two different values of the name string are mapped to the same SAS variable name.
In the case of the long name strings the first 32 bytes of the two different strings are the same.
In the case of the name strings with the invalid characters in them when the invalid characters are replaced with underscores to create valid names they end up with the same name.
Terrific, all clear now. In debt Tom
Ok I investigated a little more and I still need to transpose the columns cause the transposed "bigrams" will be needed as column for modelling purposes. Each column will represent a variable to be included as input variable in a model which need the vars in column. Any chance to still be able to transpose the bigrams, also we with initial characters like " ' " ?
@dcortell wrote:
Ok I investigated a little more and I still need to transpose the columns cause the transposed "bigrams" will be needed as column for modelling purposes. Each column will represent a variable to be included as input variable in a model which need the vars in column. Any chance to still be able to transpose the bigrams, also we with initial characters like " ' " ?
Why do you need the data in that form to do the analysis? What is the analysis you are doing?
Why do you need to use bigram values as the variable names for the analysis? Just use generic names for the variables.
For example give each bigram a distinct number and distinct valid SAS variable name.
proc sort data=have;
by bigram ;
run;
data make_names;
set have;
by bigram ;
bigram_num + first.bigram;
length name $32 ;
name = cats('bigram',bigram_num);
run;
If you want you could use the IDLABEL statement in PROC TRANSPOSE to use the BIGRAM value as the LABEL of the generated variables instead of trying to use it as the NAME of the variable.
Just adding the validvarname=any option solved the topic in sas 9.4. I think the error was related to the values like " 'a" or " 'b", despite the log was pointing to the "_a" or "_b" values of the ID variable.Thanks.
@dcortell wrote:
Just adding the validvarname=any option solved the topic in sas 9.4. I think the error was related to the values like " 'a" or " 'b", despite the log was pointing to the "_a" or "_b" values of the ID variable.Thanks.
So the reason it worked in VIYA is someone had accidentally set VALIDVARNAME to ANY in your VIYA session.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.