Hello Experts,
I have 2 variables : "Parc mondial - Taux de logements vacants* (en %)"n and "Parc mondial - Taux de logements individuels (en %)"n, but while importing I have only the variable "Parc mondial - Taux de logements"n.
How I can import the 2 variables (over 32 characters, csv file) ? The problem is that the variables beginning with the same characters. Maybe there is some option in sas.
Thank you for your help.
PS 32 bytes is a hard, physical limit. It is the length reserved for names in the header structure of .sas7bdat files and anywhere else in the SAS language.
Use proper names in the source. What you show is suitable for labels, but never for names, in ANY programming environment.
PS 32 bytes is a hard, physical limit. It is the length reserved for names in the header structure of .sas7bdat files and anywhere else in the SAS language.
Since the data is in a CSV file there is no need to "IMPORT" it. Instead you can just READ the file directly into a dataset. Then you can use any NAME you want for the data in the columns with those headers. You can put the long strings in the header line into the LABEL of the variable.
data want;
infile 'have.csv' dsd firstobs=2 truncover ;
length vacants individuels 8;
label vacants = 'Parc mondial - Taux de logements vacants* (en %)'
individuels = 'Parc mondial - Taux de logements individuels (en %)'
;
input vacants individuels;
run;
If you need a tool to try and make GUESSes about how to read in a CSV file instead of using PROC IMPORT try using this macro: https://github.com/sasutils/macros/blob/master/csv2ds.sas
It can handle creating unique names for columns whose headers are the same for the first 32 bytes. Plus it will always attach a label to the variable when the column header is not exactly the same as the name of the variable it creates.
Let's make an example and test them.
filename csv temp;
options parmcards=csv;
parmcards4;
id,Parc mondial - Taux de logements vacants* (en %),Parc mondial - Taux de logements individuels (en %)
1,10,20
2,20,30
3,25,45
;;;;
options validvarname=any;
proc import dbms=csv file=csv out=import replace;
run;
%csv2ds(csv,out=csv2ds,replace=yes)
proc compare data=import compare=csv2ds listall;
run;
When you run with VALIDVARNAME=ANY you will still get those non-standard variable names that will require using name literals to reference, but will will get a numeric suffix on the second one so that the names are different.
Listing of Variables in WORK.CSV2DS but not in WORK.IMPORT
Variable Type Length Label
Parc mondial - Taux de logemen_1 Num 8 Parc mondial - Taux de logements individuels (en %)
And if you run with VALIDVARNAME=V7 (which I encourage so that SAS does not generate non-standard variable names) then %CSV2DS will actually be able to make more descriptive long names since it will collapse the ' - ' into a single underscore so that the names will differ in the 32nd byte.
Listing of Variables in WORK.IMPORT but not in WORK.CSV2DS
Variable Type Length Format Informat
Parc_mondial___Taux_de_logements Num 8 BEST12. BEST32.
VAR3 Num 8 BEST12. BEST32.
Listing of Variables in WORK.CSV2DS but not in WORK.IMPORT
Variable Type Length Label
Parc_mondial_Taux_de_logements_v Num 8 Parc mondial - Taux de logements vacants* (en %)
Parc_mondial_Taux_de_logements_i Num 8 Parc mondial - Taux de logements individuels (en %)
You can see more of the advantages of using %CSV2DS() by reading the comments in the code.
Differences from PROC IMPORT - Supports header lines with more than 32,767 characters - Supports ZIP and GZIP source files - Generates unique variable names by adding numeric suffix - Does not overestimate maxlength when longest value is quoted - Does NOT force character type if all values are quoted - Generates label when generated variable name is different than header - Supports NAMEROW option - Supports numeric fields with special missing values (MISSING statement) - Does not attach unneeded informats or formats - Allows overriding calculated metadata - Allow using random sample of rows to guess metadata - Generates more compact SAS code - Generates analysis summary dataset and raw data view - Saves generated SAS code to a file - Forces DATE and DATETIME formats to show century - Difference in generated V7 compatible variable names - Replaces adjacent non-valid characters with single underscore
Note looking at my SAS log I noticed that PROC IMPORT has a BUG that this example demonstrates.
When it is run using VALIDVARNAME=V7 is properly detects that the two variables are trying to use the same name
Name Parc mondial - Taux de logements vacants* (en %) truncated to Parc_mondial___Taux_de_logements. Name Parc mondial - Taux de logements individuels (en %) truncated to Parc_mondial___Taux_de_logements. Name Parc_mondial___Taux_de_logements is a duplicate.
and automatically changes the second one to VAR3 instead.
96 informat id best32. ; 97 informat Parc_mondial___Taux_de_logements best32. ; 98 informat VAR3 best32. ; 99 format id best12. ; 100 format Parc_mondial___Taux_de_logements best12. ; 101 format VAR3 best12. ; 102 input 103 id 104 Parc_mondial___Taux_de_logements 105 VAR3 106 ;
But when run using VALIDVARNAME=ANY it does not notice. So it uses the same name twice in the INFORMAT, FORMAT and INPUT statement.
96 informat id best32. ; 97 informat "Parc mondial - Taux de logements"N best32. ; 98 informat "Parc mondial - Taux de logements"N best32. ; 99 format id best12. ; 100 format "Parc mondial - Taux de logements"N best12. ; 101 format "Parc mondial - Taux de logements"N best12. ; 102 input 103 id 104 "Parc mondial - Taux de logements"N 105 "Parc mondial - Taux de logements"N 106 ;
resulting in a dataset with 2 instead of 3 variables.
NOTE: The data set WORK.IMPORT has 3 observations and 2 variables.
@Tom I believe SAS does notice the problem with PROC IMPORT and VALIDVARNAME=ANY. In my log I see the following line:
11 proc import dbms=csv file=csv out=import replace;
12 run;
Name Parc mondial - Taux de logements vacants* (en %) truncated to Parc mondial - Taux de logements.
Name Parc mondial - Taux de logements individuels (en %) truncated to Parc mondial - Taux de
logements.
Problems were detected with provided names. See LOG.
The DATA Step code that is run behind the scenes will generate the Informat, Input, and Format statements for each variable; but since there are duplicates, it is only keeping the last one. That is why the values that are stored for Parc mondial - Taux de logements are 20, 30, 45.
It might notice it, but unlike when using VALIDVARNAME=V7 it takes no action to FIX it.
Note that if you do the same test with duplicate column headers that are shorter than 32 bytes then PROC IMPORT when run with VALIDVARNAME=ANY will detect and FIX the duplicate names.
options parmcards=csv;
filename csv temp;
parmcards4;
id,a,a,b,b
1,2,3,4,5
;;;;
options validvarname=v7;
proc import dbms=csv file=csv out=v7 replace;
run;
options validvarname=any;
proc import dbms=csv file=csv out=any replace;
run;
proc compare data=any compare=v7 listall;
run;
The max length of variable in SAS is 32, so you are unable to use ">32 string" to make a variable name. But you can make it as variable label:
Firstly import this csv file with option getname= and datarow=
proc import datafile='c:\temp\temp.csv' out=have dbms=csv replace; getnames=no; datarow=2; run;
Secondly, label these variables with the first row:
options obs=1;
proc import datafile='c:\temp\temp.csv' out=label dbms=csv replace;
getnames=no;
run;
option obs=max;
proc transpose data=label out=label2;
var _all_;
run;
data _null_;
set label2 end=last;
if _n_=1 then call execute('proc datasets library=work nolist nodetails;modify have;label ');
call execute(catt(_name_,'="',col1,'"'));
if last then call execute(';quit;');
run;
@Ksharp In a silly but technical sense, what you are saying is not correct. Before I retired from SAS, I used to run tests with n-literals like these to both expose bugs and to demonstrate my contention that n-literals like these should be prohibited. They were not, and as silly as it is, this is perfectly legitimate SAS code.
options validvarname=any;
data sillynlit;
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" "n = 'silly double quote n-lit';
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' 'n = 'silly single quote n-lit';
output;
run;
proc print; run;
@WarrenKuhfeld wrote:
@Ksharp In a silly but technical sense, what you are saying is not correct. Before I retired from SAS, I used to run tests with n-literals like these to both expose bugs and to demonstrate my contention that n-literals like these should be prohibited. They were not, and as silly as it is, this is perfectly legitimate SAS code.
options validvarname=any; data sillynlit; """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" "n = 'silly double quote n-lit'; ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' 'n = 'silly single quote n-lit'; output; run; proc print; run;
No. Technically what @Ksharp said is TRUE. The maximum length of a variable name is 32 bytes.
What you are talking about is the maximum length of the TOKEN used in the SAS code to represent the variable name in the code. The maximum length of that token would be one that was all the same quote character. So 32*2 quotes for the actual name and then 2 more quotes and the letter N to make a name literal for a total length of 67 bytes.
Dive into keynotes, announcements and breakthroughs on demand.
Explore Now →Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.