BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SASdevAnneMarie
Rhodochrosite | Level 12

Hello Experts,

 

I have 2 variables : "Parc mondial - Taux de logements vacants* (en %)"n  and "Parc mondial - Taux de logements individuels (en %)"n, but while importing I have only the variable "Parc mondial - Taux de logements"n.

 

How I can import the 2 variables (over 32 characters, csv file) ? The problem is that the variables beginning with the same characters. Maybe there is some option in sas. 

 

Thank you for your help.

1 ACCEPTED SOLUTION
11 REPLIES 11
SASdevAnneMarie
Rhodochrosite | Level 12
Thank you, Tom!
Tom
Super User Tom
Super User

Since the data is in a CSV file there is no need to "IMPORT" it.  Instead you can just READ the file directly into a dataset.  Then you can use any NAME you want for the data in the columns with those headers.  You can put the long strings in the header line into the LABEL of the variable.

data want;
  infile 'have.csv' dsd firstobs=2 truncover ;
  length  vacants individuels 8;
  label vacants = 'Parc mondial - Taux de logements vacants* (en %)'
        individuels = 'Parc mondial - Taux de logements individuels (en %)'
  ;
  input vacants individuels;
run;

If you need a tool to try and make GUESSes about how to read in a CSV file instead of using PROC IMPORT try using this macro: https://github.com/sasutils/macros/blob/master/csv2ds.sas

It can handle creating unique names for columns whose headers are the same for the first 32 bytes.  Plus it will always attach a label to the variable when the column header is not exactly the same as the name of the variable it creates.

 

Let's make an example and test them.

filename csv temp;
options parmcards=csv;
parmcards4;
id,Parc mondial - Taux de logements vacants* (en %),Parc mondial - Taux de logements individuels (en %)
1,10,20
2,20,30
3,25,45
;;;;

options validvarname=any;
proc import dbms=csv file=csv out=import replace;
run;
%csv2ds(csv,out=csv2ds,replace=yes)
proc compare data=import compare=csv2ds listall;
run;

When you run with VALIDVARNAME=ANY you will still get those non-standard variable names that will require using name literals to reference, but will will get a numeric suffix on the second one so that the names are different.

Listing of Variables in WORK.CSV2DS but not in WORK.IMPORT                                                                          
                                                                                                                                    
Variable                          Type  Length  Label                                                                               
                                                                                                                                    
Parc mondial - Taux de logemen_1  Num        8  Parc mondial - Taux de logements individuels (en %)   

And if you run with VALIDVARNAME=V7 (which I encourage so that SAS does not generate non-standard variable names) then %CSV2DS will actually be able to make more descriptive long names since it will collapse the ' - ' into a single underscore so that the names will differ in the 32nd byte.

Listing of Variables in WORK.IMPORT but not in WORK.CSV2DS                                                                          
                                                                                                                                    
Variable                          Type  Length  Format   Informat                                                                   
                                                                                                                                    
Parc_mondial___Taux_de_logements  Num        8  BEST12.  BEST32.                                                                    
VAR3                              Num        8  BEST12.  BEST32.                                                                    
                                                                                                                                    
                                                                                                                                    
Listing of Variables in WORK.CSV2DS but not in WORK.IMPORT                                                                          
                                                                                                                                    
Variable                          Type  Length  Label                                                                               
                                                                                                                                    
Parc_mondial_Taux_de_logements_v  Num        8  Parc mondial - Taux de logements vacants* (en %)                                    
Parc_mondial_Taux_de_logements_i  Num        8  Parc mondial - Taux de logements individuels (en %)    

You can see more of the advantages of using %CSV2DS() by reading the comments in the code.

Differences from PROC IMPORT
- Supports header lines with more than 32,767 characters
- Supports ZIP and GZIP source files
- Generates unique variable names by adding numeric suffix
- Does not overestimate maxlength when longest value is quoted
- Does NOT force character type if all values are quoted
- Generates label when generated variable name is different than header
- Supports NAMEROW option
- Supports numeric fields with special missing values (MISSING statement)
- Does not attach unneeded informats or formats
- Allows overriding calculated metadata
- Allow using random sample of rows to guess metadata
- Generates more compact SAS code
- Generates analysis summary dataset and raw data view
- Saves generated SAS code to a file
- Forces DATE and DATETIME formats to show century
- Difference in generated V7 compatible variable names
  - Replaces adjacent non-valid characters with single underscore

 

Tom
Super User Tom
Super User

Note looking at my SAS log I noticed that PROC IMPORT has a BUG that this example demonstrates.

 

When it is run using VALIDVARNAME=V7 is properly detects that the two variables are trying to use the same name

 Name Parc mondial - Taux de logements vacants* (en %) truncated to Parc_mondial___Taux_de_logements.
 Name Parc mondial - Taux de logements individuels (en %) truncated to Parc_mondial___Taux_de_logements.
 Name Parc_mondial___Taux_de_logements is a duplicate.

and automatically changes the second one to VAR3 instead.

 96                informat id best32. ;
 97                informat Parc_mondial___Taux_de_logements best32. ;
 98                informat VAR3 best32. ;
 99                format id best12. ;
 100               format Parc_mondial___Taux_de_logements best12. ;
 101               format VAR3 best12. ;
 102            input
 103                        id
 104                        Parc_mondial___Taux_de_logements
 105                        VAR3
 106            ;

 

But when run using VALIDVARNAME=ANY it does not notice.  So it uses the same name twice in the INFORMAT, FORMAT and INPUT statement.

 

 96                informat id best32. ;
 97                informat "Parc mondial - Taux de logements"N best32. ;
 98                informat "Parc mondial - Taux de logements"N best32. ;
 99                format id best12. ;
 100               format "Parc mondial - Taux de logements"N best12. ;
 101               format "Parc mondial - Taux de logements"N best12. ;
 102            input
 103                        id
 104                        "Parc mondial - Taux de logements"N
 105                        "Parc mondial - Taux de logements"N
 106            ;

resulting in a dataset with 2 instead of 3 variables.

NOTE: The data set WORK.IMPORT has 3 observations and 2 variables.

 

 

 

Kathryn_SAS
SAS Employee

@Tom I believe SAS does notice the problem with PROC IMPORT and VALIDVARNAME=ANY. In my log I see the following line:

11   proc import dbms=csv file=csv out=import replace;
12   run;

Name Parc mondial - Taux de logements vacants* (en %) truncated to Parc mondial - Taux de logements.
Name Parc mondial - Taux de logements individuels (en %) truncated to Parc mondial - Taux de
logements.
Problems were detected with provided names.  See LOG.

The DATA Step code that is run behind the scenes will generate the Informat, Input, and Format statements for each variable; but since there are duplicates, it is only keeping the last one. That is why the values that are stored for Parc mondial - Taux de logements are 20, 30, 45.

Tom
Super User Tom
Super User

It might notice it, but unlike when using VALIDVARNAME=V7 it takes no action to FIX it.

 

Note that if you do the same test with duplicate column headers that are shorter than 32 bytes then PROC IMPORT when run with VALIDVARNAME=ANY will detect and FIX the duplicate names.

options parmcards=csv;
filename csv temp;
parmcards4;
id,a,a,b,b
1,2,3,4,5
;;;;

options validvarname=v7;
proc import dbms=csv file=csv out=v7 replace;
run;
options validvarname=any;
proc import dbms=csv file=csv out=any replace;
run;

proc compare data=any compare=v7 listall;
run;
Ksharp
Super User

The max length of variable in SAS is 32, so you are unable to use ">32 string" to make a variable name. But you can make it as variable label:

Firstly import this csv file with option getname= and datarow= 

proc import datafile='c:\temp\temp.csv' out=have dbms=csv replace;
getnames=no;
datarow=2;
run;

Secondly, label these variables with the first row:

options obs=1;
proc import datafile='c:\temp\temp.csv' out=label dbms=csv replace;
getnames=no;
run;
option obs=max;

proc transpose data=label out=label2;
var _all_;
run;
data _null_;
set label2 end=last;
if _n_=1 then call execute('proc datasets library=work nolist nodetails;modify have;label ');
call execute(catt(_name_,'="',col1,'"'));
if last then call execute(';quit;');
run;

WarrenKuhfeld
Ammonite | Level 13

@Ksharp In a silly but technical sense, what you are saying is not correct. Before I retired from SAS, I used to run tests with n-literals like these to both expose bugs and to demonstrate my contention that n-literals like these should be prohibited. They were not, and as silly as it is, this is perfectly legitimate SAS code.

options validvarname=any;
data sillynlit;
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""         "n = 'silly double quote n-lit';
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''         'n = 'silly single quote n-lit';
output;
run;
proc print; run;

 

Ksharp
Super User
Thank you . This is really a bug.
Tom
Super User Tom
Super User

@WarrenKuhfeld wrote:

@Ksharp In a silly but technical sense, what you are saying is not correct. Before I retired from SAS, I used to run tests with n-literals like these to both expose bugs and to demonstrate my contention that n-literals like these should be prohibited. They were not, and as silly as it is, this is perfectly legitimate SAS code.

options validvarname=any;
data sillynlit;
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""         "n = 'silly double quote n-lit';
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''         'n = 'silly single quote n-lit';
output;
run;
proc print; run;

 


No.  Technically what @Ksharp said is TRUE. The maximum length of a variable name is 32 bytes.

 

What you are talking about is the maximum length of the TOKEN used in the SAS code to represent the variable name in the code.  The maximum length of that token would be one that was all the same quote character.  So 32*2 quotes for the actual name and then 2 more quotes and the letter N to make a name literal for a total length of 67 bytes.

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 628 views
  • 8 likes
  • 6 in conversation