BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
akimme
Obsidian | Level 7

So I downloaded my data set as a .CSV from my survey software, plugged it into SAS, and made myself a temporary data set to work with. So far so good. Then I found out when I tried to run basic demographics that I'm not sure how to refer to my variables. 

 

My survey software exported them with their names, spaces and all, and ValidVarName hasn't helped. Even when I try entering the abbreviated variable names as a quoted string, SAS is unable to find them. I mostly get these errors: 

ERROR 22-322: Syntax error, expecting one of the following: a name, (, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.  
 ERROR 200-322: The symbol is not recognized and will be ignored.
 ERROR: Variable NAME not found.

Proc Contents gives me a number for each variable. Is there some way I can refer to them with just their location (50, 51…) in the data set without having to go through and rename everything? I'm on SODA on a Mac if that helps.

Screen Shot 2023-05-18 at 2.37.23 PM.png

 

Thank you!

 *things I have tried;
OPTIONS VALIDVARNAME = any;
proc freq data=AllGC; tables 'What is your race? (choice=Blac' / out=FreqCount outexpect sparse; title 'Racial Demographics'; run; proc freq data=AllGC; tables What is your race? (choice=Blac / out=FreqCount outexpect sparse; title 'Racial Demographics'; run;
 proc freq data=AllGC;
   tables 50
   / out=FreqCount outexpect sparse;
   title 'Racial Demographics';
run;

proc freq data=AllGC; tables var50 / out=FreqCount outexpect sparse; title 'Racial Demographics'; run;

proc freq data=AllGC;
tables 'var50' / out=FreqCount outexpect sparse;
title 'Racial Demographics';
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@akimme wrote:
This survey is closed, but that is a good point about fixing typos and such changing variable names. I'll keep that in mind, thanks.

What code would you recommend for renaming 70+ variables efficiently?

For names like that where there is no easy algorithm to create a rational variable name  I would use COPY and PASTE.

Dump the existing labels to a file and then edit in the new names you want to use to make up RENAME and LABEL statements.

data goodnames ;
  set badnames(rename=(
 'Long description'n = name1
 'Other long descirption'n = name2 
  ));
label
 name1 = 'Long description'
 name2 = 'Other long descirption'
;
run;

 

If you just want to use names like VAR1, VAR2 etc to start with you could tell PROC IMPORT to NOT try to make up the names. 

 

Let's make a demonstration be making a CSV file from SASHELP.CLASS and then reading it in and generating labels from the column headers.

filename csv temp;
proc export data=sashelp.class dbms=csv file=csv; run;

proc import file=csv dbms=csv out=want replace;
  guessingrows=max;
  getnames=NO;
  datarow=2;
run;

proc import file=csv dbms=csv out=headers replace;
  getnames=NO;
run;

proc transpose data=headers(obs=1) out=names(rename=(col1=label));
  var _all_;
run;

filename code temp;
data _null_;
  set names end=eof;
  file code;
  if _n_=1 then put 'label';
  put _name_ '=' label :$quote. ;
  if eof then put ';' ;
run;

proc datasets nolist lib=work;
  modify want;
%include code / source2;
  run;
quit;
1060  proc datasets nolist lib=work;
1061    modify want;
1062  %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file C:\Users\...\#LN00067.
1063 +label
1064 +VAR1 ="Name"
1065 +VAR2 ="Sex"
1066 +VAR3 ="Age"
1067 +VAR4 ="Height"
1068 +VAR5 ="Weight"
1069 +;
NOTE: %INCLUDE (level 1) ending.
1070    run;

NOTE: MODIFY was successful for WORK.WANT.DATA.
1071  quit;

 

 

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

If you set the VALIDVARNAME option to V7 before you call PROC IMPORT then it will not put spaces into the variable names.

 

If you write your own data step to read the CSV file you can use more natural variable names and leave those descriptive strings in the first column of your photograph as the variable LABEL where they can be up to 256 characters long and have any character they want.

data want;
  attrib date length=8 format=yymmdd10. informat=yymmdd10. 
    label='Approximately what date did you do xxx' 
  ;
....

 

As a last resort  you will have to use NAME LITERALS in your code to reference the variables that have been created with non valid names.  Just suffix the quoted string with the letter N.

proc freq data=AllGC;
   tables 'What is your race? (choice=Blac'n
   / out=FreqCount outexpect sparse;
   title 'Racial Demographics';
run;

Just like you use the letter D when make a Date literal and the letter T when making a Time litera..

akimme
Obsidian | Level 7

Okay, got it - I appreciate the sample code and having multiple options, thank you. I'll have to go through my analysis plan and see which would be more efficient, renaming my variables or the ‘string’n method.

Would validvarname still be set to default for the name literals method?

 

Also, if I e.g. add a label that's 1, 2, etc, is there a way I can tell SAS to swap the label and variable name fields instead of manually inputting everything?

Tom
Super User Tom
Super User

If you have VALIDVARNAME=ANY then PROC IMPORT will not change the header values (other than truncating to 32 bytes).  So you will HAVE to use name literals to reference any of the names that do not follow the normal naming rules.

 

And if you save the dataset then in any future SAS session that you want to read the dataset you will have to make sure VALIDVARNAME is set to ANY.

 

Note you can use a name literal name literals for names that are normal names. SEX and 'SEX'n refer to the same variable name.

 

If you use VALIDVARNAME=V7 when reading the CSV file then PROC IMPORT will replace invalid characters with underscores.  So a header of 'This long name' will be named This_long_name instead of 'This long name'n.

 

If the dataset only has valid name then you can read it in a session that is using either VALIDVARNAME=ANY or VALIDVARNAME=V7 since it does have valid names.

 

You cannot just swap the names.  But it is possible to get the metadata about a variable and use that to generate code that would change a variables name and/or change its label.

 

 

ballardw
Super User

Personally when I get data sources with column headings like that I write my own data step to read the data.

Variable names should be relatively short. Yours look like survey questions. So I might name the variables with something like a question number such as might appear in documentation of the survey or something related to the topic. That stuff like "What is your race ..." would be LABELs for the SAS variables so the question text would appear by default in most procedure output.

 

I'm lazy. I would much rather use a short variable name like "Race" in code instead of typing "What is your race? (choice = Blac"n .

 

Many of the rules behind Proc Import were designed before the advent of extreme verbosity from not-really-ready-for-primetime-data analysis like Survey Monkey and competitors. So take control of your data before it swamps you with excessively long variable names. Also, minor revisions to an on going survey, such as changing the spelling in the question text in this sort of software means that even though the data should be treated exactly the same the next file collected with the updated question text would have different variable names leading to much work to "fix" stuff.

Ask me how I know...

akimme
Obsidian | Level 7

This survey is closed, but that is a good point about fixing typos and such changing variable names. I'll keep that in mind, thanks.

What code would you recommend for renaming 70+ variables efficiently? If I e.g. add a label that's 1, 2, etc, is there a way I can tell SAS to swap the label and variable name fields instead of manually inputting everything?

Tom
Super User Tom
Super User

@akimme wrote:
This survey is closed, but that is a good point about fixing typos and such changing variable names. I'll keep that in mind, thanks.

What code would you recommend for renaming 70+ variables efficiently?

For names like that where there is no easy algorithm to create a rational variable name  I would use COPY and PASTE.

Dump the existing labels to a file and then edit in the new names you want to use to make up RENAME and LABEL statements.

data goodnames ;
  set badnames(rename=(
 'Long description'n = name1
 'Other long descirption'n = name2 
  ));
label
 name1 = 'Long description'
 name2 = 'Other long descirption'
;
run;

 

If you just want to use names like VAR1, VAR2 etc to start with you could tell PROC IMPORT to NOT try to make up the names. 

 

Let's make a demonstration be making a CSV file from SASHELP.CLASS and then reading it in and generating labels from the column headers.

filename csv temp;
proc export data=sashelp.class dbms=csv file=csv; run;

proc import file=csv dbms=csv out=want replace;
  guessingrows=max;
  getnames=NO;
  datarow=2;
run;

proc import file=csv dbms=csv out=headers replace;
  getnames=NO;
run;

proc transpose data=headers(obs=1) out=names(rename=(col1=label));
  var _all_;
run;

filename code temp;
data _null_;
  set names end=eof;
  file code;
  if _n_=1 then put 'label';
  put _name_ '=' label :$quote. ;
  if eof then put ';' ;
run;

proc datasets nolist lib=work;
  modify want;
%include code / source2;
  run;
quit;
1060  proc datasets nolist lib=work;
1061    modify want;
1062  %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file C:\Users\...\#LN00067.
1063 +label
1064 +VAR1 ="Name"
1065 +VAR2 ="Sex"
1066 +VAR3 ="Age"
1067 +VAR4 ="Height"
1068 +VAR5 ="Weight"
1069 +;
NOTE: %INCLUDE (level 1) ending.
1070    run;

NOTE: MODIFY was successful for WORK.WANT.DATA.
1071  quit;

 

 

ballardw
Super User

@akimme wrote:

This survey is closed, but that is a good point about fixing typos and such changing variable names. I'll keep that in mind, thanks.

What code would you recommend for renaming 70+ variables efficiently? If I e.g. add a label that's 1, 2, etc, is there a way I can tell SAS to swap the label and variable name fields instead of manually inputting everything?


From many years working in the survey industry a "closed" survey does not mean that the survey structure will not be used again. In fact almost the opposite. If a survey is run intending to collect information on a population, be that geographical, demographical, activity or purchasing pattern based someone in management will quite often come up with the question "How does this population differ/be similar to group X or changed after we did Y?". So to make direct comparisons the same survey is reused with a different sample. But often minor tweeks, like typos in questions.

 

I had one source of surveys that refused to place punctuation consistently. One question might have a question mark preceded with 2 or more spaces and next would be as expected. So when they "revised" a questionnaire the text of the questions could change just in the placement of question marks. Which if the question was short enough would mean that a Proc Import generated variable name would differ. Adding work when combining the data to do proper comparison analysis.

 

However if a data step were to read the data with the same survey then just updating the source file and the output data set name would suffice.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1981 views
  • 4 likes
  • 3 in conversation