SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Character Variables

Reply
Contributor
Posts: 22

Character Variables

I am merging several spreadsheets in SAS that I obtained from Qualtrics.  For some reason, when read into SAS the same question in one spreadsheet reads in as numeric, but character in another spreadsheet. Is there a way, using proc import, to make sure all the variables from the 7 spreadsheets (downloaded from Qualtrics) are read in as character variables?  It gets annoying having to convert to figure out which variables are being read in as numeric from which spreadsheet and then having to convert it.

 

Thank you!

 

Super User
Posts: 7,422

Re: Character Variables

Short answer: no. Proc import makes guesses about the structure, depending on the values it finds, and therefore has different outcomes.

If you have a text-based file (not native Excel, but csv or similar), you can copy a data step (that was created by proc import) from the log and use that.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 22

Re: Character Variables

What do you copy from the log? I can convert the spreadsheets to csv.
Super User
Posts: 7,422

Re: Character Variables

When you use proc import to read a csv file, it writes the data step (that actually does the import) to the log. Copy that, adapt it (eg for maximum expectable string lengths), and use it on all files to get consistent structure.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 19,100

Re: Character Variables

Make an export from Qualtrics as CSV or SPSS and use that instead of an Excel spreadsheet. In my experience that works a lot better.

Contributor
Posts: 22

Re: Character Variables

So if some of the log says:
52 informat Q27_1_4_RANK best32. ;
53 informat Q27_1_5_RANK best32. ;
54 informat Q27_1_6_RANK best32. ;
55 informat Q27_2_1_RANK best32. ;
56 informat Q27_2_2_RANK $1. ;
57 informat Q27_2_3_RANK $1. ;
58 informat Q27_2_4_RANK best32. ;
59 informat Q27_2_5_RANK best32. ;
60 informat Q27_2_6_RANK best32. ;
61 informat Q21_1 $31. ;
62 informat Q21_2 $53. ;
63 informat Q21_3 $53. ;
64 informat Q21_4 $31. ;
65 informat Q21_5 $31. ;
66 informat Q21_6 $53. ;
67 informat Q21_7 $53. ;
68 informat Q21_8 $34. ;
69 informat Q21_9 $34. ;
70 informat Q21_10 $34. ;
71 informat Q21_11 $53. ;
72 informat Q21_12 $53. ;
73 informat Q21_13 $53. ;
74 informat Q20_1_1 $10. ;
75 informat Q20_1_2 $10. ;
76 informat Q20_1_3 $10. ;
77 informat Q20_1_4 $6. ;
78 informat Q20_1_5 $6. ;
79 informat Q36_1_1 $10. ;
80 informat Q36_1_2 $9. ;
81 informat Q36_1_3 $9. ;
82 informat Q36_1_4 $9. ;
83 informat Q36_1_5 $10. ;
84 informat Q49_1_1 $65. ;
85 informat Q49_1_2 $65. ;
86 informat Q49_1_3 $19. ;
87 informat Q49_1_4 $65. ;
88 informat Q49_1_5 $19. ;
89 informat Q49_1_6 $65. ;
90 informat Q49_1_7 $65. ;
91 informat Q49_1_8 $65. ;
92 informat Q49_1_9 $65. ;
93 informat Q49_1_10 $19. ;
94 informat Q23 $25. ;
95 informat Q23_12_TEXT $1. ;
96 informat Q24 $13. ;
97 informat Q24_12_TEXT $1. ;
98 informat Q59 $87. ;
99 informat Q61 $3. ;
100 informat Q63 $3. ;
101 informat Q48 $85. ;
102 informat Q48_7_TEXT $1. ;
103 informat Q47_6 $3. ;
104 informat Q47_11 $3. ;
105 informat Q47_7 $14. ;
106 informat Q47_8 $14. ;
107 informat Q47_9 $14. ;
108 informat Q47_4 $14. ;
109 informat Q47_14 $14. ;
110 informat Q47_10 $3. ;
111 informat Q60_1 $14. ;
112 informat Q60_2 $14. ;
113 informat Q60_3 $14. ;
114 informat Q60_4 $14. ;
115 informat Q60_5 $14. ;
116 informat Q60_6 $14. ;
117 informat Q60_7 $14. ;
118 informat Q60_8 $14. ;
119 informat Q52 $25. ;
120 informat Q44 $130. ;
121 informat _3 $6. ;
122 informat _4 $5. ;
123 informat _5 $49. ;
124 informat _5_9_TEXT $1. ;
125 informat Q32 $53. ;
126 informat Q26 $58. ;
127 informat Q55 $28. ;
128 informat Q56 $28. ;
129 format Progress best12. ;
130 format Duration__in_seconds_ best12. ;
131 format Finished $4. ;
132 format RecordedDate datetime. ;
133 format RecipientEmail $32. ;
134 format LocationLatitude best12. ;
135 format LocationLongitude best12. ;
136 format DistributionChannel $5. ;
137 format UserLanguage $2. ;


I put that in proc import statement for other spreadsheets? If so, where would I put that in that statement? (Obviously excluding the numbers in the beginning)
Super User
Posts: 19,100

Re: Character Variables

Spreadsheets can only be read in via PROC IMPORT. You cannot easily specify the variable type or format because Excel doesn't force any structure on the data. 

 

If you want to be able to specify the structure, you copy the code from the log, holding down ALT key while you're selecting to remove the row numbers. The full code, not the snippet you've shown, includes the path to the file. 

 

Take the code, change the path to the file and the data set name and use that to import your new data set.

Contributor
Posts: 22

Re: Character Variables

Thank you!! This just made my day...no, week! I appreciate all your help!
Highlighted
Super User
Super User
Posts: 6,844

Re: Character Variables

PROC IMPORT writes really ugly code.

To generate a data step from a CSV file follow these steps.

  • Copy the first row with the variable names from the CSV file and paste it into your program editor.
  • Change all of the commas between the names to spaces.  Reflow the line into multiple lines so that it is readable. I like to use a line length of around 65 but you could go as far as keeping only one name per line.
  • Add a line with the keyword LENGTH before the first variable name and a line with a semi-colon after the last one to end the LENGTH statement.
  • Then for each variable in the list decide if it is a number or a character string. For numbers add 8 after the variable name since SAS stores numbers as 8 byte floating point numbers. Note that date, time and datetime values are all numbers. If it is character variable then pick a max length for the variable and add that number with a $ prefix after the variable name.  So you end up with something like 'length name $20 age 8;'.  When in doubt about a variable then first read it as character and then examine the values after you have them in a SAS dataset and see if there is a better definition for the variable.  You can then modify your program and re-run it.
  • Now for any variable that needs one add an INFORMAT statement. Most variables do NOT need informats. You will need them for DATE and TIME values. You might need them for currency values if they contain $ or commas.
  • Also for any variable that needs one add a FORMAT statement. Most variables do NOT need formats attached. Date and time of the obvious exception.
  • Then add an INPUT statement that lists the variables in order without any formats or $ as they are not needed since you have already defined that variables and any informat that the INPUT statement would need in the previous steps.  You can even use positional variable list to make this easier.  So if the first variable in your CSV file is NAME and the last is AGE  you input statement is simply 'input name -- age;' .
  • Now add a DATA statement at the top.
  • Add an INFILE statement right after the DATA statement and before the LENGTH statement. Make sure to include the DSD and TRUNCOVER options. You will also need a FIRSTOBS=2 option to skip the header row. If you are reading a Windows/DOS file on a Unix machine you might want to add TERMSTR=CRLF option. If you are reading a CSV file made by Excel on a Mac then you might need to add the TERMSTR=CR option as Excel is still living in the past.  If you have really long lines in your CSV file you might need to add a LRECL= option.  The default is now 32767 but depending on your system you can set that to something as high as 2 or 4 million if you need to.
  • Add a RUN statement to the bottom.

You should now have a full data step for reading your file structure. For example here is a data step to read a csv file made from the SASHELP.CLASS dataset.

data want ;
  infile 'class.csv' dsd truncover firstobs=2;
  length Name $20 Sex $1 Age Height Weight 8;
  input name -- weight;
run;

To re-use the program to read multiple files with the same structure just change the output dataset name and the input text file name.

Contributor
Posts: 22

Re: Character Variables

I tried this, but when I use getnames=yes in the proc import statement, I still had the same issues.
Ask a Question
Discussion stats
  • 9 replies
  • 338 views
  • 4 likes
  • 4 in conversation