Hello,
I am relatively new to SAS and am wondering whether values are being truncated when I import. I have read quite a bit on importing, length statement, informat, and format statement, as well as importing an Excel workbook using proc import, infile statement, etc. My ultimate goal is to merge paid search, organic search, Google Analytics, and Facebook analytics data sets (4) which I downloaded through Supermetrics for Google sheets for funnel conversion insight. The data set I am stuck on (#1 unfortunately) is Google paid search data, with tabs as months. All my data sets are Excel with tabs as months, so if I figure this out, it should hopefully work for the rest!
My goal was to import the Excel workbook and merge the months. The best solution I found for this was:
*Impporting Paidsearch data & creating SAS dataset;
%macro pim(sheet); /*Creating macro variable for proc import step*/
proc import out=paid_temp /*Creating work.paid_temp data set*/
datafile='E:\DI_Project\Data\RawData\Paidsearch2019.xlsx' /*Importing Excel Workbook*/
dbms=xlsx /*Defining DBMS*/
replace; /*Replace = will allow you to replace the exported data set if you re-run the code*/
sheet="&sheet"; /*referring to macro variable for sheet names*/
getnames=yes;
run;
%mend piim; /*Defining macro variables for each individual sheet in workbook*/
%pim(Jan19);
%pim(Feb19);
%pim(Mar19);
%pim(Apr19);
%pim(May19);
%pim(Jun19);
%pim(Jul19);
%pim(Aug19);
%pim(Sep19);
%pim(Oct19);
%pim(Nov19);
%pim(Dec19);
The only problem is that mainly the variable Campaign_Name is a text string with varying lengths (36-88) depending on the month, so I am wondering:
a) if there is some way to standardize (or change) the variable length (and type) while importing into SAS to avoid truncation and additional missing values due to values that are shorter or not the same length?
b) how to treat the many missing values for the numeric variables in the data set (ex: not sure if I should leave them as missing or replace them with zeros)?
The majority of the other variables are numeric, 8 bytes, BEST12. The variables are marketing metrics (ex: CPC, CTR, Conversion value, etc.).
I have to do a report for Friday so any help is VERY MUCH appreciated! 😃
i FINALLY FOUND THE SOLUTION!!!
There are multiple aspects involved in this process so hopefully this will help whoever wants to use Excel files to import into SAS!
/*Assigning librefs*/
libname SASData "E:\DI_Project\Data\SAS";
libname WorkData "E:\DI_Project\Data\WorkData";
/*Excel LIBNAME REF FOR DATA IMPORT*/
libname Paid "E:\DI_Project\Data\RawData\Paidsearch2019.xlsx"
header=yes /*Tells SAS variable names are in 1st row*/
dbsaslabel=none /*Removes labels*/
scantime=yes /*Tells SAS to read date & time variables as datetime*/
mixed=yes /*Tells SAS not to put missing values if a variable has both character & numeric data*/
stringDates=no; /*Tells SAS NOT to treat dates as text strings or character variables*/
/*VERIFY VARIABLE LENGTHS & TYPES*/
proc contents data=paid."jan19$"n; run;
proc contents data=paid."feb19$"n; run;
proc contents data=paid."mar19$"n; run;
proc contents data=paid."apr19$"n; run;
proc contents data=paid."may19$"n; run;
proc contents data=paid."jun19$"n; run;
proc contents data=paid."jul19$"n; run;
proc contents data=paid."aug19$"n; run;
proc contents data=paid."sep19$"n; run;
proc contents data=paid."oct19$"n; run;
proc contents data=paid."nov19$"n; run;
proc contents data=paid."dec19$"n; run;
/*JANUARY IMPORT*/
/*Formatting January Date Variable*/
data jandate; /*Output file or File being written to*/
retain Date; /*Tells SAS to retain the date formatting from Excel (treat as a date)*/
format Date ddmmyy10.; /*Tells SAS to format date from date9. to ddmmyy10., but you can choose whatever format desired*/
set paid."Jan19$A1:U470"n; /*Input file or File being read from*/
run;
/*VERIFY DATE FORMAT CHANGE*/
proc contents data=jandate;
run;
/*METHOD 1: RE-ASSIGNING VARIABLE LENGTHS & TYPES*/
/*#1 - Create macrovariable varlist containing a list of variable names from Jan*/
proc contents data=jandate noprint out=metaclass;
run;
options nocenter nodate;
title 'Meta Data for jandate';
proc print data=metaclass;
var name type length varnum;
run;
data _null_;
length allvars $1000;
retain allvars '';
set metaclass end=eof;
allvars = trim(left(allvars))||''||left(name);
if eof then call symput('varlist',allvars);
run;
%put &varlist; /*Macrovariable is created!!!*/
/*#2 - MODIFY VARIABLE TYPE & LENGTH IN JANDATE*/
/*Essentially this is renaming the variables, creating new variables with the old names & inputting the data, so it's like you modified the types & lengths, when in reality you created new variables altogether.*/
data sasdata.jan19 (drop=v1-v21); /*Creating new permanent SAS dataset for merging later & renaming variables*/
retain &varlist; *<-- preserve variable order ;
length *<-- define new types/lengths;
Campaign_name $100 Campaign_ID $15 Network $15
Impressions Ave_pageviews Ave_timeonsite_sec Ave_frequency Clicks Conversions Channel
Bounce_rate Interaction_rate CTR Conversion_rate Daily_budget Cost CPC CPA Conversion_Value ROAS 8;
format *<-- recreate formats;
Campaign_name $CHAR100. Campaign_ID $CHAR15. Network $CHAR15.
Date ddmmyy10.
Impressions Ave_pageviews Ave_timeonsite_sec Ave_frequency Clicks Conversions 10.
Channel 1.
Bounce_rate Interaction_rate CTR Conversion_rate 4.2
Daily_budget Cost CPC CPA Conversion_Value ROAS DOLLAR10.2;
set jandate (rename=( Campaign_name=v1 Campaign_ID=v2 Date=v3 Network=v4 Channel=v5 Impressions=v6
Bounce_rate=v7 Ave_pageviews=v8 Ave_timeonsite_sec=v9 Ave_frequency=v10
Interaction_rate=v11 Daily_budget=v12 Clicks=v13 CTR=v14 Cost=v15 CPC=v16
Conversions=v17 Conversion_rate=v18 CPA=v19 Conversion_Value=v20 ROAS=v21));
Campaign_name = put(v1,char100.); *<-- redefine variable;
Campaign_ID = put(v2,char15.); *<-- redefine variable;
Date = v3;
Network = put(v4,char15.); *<-- redefine variable;
Channel = v5;
Impressions = v6;
Bounce_rate = v7;
Ave_pageviews = v8;
Ave_timeonsite_sec = v9;
Ave_frequency = v10;
Interaction_rate = v11;
Daily_budget = v12;
Clicks = v13;
CTR = v14;
Cost = v15;
CPC = v16;
Conversions = v17;
Conversion_rate = v18;
CPA = put(v19,8.); *<-- redefine variable;
Conversion_Value = v20;
ROAS = v21;
run;
/*FEBRUARY IMPORT: REPEAT EVERYTHING EXCEPT THE CREATION OF THE MACRO VARIABLE*/
/*Formatting January Date Variable*/
data febdate;
retain Date;
format Date ddmmyy10.;
set paid."feb19$A1:U374"n;
run;
/*Verifying Date Format Change*/
proc contents data=febdate;
run;
/*RE-ASSIGNING VARIABLE LENGTHS & TYPES*/
/* Modifying variable type and length of febdate */
data sasdata.feb19 (drop=v1-v21); /*Creating new permanent SAS dataset for merging later & renaming variables*/
retain &varlist; *<-- preserve variable order ;
length *<-- define new types/lengths;
Campaign_name $100 Campaign_ID $15 Network $15
Impressions Ave_pageviews Ave_timeonsite_sec Ave_frequency Clicks Conversions Channel
Bounce_rate Interaction_rate CTR Conversion_rate Daily_budget Cost CPC CPA Conversion_Value ROAS 8;
format *<-- recreate formats;
Campaign_name $CHAR100. Campaign_ID $CHAR15. Network $CHAR15.
Date ddmmyy10.
Impressions Ave_pageviews Ave_timeonsite_sec Ave_frequency Clicks Conversions 10.
Channel 1.
Bounce_rate Interaction_rate CTR Conversion_rate 4.2
Daily_budget Cost CPC CPA Conversion_Value ROAS DOLLAR10.2;
set febdate (rename=( Campaign_name=v1 Campaign_ID=v2 Date=v3 Network=v4 Channel=v5 Impressions=v6
Bounce_rate=v7 Ave_pageviews=v8 Ave_timeonsite_sec=v9 Ave_frequency=v10
Interaction_rate=v11 Daily_budget=v12 Clicks=v13 CTR=v14 Cost=v15 CPC=v16
Conversions=v17 Conversion_rate=v18 CPA=v19 Conversion_Value=v20 ROAS=v21));
Campaign_name = put(v1,char100.); *<-- redefine variable;
Campaign_ID = put(v2,char15.); *<-- redefine variable;
Date = v3;
Network = put(v4,char15.); *<-- redefine variable;
Channel = v5;
Impressions = v6;
Bounce_rate = v7;
Ave_pageviews = v8;
Ave_timeonsite_sec = v9;
Ave_frequency = v10;
Interaction_rate = v11;
Daily_budget = v12;
Clicks = v13;
CTR = v14;
Cost = v15;
CPC = v16;
Conversions = v17;
Conversion_rate = v18;
CPA = put(v19,8.); *<-- redefine variable;
Conversion_Value = v20;
ROAS = v21;
run;
/*Verifying sasdata.feb19*/
proc contents data=sasdata.feb19;
run;
/*MERGE DATASETS*/
data sasdata.all_paid;
set sasdata.jan19 sasdata.feb19;
run;
/*DATA NOT TRUNCATED!!!! YAAAASSSSSSS :)*/
*****NOTE: Obviously, the reassignment of the variables types & lengths part could be turned into a macro ultimately, or do-loop, but since I do not have time for this project, I will not cover this in this post. Enjoy!*****
One of the problems with using a spreadsheet for data entry. There isn't an easy way to tell SAS to ignore what your Excel sheet has. But you should be able to adjust to it pretty easily if the variable names (or as Excel sees them column headers) are the same in each sheet.
In fact you should be able to use the XLSX libname and so that you don't need the macro.
When combining the datasets just define the length of the variables BEFORE the SET statement to avoid truncation. Make sure to remove the formats that SAS will mistakenly attach to the character variables. If you have variable that is 40 characters long but use the $30. format with it then you will only see the first 30 characters of the values.
libname paid xlsx 'E:\DI_Project\Data\RawData\Paidsearch2019.xlsx' access=readonly;
data all ;
length Campaign_Name $40 ;
set paid.Jan19 paid.Feb19 paid.Mar19 paid.Apr19 paid.May19 paid.Jun19
paid.Jul19 paid.Aug19 paid.Sep19 paid.Oct19 paid.Nov19 paid.dec19
;
format _character_ ;
run;
That is a surprising result.
And if you read them separately and then combine them does the same thing happen?
data jan19; set paid.jan19; run;
data feb19; set paid.feb19; run;
data both; set jan19 feb19; run;
Yes, initially when trying your code, I received the following log Note:
WARNING: Multiple lengths were specified for the variable Campaign_name by input data set(s). This can cause truncation of data.
NOTE: There were 1981 observations read from the data set WORK.JAN19.
NOTE: There were 3859 observations read from the data set WORK.FEB19.
NOTE: The data set WORK.BOTH has 5840 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
when I placed a length statement after each data statement, that was not the case.
NOTE: There were 1981 observations read from the data set WORK.JAN19.
NOTE: There were 3859 observations read from the data set WORK.FEB19.
NOTE: The data set WORK.BOTH has 5840 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Perhaps I should be using a macro like the one I had in my original post or at the least a do loop with a macro? That would technically reiterate the process, initializing the length statement each time, no?
If you know the maximum length for each character variable then defining it before the SET statement will make sure no values are truncated. Just because you get the warning does not mean anything is actually truncated. But it might be that other variables are being truncated.
Run PROC CONTENTS and check.
This SQL code will print any variable that has more than one TYPE or more than one LENGTH.
proc contents data=paid._all_ noprint out=contents; run;
proc sql ;
select upcase(name) as uname,name,memname,type,length
from contents
group by calculated uname
having count(distinct type)>1
or count(distinct length)>1
order by 1,2,3
;
quit;
This is the result:
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.