Hello,
I am trying to cut down the length of my variables to save disc space and the usual length statement I use does not seem to be working.
Here is my code:
data new_union; length ss_other_specify $255. Manufacturer1 $50. Manufacturer2 $50. Vaccination_Code_Id1 $50. Vaccination_Code_Id2 $50.; set union_combine; /*Vax code format*/ if Vaccination_Code_Id1 in ("No vaccine administered", "Other", "Unknown vaccine or immune globulin") then Vaccination_Code_Id1 = ""; if Vaccination_Code_Id2 in ("No vaccine administered", "Other", "Unknown vaccine or immune globulin") then Vaccination_Code_Id2 = ""; /*Vax date format*/ Vaccination_Date1_d= mdy(substr(Vaccination_Date1,6,2), substr(Vaccination_Date1,9,2), substr(Vaccination_Date1,1,4)); Vaccination_Date2_d= mdy(substr(Vaccination_Date2,6,2), substr(Vaccination_Date2,9,2), substr(Vaccination_Date2,1,4)); drop Vaccination_Date1; drop Vaccination_Date2; format Vaccination_Date1_d mmddyy10. Vaccination_Date2_d mmddyy10.; run; data new_union; set new_union; rename Vaccination_Date1_d = Vaccination_Date1; rename Vaccination_Date2_d = Vaccination_Date2; format Vaccination_Date1 mmddyy10. Vaccination_Date2 mmddyy10.; /*Manufacturer reformat*/ Manufacturer1 = upcase(Manufacturer1); Manufacturer2 = upcase(Manufacturer2); if Manufacturer1 = "JOHNSON AND JOHNSON" then Manufacturer1 = "JANSSEN (JOHNSON AND JOHNSON)"; if Manufacturer1 = "PFIZER" then Manufacturer1 = "PFIZER/BIONTECH"; if Manufacturer2 = "JOHNSON AND JOHNSON" then Manufacturer2 = "JANSSEN (JOHNSON AND JOHNSON)"; if Manufacturer2 = "PFIZER" then Manufacturer2 = "PFIZER/BIONTECH"; /* if case_id in ("100734763","100748803") then Manufacturer2 = "" and Vaccination_Date2 = . and Vaccination_Code_Id2 = ""; */ /* if Manufacturer2 = "0" then Manufacturer2 = ""; */ run;
After I run proc contents on the "new_union" dataset, the variable length is still $350, not $50 like I would like. I am running 2 separate data steps because I've had trouble with keeping them all in one when I use drop and rename.
Thanks for any help!
When you create a new dataset from existing dataset, then you should create new variable names. Length doesn't change on existing variables.
In your case, Vaccination_Code_Id1 or 2 both are in union_combine dataset. So try creating somewhat similar but different names in length statement as well as using them in code.
When you create a new dataset from existing dataset, then you should create new variable names. Length doesn't change on existing variables.
In your case, Vaccination_Code_Id1 or 2 both are in union_combine dataset. So try creating somewhat similar but different names in length statement as well as using them in code.
Your program should have been effective at cutting down the length. Are you sure you ran the PROC CONTENTS on the new data set, not the old?
I made some fake data like this:
data union_combine;
length id 8 ss_other_specify $255. Vaccination_Date1 $10
Manufacturer1 $350. Manufacturer2 $350. Vaccination_Code_Id1 $350. Vaccination_Code_Id2 $350.;
retain Manufacturer1 "Big Pharma 1" Manufacturer2 "Big Pharma 2";
do id=1 to 4;
Vaccination_Code_Id1 =choosec(ID,"No vaccine administered", "Other", "Unknown vaccine or immune globulin"," ");
Vaccination_Code_Id2 =choosec(abs(5-ID),"No vaccine administered", "Other", "Unknown vaccine or immune globulin"," ");
Vaccination_Date1='----------';
Vaccination_Date2='----------';
substr(Vaccination_Date1,6,2)='07';
substr(Vaccination_Date1,9,2)=put(ID,z2.);
substr(Vaccination_Date1,1,4)='2021';
substr(Vaccination_Date2,6,2)='07';
substr(Vaccination_Date2,9,2)=put(ID+14,z2.);
substr(Vaccination_Date2,1,4)='2021';
Manufacturer1 = choosec(ID,"JOHNSON AND JOHNSON","PFIZER","JOHNSON AND JOHNSON","PFIZER");
Manufacturer2 = choosec(ID,"PFIZER","JOHNSON AND JOHNSON","PFIZER","JOHNSON AND JOHNSON");
output;
end;
run;
Then I ran this single data step that did all the conversions you were doing:
data new_union;
length ss_other_specify $255. Manufacturer1 $50. Manufacturer2 $50. Vaccination_Code_Id1 $50. Vaccination_Code_Id2 $50.;
set union_combine (rename=(Vaccination_Date1 = _dt1 Vaccination_Date2 = _dt2));
/*Vax code format*/
if Vaccination_Code_Id1 in ("No vaccine administered", "Other", "Unknown vaccine or immune globulin")
then Vaccination_Code_Id1 = "";
if Vaccination_Code_Id2 in ("No vaccine administered", "Other", "Unknown vaccine or immune globulin")
then Vaccination_Code_Id2 = "";
/*Vax date format*/
Vaccination_Date1= mdy(substr(_dt1,6,2), substr(_dt1,9,2), substr(_dt1,1,4));
Vaccination_Date2= mdy(substr(_dt2,6,2), substr(_dt2,9,2), substr(_dt2,1,4));
Manufacturer1 = upcase(Manufacturer1);
Manufacturer2 = upcase(Manufacturer2);
if Manufacturer1 = "JOHNSON AND JOHNSON" then Manufacturer1 = "JANSSEN (JOHNSON AND JOHNSON)";
if Manufacturer1 = "PFIZER" then Manufacturer1 = "PFIZER/BIONTECH";
if Manufacturer2 = "JOHNSON AND JOHNSON" then Manufacturer2 = "JANSSEN (JOHNSON AND JOHNSON)";
if Manufacturer2 = "PFIZER" then Manufacturer2 = "PFIZER/BIONTECH";
drop _:;
format Vaccination_Date1 mmddyy10. Vaccination_Date2 mmddyy10.;
run;
All the variable sizes I changed to $50 worked just fine:
proc contents data=new_union;
run;
Alphabetic List of Variables and Attributes | ||||
---|---|---|---|---|
# | Variable | Type | Len | Format |
2 | Manufacturer1 | Char | 50 | |
3 | Manufacturer2 | Char | 50 | |
4 | Vaccination_Code_Id1 | Char | 50 | |
5 | Vaccination_Code_Id2 | Char | 50 | |
7 | Vaccination_Date1 | Num | 8 | MMDDYY10. |
8 | Vaccination_Date2 | Num | 8 | MMDDYY10. |
6 | id | Num | 8 | |
1 | ss_other_specify | Char | 255 |
Can you share your results?
For which specific variable? You have 4 variables where you might be setting the length to 50.
You can avoid all of the character to numeric conversion messages you get from using this code:
/*Vax date format*/ Vaccination_Date1_d= mdy(substr(Vaccination_Date1,6,2), substr(Vaccination_Date1,9,2), substr(Vaccination_Date1,1,4)); Vaccination_Date2_d= mdy(substr(Vaccination_Date2,6,2), substr(Vaccination_Date2,9,2), substr(Vaccination_Date2,1,4));
by using the proper INFORMAT on the entire string.
Data example; x='21020715'; y=input(x,yymmdd8.); format y mmddyy10.; run;
Or if your Vaccination_date variables are simple numbers convert to string and use informat
Data example2; x=21020715; y=input(put(x,f8.),yymmdd8.); format y mmddyy10.; run;
data have; set sashelp.class; run; proc sql; alter table have modify sex char(20),name char(10); quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.