- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I am trying to cut down the length of my variables to save disc space and the usual length statement I use does not seem to be working.
Here is my code:
data new_union; length ss_other_specify $255. Manufacturer1 $50. Manufacturer2 $50. Vaccination_Code_Id1 $50. Vaccination_Code_Id2 $50.; set union_combine; /*Vax code format*/ if Vaccination_Code_Id1 in ("No vaccine administered", "Other", "Unknown vaccine or immune globulin") then Vaccination_Code_Id1 = ""; if Vaccination_Code_Id2 in ("No vaccine administered", "Other", "Unknown vaccine or immune globulin") then Vaccination_Code_Id2 = ""; /*Vax date format*/ Vaccination_Date1_d= mdy(substr(Vaccination_Date1,6,2), substr(Vaccination_Date1,9,2), substr(Vaccination_Date1,1,4)); Vaccination_Date2_d= mdy(substr(Vaccination_Date2,6,2), substr(Vaccination_Date2,9,2), substr(Vaccination_Date2,1,4)); drop Vaccination_Date1; drop Vaccination_Date2; format Vaccination_Date1_d mmddyy10. Vaccination_Date2_d mmddyy10.; run; data new_union; set new_union; rename Vaccination_Date1_d = Vaccination_Date1; rename Vaccination_Date2_d = Vaccination_Date2; format Vaccination_Date1 mmddyy10. Vaccination_Date2 mmddyy10.; /*Manufacturer reformat*/ Manufacturer1 = upcase(Manufacturer1); Manufacturer2 = upcase(Manufacturer2); if Manufacturer1 = "JOHNSON AND JOHNSON" then Manufacturer1 = "JANSSEN (JOHNSON AND JOHNSON)"; if Manufacturer1 = "PFIZER" then Manufacturer1 = "PFIZER/BIONTECH"; if Manufacturer2 = "JOHNSON AND JOHNSON" then Manufacturer2 = "JANSSEN (JOHNSON AND JOHNSON)"; if Manufacturer2 = "PFIZER" then Manufacturer2 = "PFIZER/BIONTECH"; /* if case_id in ("100734763","100748803") then Manufacturer2 = "" and Vaccination_Date2 = . and Vaccination_Code_Id2 = ""; */ /* if Manufacturer2 = "0" then Manufacturer2 = ""; */ run;
After I run proc contents on the "new_union" dataset, the variable length is still $350, not $50 like I would like. I am running 2 separate data steps because I've had trouble with keeping them all in one when I use drop and rename.
Thanks for any help!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When you create a new dataset from existing dataset, then you should create new variable names. Length doesn't change on existing variables.
In your case, Vaccination_Code_Id1 or 2 both are in union_combine dataset. So try creating somewhat similar but different names in length statement as well as using them in code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When you create a new dataset from existing dataset, then you should create new variable names. Length doesn't change on existing variables.
In your case, Vaccination_Code_Id1 or 2 both are in union_combine dataset. So try creating somewhat similar but different names in length statement as well as using them in code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your program should have been effective at cutting down the length. Are you sure you ran the PROC CONTENTS on the new data set, not the old?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I made some fake data like this:
data union_combine;
length id 8 ss_other_specify $255. Vaccination_Date1 $10
Manufacturer1 $350. Manufacturer2 $350. Vaccination_Code_Id1 $350. Vaccination_Code_Id2 $350.;
retain Manufacturer1 "Big Pharma 1" Manufacturer2 "Big Pharma 2";
do id=1 to 4;
Vaccination_Code_Id1 =choosec(ID,"No vaccine administered", "Other", "Unknown vaccine or immune globulin"," ");
Vaccination_Code_Id2 =choosec(abs(5-ID),"No vaccine administered", "Other", "Unknown vaccine or immune globulin"," ");
Vaccination_Date1='----------';
Vaccination_Date2='----------';
substr(Vaccination_Date1,6,2)='07';
substr(Vaccination_Date1,9,2)=put(ID,z2.);
substr(Vaccination_Date1,1,4)='2021';
substr(Vaccination_Date2,6,2)='07';
substr(Vaccination_Date2,9,2)=put(ID+14,z2.);
substr(Vaccination_Date2,1,4)='2021';
Manufacturer1 = choosec(ID,"JOHNSON AND JOHNSON","PFIZER","JOHNSON AND JOHNSON","PFIZER");
Manufacturer2 = choosec(ID,"PFIZER","JOHNSON AND JOHNSON","PFIZER","JOHNSON AND JOHNSON");
output;
end;
run;
Then I ran this single data step that did all the conversions you were doing:
data new_union;
length ss_other_specify $255. Manufacturer1 $50. Manufacturer2 $50. Vaccination_Code_Id1 $50. Vaccination_Code_Id2 $50.;
set union_combine (rename=(Vaccination_Date1 = _dt1 Vaccination_Date2 = _dt2));
/*Vax code format*/
if Vaccination_Code_Id1 in ("No vaccine administered", "Other", "Unknown vaccine or immune globulin")
then Vaccination_Code_Id1 = "";
if Vaccination_Code_Id2 in ("No vaccine administered", "Other", "Unknown vaccine or immune globulin")
then Vaccination_Code_Id2 = "";
/*Vax date format*/
Vaccination_Date1= mdy(substr(_dt1,6,2), substr(_dt1,9,2), substr(_dt1,1,4));
Vaccination_Date2= mdy(substr(_dt2,6,2), substr(_dt2,9,2), substr(_dt2,1,4));
Manufacturer1 = upcase(Manufacturer1);
Manufacturer2 = upcase(Manufacturer2);
if Manufacturer1 = "JOHNSON AND JOHNSON" then Manufacturer1 = "JANSSEN (JOHNSON AND JOHNSON)";
if Manufacturer1 = "PFIZER" then Manufacturer1 = "PFIZER/BIONTECH";
if Manufacturer2 = "JOHNSON AND JOHNSON" then Manufacturer2 = "JANSSEN (JOHNSON AND JOHNSON)";
if Manufacturer2 = "PFIZER" then Manufacturer2 = "PFIZER/BIONTECH";
drop _:;
format Vaccination_Date1 mmddyy10. Vaccination_Date2 mmddyy10.;
run;
All the variable sizes I changed to $50 worked just fine:
proc contents data=new_union;
run;
Alphabetic List of Variables and Attributes | ||||
---|---|---|---|---|
# | Variable | Type | Len | Format |
2 | Manufacturer1 | Char | 50 | |
3 | Manufacturer2 | Char | 50 | |
4 | Vaccination_Code_Id1 | Char | 50 | |
5 | Vaccination_Code_Id2 | Char | 50 | |
7 | Vaccination_Date1 | Num | 8 | MMDDYY10. |
8 | Vaccination_Date2 | Num | 8 | MMDDYY10. |
6 | id | Num | 8 | |
1 | ss_other_specify | Char | 255 |
Can you share your results?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For which specific variable? You have 4 variables where you might be setting the length to 50.
You can avoid all of the character to numeric conversion messages you get from using this code:
/*Vax date format*/ Vaccination_Date1_d= mdy(substr(Vaccination_Date1,6,2), substr(Vaccination_Date1,9,2), substr(Vaccination_Date1,1,4)); Vaccination_Date2_d= mdy(substr(Vaccination_Date2,6,2), substr(Vaccination_Date2,9,2), substr(Vaccination_Date2,1,4));
by using the proper INFORMAT on the entire string.
Data example; x='21020715'; y=input(x,yymmdd8.); format y mmddyy10.; run;
Or if your Vaccination_date variables are simple numbers convert to string and use informat
Data example2; x=21020715; y=input(put(x,f8.),yymmdd8.); format y mmddyy10.; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have; set sashelp.class; run; proc sql; alter table have modify sex char(20),name char(10); quit;