Good day
i have a data set that looks like below :
data have;
input UNIQUE $ DATE $ NEW_VAR ;
NEW_VAR = SUBSTR(DATE,3,7);
1 12JAN13 JAN
2 01MAY15 MAY
3 17APR14 APR
I want to group & order by unique & new_var but I get an error that the new_var does not exist , I want this to be done in PROC SQL and not a data step as it has over 24million records and space is an issue.
Not sure if this is what you are after
data have;
input UNIQUE DATE$;
datalines;
1 12JAN13
1 15JAN13
1 14JAN13
1 19JAN13
1 30JAN13
2 01MAY15
3 17APR14
;
proc sql;
create table want as
select *
,input(date, date9.) as NewVar format=monname3.
from have;
quit;
Please elaborate. How do you want your grouping done? The data is unique in both variables, so there is not much grouping to be done.
Post a clear description of how your data looks and what your desired outcome looks like.
UNIQUE DATE NEW_VAR
1 12JAN13 JAN
1 15JAN13 JAN
1 14JAN13 JAN
1 19JAN13 JAN
1 30JAN13 JAN
2 01MAY15 MAY
3 17APR14 APR
so one Unique number could be in multiple dates so I want to group that unique number by unique month , if that makes sense.
Not sure if this is what you are after
data have;
input UNIQUE DATE$;
datalines;
1 12JAN13
1 15JAN13
1 14JAN13
1 19JAN13
1 30JAN13
2 01MAY15
3 17APR14
;
proc sql;
create table want as
select *
,input(date, date9.) as NewVar format=monname3.
from have;
quit;
hi
yes, this is what I am after with grouping - when I try group by unique , newvar it does not initialize ?
Hi,
Right, a few issues here.
First off, DONT WRITE OR CODE ALL IN UPPERCASE!
Next off your "test" datastep does not work. You input new_var from what I assume to be your datalines (although datalines is missing), as a numeric, which JAN is not for instance. So the variable gets created as numeric with no data.
Next up is the substr, which is being put into a numeric variable - that will also fail.
So to correct your test data step to start with looks like this:
data have; input unique $ date $; new_var=substr(date,3,7); datalines; 1 12JAN13 JAN 2 01MAY15 MAY 3 17APR14 APR ; run;
Now onto your question. You say you want this to be done in SQL, there is only one technical reason why you would have to use proc sql, and that is if you are passing the SQL code through to a database. Otherwise there is no requirement to use SQL, and in some cases the resource usage will be higher than if you use datastep. So your statement that space is an issue actually works against the use SQL.
Now you say you get an error in the code you are running, please present the code you are running and sufficient test data to be able to check, as running the one I present above will have a new_var variable.
Finally, if your working with 24 million records, then its likely you need to approach the problem differently, use some big data techniques, filter the data down as much as you can, add flags and groups rather than creating multiple datasets, sort only when absolutely necessary. Just switching to using proc sql will not solve these things for you.
If your data was actually read with the code you provide you should have a bunch of lines in the log that look something like this:
NOTE: Invalid data for NEW_VAR in line 5 14-16. NOTE: Invalid third argument to function SUBSTR at line 3 column 12. NOTE: Invalid numeric data, 'JAN13' , at line 3 column 12. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-- 5 1 12JAN13 JAN UNIQUE=1 DATE=12JAN13 NEW_VAR=. _ERROR_=1 _N_=1
Since you read new_var on the input statement without a $ it is assumed to be numeric. Then values such as Jan May Apr are invalid numeric values. Also the assignment
NEW_VAR = SUBSTR(DATE,3,7)
subsequently fails because New_var is numeric.
So new_var is always missing.
Show the actual Proc SQL that your are attempting to see if you are doing something similar in your SQL.
You may need to use
group by calculated new_var
to tell SAS that the variable used for grouping is calculated on the Select clause.
However if you use actual SAS Date values instead of character many things can be done with date functions or formats that may not require adding an additional variable at all, depending on why you need the month abbreviation.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.