PROQ SQL NEW VAR & GROUPING

Accepted Solution Solved
Reply
Contributor
Posts: 42
Accepted Solution

PROQ SQL NEW VAR & GROUPING

Good day

 

i have a data set that looks like below :

data have;
input UNIQUE $ DATE $ NEW_VAR ;
NEW_VAR = SUBSTR(DATE,3,7);

1 12JAN13    JAN
2 01MAY15    MAY
3 17APR14    APR

I want to group & order by unique & new_var  but I get an error that the new_var does not exist , I want this to be done in  PROC SQL and not a data step as it has over 24million records and space is an issue.


Accepted Solutions
Solution
‎11-21-2017 01:47 AM
PROC Star
Posts: 1,209

Re: PROQ SQL NEW VAR & GROUPING

Posted in reply to Allistair

Not sure if this is what you are after

 

data have;
input UNIQUE DATE$;
datalines;
1 12JAN13
1 15JAN13
1 14JAN13
1 19JAN13
1 30JAN13
2 01MAY15
3 17APR14
;

proc sql;
	create table want as
	select *
	      ,input(date, date9.) as NewVar format=monname3.
	from have;
quit;

View solution in original post


All Replies
PROC Star
Posts: 1,209

Re: PROQ SQL NEW VAR & GROUPING

Posted in reply to Allistair

Please elaborate. How do you want your grouping done? The data is unique in both variables, so there is not much grouping to be done.

 

Post a clear description of how your data looks and what your desired outcome looks like.

Contributor
Posts: 42

Re: PROQ SQL NEW VAR & GROUPING

UNIQUE DATE NEW_VAR

 

1 12JAN13    JAN

1 15JAN13    JAN

1 14JAN13    JAN

1 19JAN13    JAN

1 30JAN13    JAN
2 01MAY15    MAY
3 17APR14    APR

 

so one Unique  number could be in multiple dates so I want to group that unique number by unique month , if that makes sense.

Solution
‎11-21-2017 01:47 AM
PROC Star
Posts: 1,209

Re: PROQ SQL NEW VAR & GROUPING

Posted in reply to Allistair

Not sure if this is what you are after

 

data have;
input UNIQUE DATE$;
datalines;
1 12JAN13
1 15JAN13
1 14JAN13
1 19JAN13
1 30JAN13
2 01MAY15
3 17APR14
;

proc sql;
	create table want as
	select *
	      ,input(date, date9.) as NewVar format=monname3.
	from have;
quit;
Contributor
Posts: 42

Re: PROQ SQL NEW VAR & GROUPING

hi 

 

yes, this is what I am after with grouping - when I try group by unique , newvar it does not initialize ? 

Super User
Super User
Posts: 9,397

Re: PROQ SQL NEW VAR & GROUPING

Posted in reply to Allistair

Hi,

 

Right, a few issues here.

First off, DONT WRITE OR CODE ALL IN UPPERCASE!

Next off your "test" datastep does not work.  You input new_var from what I assume to be your datalines (although datalines is missing), as a numeric, which JAN is not for instance.  So the variable gets created as numeric with no data.

Next up is the substr, which is being put into a numeric variable - that will also fail.

So to correct your test data step to start with looks like this:

data have;
  input unique $ date $;
  new_var=substr(date,3,7);
datalines;
1 12JAN13 JAN
2 01MAY15 MAY
3 17APR14 APR
;
run;

Now onto your question.  You say you want this to be done in SQL, there is only one technical reason why you would have to use proc sql, and that is if you are passing the SQL code through to a database.  Otherwise there is no requirement to use SQL, and in some cases the resource usage will be higher than if you use datastep.  So your statement that space is an issue actually works against the use SQL.

 

Now you say you get an error in the code you are running, please present the code you are running and sufficient test data to be able to check, as running the one I present above will have a new_var variable.

 

Finally, if your working with 24 million records, then its likely you need to approach the problem differently, use some big data techniques, filter the data down as much as you can, add flags and groups rather than creating multiple datasets, sort only when absolutely necessary.  Just switching to using proc sql will not solve these things for you.

Super User
Posts: 13,293

Re: PROQ SQL NEW VAR & GROUPING

Posted in reply to Allistair

If your data was actually read with the code you provide you should have a bunch of lines in the log that look something like this:

NOTE: Invalid data for NEW_VAR in line 5 14-16.
NOTE: Invalid third argument to function SUBSTR at line 3 column 12.
NOTE: Invalid numeric data, 'JAN13' , at line 3 column 12.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
5          1 12JAN13    JAN
UNIQUE=1 DATE=12JAN13 NEW_VAR=. _ERROR_=1 _N_=1

Since you read new_var on the input statement without a $ it is assumed to be numeric. Then values such as Jan May Apr are invalid numeric values. Also the assignment

NEW_VAR = SUBSTR(DATE,3,7)

subsequently fails because New_var is numeric.

So new_var is always missing.

 

Show the actual Proc SQL that your are attempting to see if you are doing something similar in your SQL.

 

You may need to use

group by calculated new_var

to tell SAS that the variable used for grouping is calculated on the Select clause.

 

However if you use actual SAS Date values instead of character many things can be done with date functions or formats that may not require adding an additional variable at all, depending on why you need the month abbreviation.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 109 views
  • 1 like
  • 4 in conversation