BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EMC9
Obsidian | Level 7

Good day

 

i have a data set that looks like below :

data have;
input UNIQUE $ DATE $ NEW_VAR ;
NEW_VAR = SUBSTR(DATE,3,7);

1 12JAN13    JAN
2 01MAY15    MAY
3 17APR14    APR

I want to group & order by unique & new_var  but I get an error that the new_var does not exist , I want this to be done in  PROC SQL and not a data step as it has over 24million records and space is an issue.

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Not sure if this is what you are after

 

data have;
input UNIQUE DATE$;
datalines;
1 12JAN13
1 15JAN13
1 14JAN13
1 19JAN13
1 30JAN13
2 01MAY15
3 17APR14
;

proc sql;
	create table want as
	select *
	      ,input(date, date9.) as NewVar format=monname3.
	from have;
quit;

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

Please elaborate. How do you want your grouping done? The data is unique in both variables, so there is not much grouping to be done.

 

Post a clear description of how your data looks and what your desired outcome looks like.

EMC9
Obsidian | Level 7

UNIQUE DATE NEW_VAR

 

1 12JAN13    JAN

1 15JAN13    JAN

1 14JAN13    JAN

1 19JAN13    JAN

1 30JAN13    JAN
2 01MAY15    MAY
3 17APR14    APR

 

so one Unique  number could be in multiple dates so I want to group that unique number by unique month , if that makes sense.

PeterClemmensen
Tourmaline | Level 20

Not sure if this is what you are after

 

data have;
input UNIQUE DATE$;
datalines;
1 12JAN13
1 15JAN13
1 14JAN13
1 19JAN13
1 30JAN13
2 01MAY15
3 17APR14
;

proc sql;
	create table want as
	select *
	      ,input(date, date9.) as NewVar format=monname3.
	from have;
quit;
EMC9
Obsidian | Level 7

hi 

 

yes, this is what I am after with grouping - when I try group by unique , newvar it does not initialize ? 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Right, a few issues here.

First off, DONT WRITE OR CODE ALL IN UPPERCASE!

Next off your "test" datastep does not work.  You input new_var from what I assume to be your datalines (although datalines is missing), as a numeric, which JAN is not for instance.  So the variable gets created as numeric with no data.

Next up is the substr, which is being put into a numeric variable - that will also fail.

So to correct your test data step to start with looks like this:

data have;
  input unique $ date $;
  new_var=substr(date,3,7);
datalines;
1 12JAN13 JAN
2 01MAY15 MAY
3 17APR14 APR
;
run;

Now onto your question.  You say you want this to be done in SQL, there is only one technical reason why you would have to use proc sql, and that is if you are passing the SQL code through to a database.  Otherwise there is no requirement to use SQL, and in some cases the resource usage will be higher than if you use datastep.  So your statement that space is an issue actually works against the use SQL.

 

Now you say you get an error in the code you are running, please present the code you are running and sufficient test data to be able to check, as running the one I present above will have a new_var variable.

 

Finally, if your working with 24 million records, then its likely you need to approach the problem differently, use some big data techniques, filter the data down as much as you can, add flags and groups rather than creating multiple datasets, sort only when absolutely necessary.  Just switching to using proc sql will not solve these things for you.

ballardw
Super User

If your data was actually read with the code you provide you should have a bunch of lines in the log that look something like this:

NOTE: Invalid data for NEW_VAR in line 5 14-16.
NOTE: Invalid third argument to function SUBSTR at line 3 column 12.
NOTE: Invalid numeric data, 'JAN13' , at line 3 column 12.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
5          1 12JAN13    JAN
UNIQUE=1 DATE=12JAN13 NEW_VAR=. _ERROR_=1 _N_=1

Since you read new_var on the input statement without a $ it is assumed to be numeric. Then values such as Jan May Apr are invalid numeric values. Also the assignment

NEW_VAR = SUBSTR(DATE,3,7)

subsequently fails because New_var is numeric.

So new_var is always missing.

 

Show the actual Proc SQL that your are attempting to see if you are doing something similar in your SQL.

 

You may need to use

group by calculated new_var

to tell SAS that the variable used for grouping is calculated on the Select clause.

 

However if you use actual SAS Date values instead of character many things can be done with date functions or formats that may not require adding an additional variable at all, depending on why you need the month abbreviation.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 589 views
  • 1 like
  • 4 in conversation