Re: looking to create a new var based on the values of a second datase...

veblen · Posted 05-16-2011 06:22 PM

I appear to having a mental block this afternoon...

What's the simplest way to create a var based on:

where dataset1 varA = dataset2 varA

then dataset1A newvar = dataset2 varB

Basically, I'm looking to add a new var in dataset1 that contains the value from dataset2 varB where dataset1 varA = dataset2 varA.

Thanks!!!

Reeza · Posted 05-16-2011 07:12 PM

Probably using Proc SQL

proc sql;
create table a as
select a.*, b.varb
from a
left join b
on a.vara=b.vara;
quit;

Problem is having multiples, then you need to figure out how to deal with those, depending on your data logic.

If you create an empty var in an earlier step you can update instead of creating a new table.

veblen · Posted 05-16-2011 07:25 PM

Thanks Reeza!!!

Is it better to add an empty var to dataset A and just update the data as you mention?

And if so, how would the code look?

Reeza · Posted 05-16-2011 07:33 PM

If the dataset is large and creating a second copy of the dataset would be cumbersome then this is a more efficient way. However if you have duplicate results in your join you can run into issues.
[pre]
data have1;
set sashelp.class;
*need to set space for it;
format empty_var $25.;
empty_var='';
run;

proc sql;
update have1 as a
set empty_var=(select b.name from sashelp.class as b where a.name=b.name);
quit;

[/pre]

veblen · Posted 05-17-2011 12:33 PM

Thanks Reeza!!!

I will let you know if I have any questions.

veblen · Posted 05-17-2011 12:51 PM

In my world I currently have two datasets. Can you please match these datasets and vars to your code. I am missing something when I get to the select line.

dataset1: contains vara (the dataset to add empty_var?)
dataset2: contains varb and the value to set empty_var to)

I see that have1 is the new dataset with the added var (empty_var). but whicj dataset is sashelp.class?

Thanks!!!

stateworker · Posted 05-17-2011 03:31 PM

data set3;
merge dataset1 (in=a) dataset2 (in=b keep=var1 var2);
by var1;
if a;
run;

This will make a new data set (never overwrite your original data set when testing out code!) by merging your first and second data sets by your common variable. If dataset1 has var1's that aren't in dataset2, then set3.var2 will be empty. But where dataset1's var1 equals dataset2's var1, then set3.var2 will be the same as dataset2.var2.

However, if you need dataset2's var1's that aren't also in dataset1, then this won't work because it will only bring in those where the var1 is in common.

veblen · Posted 05-17-2011 05:41 PM

Thanks Stateworker!!!

but in your example, do the two datasets have to have an equal number of observations?

If so, it's not going to work since the two datasets are of different size.

Basically, I'm looking to add a new var in dataset1 that contains the value from dataset2 varB where dataset1 varA = dataset2 varA.

note: dataset1 and dataset2 do not contain an equal number of observations.

Ksharp · Posted 05-17-2011 09:45 PM

That is not issue.SAS will process two dataset with not matched observations.
But If your dataset is large ,I recommend you to use HashTable.

Ksharp

veblen · Posted 05-17-2011 10:10 PM

Thanks Ksharp!!!

What size of dataset do you consider large?

(currently, I am not sure of what my maximum number of observations will be, but will do some analysis.)

stateworker: Based on Ksharp's input, I will give your code a try. Thanks again!!! Message was edited by: veblen

veblen · Posted 05-17-2011 10:27 PM

the answer to this question should resolve the issue for me:

what is the proper syntax for defining the two datasets within the proc sql and data steps posted?

is it library.name?

for example: mydata.datasetA or sasdata.datasetB

or is it something else?

Once I understand the calling and indexing into datasets, I will be able to do what I need to do with the language.

Ksharp · Posted 05-18-2011 03:20 AM

Yes.You are right.
[pre]
Data data1;
input Date : yymmdd10. Exdate : yymmdd10. CP_FLAG $ Volum Strike_Price Settle_Price ;
format date yymmdd10. exdate :yymmdd10. ;
datalines;
2008/1/2 2008/1/16 C 200 27.0 23.15
2008/1/2 2008/1/16 P 300 27.0 23.15
2008/1/2 2008/1/16 C 552 22.5 23.15
2008/1/2 2008/1/16 P 561 22.5 23.15
2008/1/3 2008/1/16 C 756 25.0 23.42
2008/1/3 2008/1/16 P 654 25.0 23.42
2008/1/3 2008/1/16 C 783 22.5 23.42
2008/1/3 2008/1/16 P 150 22.5 23.42
2008/1/4 2008/1/16 C 276 22.5 27.00
2008/1/4 2008/1/16 P 246 22.5 27.00
2008/1/4 2008/1/16 C 276 25.0 27.00
2008/1/4 2008/1/16 P 221 25.0 27.00
2008/1/4 2008/1/16 C 344 27.5 27.00
2008/1/4 2008/1/16 P 476 27.5 27.00
2008/1/5 2008/1/16 C 222 22.5 24.00
2008/1/5 2008/1/16 P 343 22.5 24.00
2008/1/5 2008/1/16 C 768 25.0 24.00
2008/1/5 2008/1/16 P 343 25.0 24.00
2008/1/5 2008/1/16 C 213 27.5 24.00
2008/1/5 2008/1/16 P 343 27.5 24.00
;
run;
proc sql;
create table want as
select *
from data1
group by date
having abs(Strike_Price-Settle_Price) =
min(abs(Strike_Price-Settle_Price))
;
quit;
proc sort data=want nodupkey out=op1(keep=date strike_price);
by date strike_price;
run;
data op1;
set op1;
_strike_price=lag(strike_price);
drop strike_price;
run;
data temp(where=(strike_price=_strike_price));
merge work.data1 work.op1;
by date;
;[/pre]

the bold part is to stand for library name of table data1 and op1.
Namely,I merge the tables data1 and op1 which are all in WORK library.

Ksharp

looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Re: looking to create a new var based on the values of a second dataset

Registration is open

SAS Training: Just a Click Away