Hi all
I have multiple datasets named after stock tickers: AAPL AMZN MSFT etc. I have about 1000 and I want to merge them all into one data file. Each dataset is set up with the same columns: date&time date hour minute holding_"ticker" (e.g. holding_aapl). I want my new dataset to have the columns: date&time date hour minute holding_aapl holding_amzn holding_msft etc. This is the code I have so far:
proc sql ;
create table WORK.mytables as
select*
from dictionary.tables
where libname = 'WORK'
order by memname ;
quit ;
%macro mergedata;
proc sql ;
select count(memname) into: obs from work.mytables;
%LET OBS=&OBS.;
%let obs=&obs.;
SELECT CATX("test", memname)
INTO :SETNAME1-:SETNAME&OBS.
FROM work.MYTABLES;
QUIT;
%DO i=1 %to &obs.;
data merged;
merge work.aapl (in=ina)
work.&&setname&i (in=inb);
by date hour dummy;
if ina;
run;
proc append data=merged base=work.merged;
run;
%end;
%mend;
%mergedata;
But the new merge keeps replacing the old merge so I am left with a dataset that has date&time date hour minute holding_aapl holding_msft or whichever the last chronological stock is. I tried it without the proc append line and still overwriting it. Any help would be much appreiated!
I don't think you want a data step merge, that is side by side and replaces values of like named variables.
It sounds like you want an Append
Simplest sounds like:
data want; set aapl amzn msft <continue the list>; keep <the list of common variables goes here>; run;
If this doesn't work then provide some details why not.
I do want it to be side by side. I might be using the proc append function incorrectly. In the final dataset I want to have all the tickers as individual columns:
DATE&TIME DATE HOUR MINUTE HOLDING_AAPL HOLDING AMZN ------- HOLDING_LUV --------HOLDING_MSFT etc.
Why do you want it side by side? Is this because each has an identical sequences of data & time? Are you trying to determine some sort of portfolio value over the time series?
If it is really a matter of synchronized time series, you would almost certainly be better off interleaving the series, i.e. N(stocks) records for time point 1 followed by N(stocks) records for time point 2, etc.
More context please.
You originally said the files all had the same structure. How are you going to create NEW variables?
Your description of the variable names is also very confusing.
date&time date hour minute holding_"ticker"
Do you really have a variable name with an & in it?
Why do you have DATE twice? Why do you have TIME and HOUR and MINUTE? Is the first variable really the just the same information as the next three?
Does that last variable name really have quotes in it? Perhaps you mean to say that the last variable has a different name in each of the source datasets?
Are the original datasets sorted by the DATETIME?
Why is your attempt to merge only using DATE HOUR? Are you trying to collapse multiple minute level observations into a single observation for the hour?
Please post a few observations of at least two of the datasets and show what output dataset you want for the posted example input.
As discussed in your other thread, creating a long instead of wide data set is the way to go. Don't make the mistake of insisting on a wide data set, this makes your programming more difficult. A long data set, as shown in your other thread, is much easier to work with.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.