Hi
I'm a SAS programming beginner!
I've a got a dataset with a list of ID numbers and dates. I want to
open a different dataset corresponding to the month and year of each ID
opening date + one year on and get some data for each ID.
For example (dataset1)
ID open_date
123 01may2004
124 16aug2010
244 20jan2009
In the end I I want to create a dataset (from a different data set with further additional data) for each of
the ID that has a corresponding open_date + 1 year.
ID open_date current_date additional_data (from dataset2)
123 01may2004 01may2005 1
124 16aug2010 16aug2011 0
244 20jan2009 20jan2010 0
I guess this is similar to a lookup table in excel, but a little stuck in SAS - Hope you can help.
Are they both already SAS datasets? Does the date have to be precisely 1 year apart? What happens if the initial date is February 29th of a given year?
If they are both SAS datasets, and dependent upon the answers for the above questions, then you can merge the two files in either a datastep or proc sql, and control the other factors.
Whether a datastep will be sufficient will be a function of what kind of match there is between the two files, i.e. one to one, one to many, or many to many.
Yes, they are SAS datasets.
I want to chose the second dataset from lots of possible datasets depending on the ID open date e.g. if opendate is 1aug2010 then open dataset aug2010. The datasets are monthly datasets that I need the data from so single day precision is not required, just month.
You didn't answer regarding the type of matching that would exist. If it is only one to one, and since you are new to SAS, depending upon the number of records involved (e.g., if it is less than 10 million or so) I, personally, would combine all of the monthly files into one dataset (which you can probably do in a datastep using the filename statement .. possibly with a wild card), sort the resulting and main datasets by ID and month, and then merge then together in a datastep.
If you do it with a datastep, you will have to first insure that the two sets of files (let's call them have and wanttomerge) only have one variable that has the same name, namely id. If they do share any other variable names, you can always use the datastep rename option to change the name(s) in the wanttomerge dataset.
Without knowing more about the files one can't write the code for you, but it would probably look something like:
data want;
merge have (in=ina) wanttomerge (in=inb);
by id;
if ina and month(date) eq month(datefromwanttomerge) and
year(date)+1 eq year(datefromwanttomerge);
run;
Code like that would select all of the records from have, and combine them with records from wanttomerge, where the records exist in wanttomerge that share the same month but are one year later than those in have.
HTH,
Art
You never responded to my last post, thus the following may be totally irrelevant. However, if not, I think that you are looking for something like:
libname monthly "c:\art\monthly";
data have;
input ID open_date date9.;
cards;
123 01may2004
124 16aug2010
244 20jan2009
;
data monthly.may2005;
input ID (open_date current_date) (date9.) x y z;
cards;
123 01may200407may2005 1 2 3
;
data monthly.aug2011;
input ID (open_date current_date) (date9.) x y z;
cards;
124 16aug201012aug2011 4 5 6
;
data monthly.jan2010;
input ID (open_date current_date) (date9.) x y z;
cards;
244 20jan200918jan2010 7 8 9
;
proc sql noprint;
select "monthly."||memname
into :monthly_files
separated by " "
from dictionary.tables
where libname="MONTHLY"
;
quit;
data all_monthly (drop=open_date);
set &monthly_files. INDSNAME=dsn;
fdate=input(scan(dsn,2,"."),monyy7.);
run;
data have;
set have;
fdate=input(put(intnx('year',open_date,1,'s'),monyy7.),monyy7.);
run;
proc sort data=have;
by id fdate;
run;
proc sort data=all_monthly;
by id fdate;
run;
data want;
merge have (in=ina) all_monthly (in=inb);
by id fdate;
if ina;
run;
Sorry for the late reply I finally found what the problem was, 'bs' is telling sas to read from right to left and I wanted 2nd column from left to right, found it by try and error. Below is the code for reading the correct column (2nd column from the left) But I have no idea what to use if I wanted to read from left to right that's why I've used -32 , I just counted the column.
Thanks
filename cc "C:\CDM1\5K\5K_5000obs\5K_5000obs_RIT\IN\dat1.dat";
Data class2_1;
infile cc lrecl=10000 filename=filename;
input;
file2_2=input(scan(_infile_,-32,,'bs'),best12.);
run;
R.A.G,
You posted your message in response to the wrong thread, but I'll try to answer it anyway since I probably confused the earlier message.
As I recall you wanted to read the second column. You really only needed to use:
file2_2=input(scan(_infile_,2,,'s'),best12.);
Art
thanks
RG
Hi Art,
Thanks for your replies. I have about 10 years worth of data split up
into monthly SAS data files. The files are labelled yyyymm and have
about 1million IDs (rows) and about 125 variables (col) in each
dataset.
I would like to find out information on 80000 IDs which have an open
date variable and is held in a SAS dataset. I want to pull the same
variables for each ID from the monthly tables corresponding to the
table which holds data 1 year on from the open date. The IDs have
different open dates over the 10year period.
The matching would be 1 to 1 on the IDs. I want a snapshot of how the
ID is doing after 1 year.
If I join all the monthly files together, won't this be quite
inefficient? 10x12x1million or 10x12x80000 if I only keep those IDs
I'm interested in.
Many thanks,
Bo
Define efficient! Yes, it may be using a sledge hammer to drive in a nail, but a very accurate sledge hammer. But, you are talking about less than 10 million records, which will only take up seconds of cpu time.
The alternative is to develop complex macros that can easily take hours/days to write and add in numerous instances where you could error by writing something that looks like it will work but does something totally unexpected. Since I recall you saying that you are a SAS newbee, I'd strongly recommend NOT going there too early in your learning curve.
Regarding processing, if you are not going to use all 125 variables, dropping as many as possible, as early in the process as you can, will drastically improve efficiency.
It looks like a query problem. En... Hash Table or proc format will be a good choice.
Finding in a dataset once, then use Macro to run across every dataset.
Ksharp
Thanks all. I finally managed to do this with a looped proc SQL and where statement.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.