BookmarkSubscribeRSS Feed
bolore
Calcite | Level 5

Hi

I'm a SAS programming beginner!

I've a got a dataset with a list of ID numbers and dates. I want to

open a different dataset corresponding to the month and year of each ID

opening date + one year on and get some data for each ID.

For example (dataset1)

ID    open_date

123  01may2004

124  16aug2010

244  20jan2009

In the end I I want to create a dataset (from a different data set with further additional data) for each of

the ID that has a corresponding open_date + 1 year.

ID      open_date      current_date  additional_data (from dataset2)

123    01may2004    01may2005   1

124    16aug2010     16aug2011    0

244    20jan2009      20jan2010     0

I guess this is similar to a lookup table in excel, but a little stuck in SAS - Hope you can help. 

11 REPLIES 11
art297
Opal | Level 21

Are they both already SAS datasets?  Does the date have to be precisely 1 year apart?  What happens if the initial date is February 29th of a given year?

If they are both SAS datasets, and dependent upon the answers for the above questions, then you can merge the two files in either a datastep or proc sql, and control the other factors.

Whether a datastep will be sufficient will be a function of what kind of match there is between the two files, i.e. one to one, one to many, or many to many.

bolore
Calcite | Level 5

Yes, they are SAS datasets.

I want to chose the second dataset from lots of possible datasets depending on the ID open date e.g. if opendate is 1aug2010 then open dataset aug2010.  The datasets are monthly datasets that I need the data from so single day precision is not required, just month.

art297
Opal | Level 21

You didn't answer regarding the type of matching that would exist.  If it is only one to one, and since you are new to SAS, depending upon the number of records involved (e.g., if it is less than 10 million or so) I, personally, would combine all of the monthly files into one dataset (which you can probably do in a datastep using the filename statement .. possibly with a wild card), sort the resulting and main datasets by ID and month, and then merge then together in a datastep.

If you do it with a datastep, you will have to first insure that the two sets of files (let's call them have and wanttomerge) only have one variable that has the same name, namely id.  If they do share any other variable names, you can always use the datastep rename option to change the name(s) in the wanttomerge dataset.

Without knowing more about the files one can't write the code for you, but it would probably look something like:

data want;

  merge have (in=ina) wanttomerge (in=inb);

  by id;

  if ina and month(date) eq month(datefromwanttomerge) and

   year(date)+1 eq year(datefromwanttomerge);

run;

Code like that would select all of the records from have, and combine them with records from wanttomerge, where the records exist in wanttomerge that share the same month but are one year later than those in have.

HTH,

Art

art297
Opal | Level 21

You never responded to my last post, thus the following may be totally irrelevant.  However, if not, I think that you are looking for something like:

libname monthly "c:\art\monthly";

data have;

  input ID    open_date date9.;

  cards;

123 01may2004

124 16aug2010

244 20jan2009

;

data monthly.may2005;

  input ID (open_date current_date) (date9.) x y z;

  cards;

123 01may200407may2005 1 2 3

;

data monthly.aug2011;

  input ID (open_date current_date) (date9.) x y z;

  cards;

124 16aug201012aug2011 4 5 6

;

data monthly.jan2010;

  input ID (open_date current_date) (date9.) x y z;

  cards;

244 20jan200918jan2010 7 8 9

;

proc sql noprint;

  select "monthly."||memname

    into :monthly_files

      separated by " "

        from dictionary.tables

          where libname="MONTHLY"

  ;

quit;

data all_monthly (drop=open_date);

  set &monthly_files. INDSNAME=dsn;

  fdate=input(scan(dsn,2,"."),monyy7.);

run;

data have;

  set have;

  fdate=input(put(intnx('year',open_date,1,'s'),monyy7.),monyy7.);

run;

 

proc sort data=have;

  by id fdate;

run;

proc sort data=all_monthly;

  by id fdate;

run;

 

data want;

  merge have (in=ina) all_monthly (in=inb);

  by id fdate;

  if ina;

run;

R_A_G_
Calcite | Level 5

Sorry for the late reply I finally found what the problem was, 'bs' is telling sas to read from right to left and I wanted 2nd column from left to right, found it by try and error. Below is the code for reading the correct column (2nd column from the left) But I have no idea what to use if I wanted to read from left to right that's why I've used -32 , I just counted the column.

Thanks

filename cc "C:\CDM1\5K\5K_5000obs\5K_5000obs_RIT\IN\dat1.dat";

Data class2_1;

infile cc lrecl=10000 filename=filename;

input;

file2_2=input(scan(_infile_,-32,,'bs'),best12.);

run;

art297
Opal | Level 21

R.A.G,

You posted your message in response to the wrong thread, but I'll try to answer it anyway since I probably confused the earlier message.

As I recall you wanted to read the second column.  You really only needed to use:

file2_2=input(scan(_infile_,2,,'s'),best12.);

Art

R_A_G_
Calcite | Level 5

thanks

RG

bolore
Calcite | Level 5

Hi Art,

Thanks for your replies. I have about 10 years worth of data split up

into monthly SAS data files. The files are labelled yyyymm and have

about 1million IDs (rows) and about 125 variables (col) in each

dataset.

I would like to find out information on 80000 IDs which have an open

date variable and is held in a SAS dataset. I want to pull the same

variables for each ID from the monthly tables corresponding to the

table which holds data 1 year on from the open date. The IDs have

different open dates over the 10year period.

The matching would be 1 to 1 on the IDs. I want a snapshot of how the

ID is doing after 1 year.

If I join all the monthly files together, won't this be quite

inefficient? 10x12x1million or 10x12x80000 if I only keep those IDs

I'm interested in.

Many thanks,

Bo

art297
Opal | Level 21

Define efficient!  Yes, it may be using a sledge hammer to drive in a nail, but a very accurate sledge hammer. But, you are talking about less than 10 million records, which will only take up seconds of cpu time.

The alternative is to develop complex macros that can easily take hours/days to write and add in numerous instances where you could error by writing something that looks like it will work but does something totally unexpected.  Since I recall you saying that you are a SAS newbee, I'd strongly recommend NOT going there too early in your learning curve.

Regarding processing, if you are not going to use all 125 variables, dropping as many as possible, as early in the process as you can, will drastically improve efficiency.

Ksharp
Super User

It looks like a query problem. En... Hash Table or proc format will be a good choice.

Finding in a dataset once, then use Macro to run across every dataset.

Ksharp

bolore
Calcite | Level 5

Thanks all.  I finally managed to do this with a looped proc SQL and where statement.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2302 views
  • 0 likes
  • 4 in conversation