Desktop productivity for business analysts and programmers

how to create a 3rd table from other two, conditioned to variable matching

Reply
Frequent Contributor
Posts: 87

how to create a 3rd table from other two, conditioned to variable matching

This is my dataset 1 (t1):

 

data sample1;
  infile cards truncover expandtabs;
  input MACH $ EVENT $ DATE :date. TIME :time. PMACH $ PXMACH $ SITE $ RAD MILVAL SF;
  format date date8. time HHMM.;
cards;
US0001 Lat 02JAN13 19:24 . . . 876 2660403.00000 1
US0001 Cre 29MAY13 0:00 7611 I1124P 1 . 2734440.00000 1
US0001 Cre 31JAN14 0:00 7402 I1015P 2 . 2735017.00000 1
US0001 Lat 12JAN15 7:00 . . . . 2900334.00000 1
US000323 Lat 13OCT12 19:37 . . . 852.2 1332753.00000 1
US000323 WI 25OCT12 0:00 . . . . 1342148.00000 1
US000323 Rem 31OCT12 0:00 7416 I1079P 3 . 1346049.00000 1
US000323 Lat 31OCT12 14:03 . . . 890.5 1346049.00000 1
US000323 Inst 11JAN13 0:00 7408 I1034P 3 . 1346049.00000 1
US000323 Lat 16.marras.13 19:52 . . . 888.7 1417443.00000 1
US000323 Lat 12OCT13 13:49 . . . 886.7 1606899.00000 1
US000323 Lat 12OCT13 14:17 . . . 886.7 1606899.00000 1
US000323 Rem 24MAY14 0:00 7408 I1034P 3 . 1812537.00000 1
US000328 Lat 31JAN13 23:41    853.2 2040610.00000 1
US000328 Cre 03FEB13 0:00 7209 I1115P 3 . 2040610.00000 1
US000328 Lat 17MAY13 12:43 . . . 847.2 2134438.00000 1
US000328 Lat 17MAY13 12:46 . . . 847.2 2134438.00000 1
US000328 Lat 02SEP13 15:03 . . . 846.3 2134438.00000 1
US000328 Cre 18JUL14 0:00 7218 I1152P 3 . 2134438.00000 1
US000328 Lat 25SEP14 8:54 . . . 842.4 2180863.00000 1
US000328 Inst 15FEB15 0:00 7508 I1098P 3 . 2180863.00000 1
US000328 Cre 21MAY15 0:00 7212 I1098P 3 . 2232830.00000 1
US000328 WI 24JUN15 0:00 . . . . 2232830.00000 1
US000328 Lat 13JUL15 0:00 . . . 890.2 2232830.00000 1
;

And this is my dataset 2 (t2):

 

data sample2;
infile cards truncover expandtabs;
input Mach $ Date : date9. Data $;
format date : date9.;
cards;
US0001 28Dec2012 dassssd
US0001 . 322234
US0001 30Dec2012 sadasd
US0001 . 1213
US0001 01Jan2013 asdsds
US0001 02Jan2013 zxcxzc
US0001 03Jan2013 asdad
US0001 04Jan2013 zxczc
US0001 . zxcc
US0001 06Jan2013 zxc
US0001 07Jan2013 czxcxzc
US0001 24May2013 zxcwef
US0001 . x
US0001 26May2013 dassssd
US0001 . 322234
US0001 28May2013 sadasd
US0001 29May2013 1213
US0001 30May2013 asdsds
US0001 . zxcxzc
US0001 01Jun2013 asdad
US0001 02Jun2013 zxczc
US0001 03Jun2013 zxcc
US0001 . zxc
US0001 27Jan2014 czxcxzc
US0001 . zxcwef
US0001 29Jan2014 x
US0001 30Jan2014 dassssd
US0001 31Jan2014 322234
US0001 01Feb2014 sadasd
US0001 02Feb2014 1213
US0001 . asdsds
US0001 04Feb2014 zxcxzc
US0001 05Feb2014 asdad
US0001 07Jan2015 zxczc
US0001 . zxcc
US0001 09Jan2015 zxc
US0001 10Jan2015 czxcxzc
US0001 11Jan2015 zxcwef
US0001 12Jan2015 x
US0001 . dassssd
US0001 14Jan2015 322234
US0001 15Jan2015 sadasd
US0001 . 1213
US0001 17Jan2015 asdsds
US000323 08Oct2012 zxcxzc
US000323 09Oct2012 asdad
US000323 10Oct2012 zxczc
US000323 . zxcc
US000323 12Oct2012 zxc
US000323 13Oct2012 czxcxzc
US000323 . zxcwef
US000323 15Oct2012 x
US000323 16Oct2012 dassssd
US000323 17Oct2012 322234
US000323 18Oct2012 sadasd
US000323 . 1213
US000323 21Oct2012 asdsds
US000323 22Oct2012 zxcxzc
US000323 . asdad
US000323 24Oct2012 zxczc
US000323 25Oct2012 zxcc
US000323 26Oct2012 zxc
US000323 27Oct2012 czxcxzc
US000323 28Oct2012 zxcwef
US000323 29Oct2012 x
US000323 30Oct2012 dassssd
US000323 26Oct2012 322234
US000323 . sadasd
US000323 28Oct2012 1213
US000323 29Oct2012 asdsds
US000323 30Oct2012 zxcxzc
US000323 . asdad
US000323 01Nov2012 zxczc
US000323 02Nov2012 zxcc
US000323 03Nov2012 zxc
US000323 04Nov2012 czxcxzc
US000323 05Nov2012 zxcwef
US000323 26Oct2012 x
US000323 . dassssd
US000323 28Oct2012 322234
US000323 29Oct2012 sadasd
US000323 30Oct2012 1213
US000323 31Oct2012 asdsds
US000323 01Nov2012 zxcxzc
US000323 02Nov2012 asdad
US000323 03Nov2012 zxczc
US000323 04Nov2012 zxcc
US000323 . zxc
US000323 06Jan2013 czxcxzc
US000323 07Jan2013 zxcwef
US000323 08Jan2013 x
US000323 09Jan2013 dassssd
US000323 . 322234
US000323 11Jan2013 sadasd
US000323 12Jan2013 1213
US000323 13Jan2013 asdsds
US000323 14Jan2013 zxcxzc
US000323 15Jan2013 asdad
US000323 16Jan2013 zxczc
US000323 11Mar2013 zxcc
US000323 12Mar2013 zxc
US000323 13Mar2013 czxcxzc
US000323 14Mar2013 zxcwef
US000323 15Mar2013 x
US000323 16Mar2013 dassssd
US000323 17Mar2013 322234
US000323 18Mar2013 sadasd
US000323 19Mar2013 1213
US000323 20Mar2013 asdsds
US000323 21Mar2013 zxcxzc
US000323 07Oct2013 asdad
US000323 08Oct2013 zxczc
US000323 09Oct2013 zxcc
US000323 10Oct2013 zxc
US000323 11Oct2013 czxcxzc
US000323 12Oct2013 zxcwef
US000323 13Oct2013 x
US000323 14Oct2013 dassssd
US000323 15Oct2013 322234
US000323 16Oct2013 sadasd
US000323 17Oct2013 1213
US000323 07Oct2013 asdsds
US000323 08Oct2013 zxcxzc
US000323 09Oct2013 asdad
US000323 10Oct2013 zxczc
US000323 11Oct2013 zxcc
US000323 12Oct2013 zxc
US000323 13Oct2013 czxcxzc
US000323 14Oct2013 zxcwef
US000323 15Oct2013 x
US000323 16Oct2013 dassssd
US000323 17Oct2013 322234
US000323 19May2014 sadasd
US000323 20May2014 1213
US000323 21May2014 asdsds
US000323 22May2014 zxcxzc
US000323 23May2014 asdad
US000323 24May2014 zxczc
US000323 25May2014 zxcc
US000323 26May2014 zxc
US000323 27May2014 czxcxzc
US000323 28May2014 zxcwef
US000323 29May2014 x
US000328 26Jan2013 dassssd
US000328 27Jan2013 322234
US000328 28Jan2013 sadasd
US000328 . 1213
US000328 30Jan2013 asdsds
US000328 31Jan2013 zxcxzc
US000328 01Feb2013 asdad
US000328 02Feb2013 zxczc
US000328 . zxcc
US000328 04Feb2013 zxc
US000328 05Feb2013 czxcxzc
US000328 29Jan2013 zxcwef
US000328 30Jan2013 x
US000328 . dassssd
US000328 01Feb2013 322234
US000328 02Feb2013 sadasd
US000328 03Feb2013 1213
US000328 04Feb2013 asdsds
US000328 . zxcxzc
US000328 06Feb2013 asdad
US000328 07Feb2013 zxczc
US000328 08Feb2013 zxcc
US000328 . zxc
US000328 13May2013 czxcxzc
US000328 14May2013 zxcwef
US000328 15May2013 x
US000328 16May2013 dassssd
US000328 . 322234
US000328 18May2013 sadasd
US000328 19May2013 1213
US000328 20May2013 asdsds
US000328 21May2013 zxcxzc
US000328 . asdad
US000328 12May2013 zxczc
US000328 13May2013 zxcc
US000328 14May2013 zxc
US000328 . czxcxzc
US000328 16May2013 zxcwef
US000328 17May2013 x
US000328 18May2013 dassssd
US000328 19May2013 322234
US000328 20May2013 sadasd
US000328 21May2013 1213
US000328 22May2013 asdsds
US000328 28Aug2013 zxcxzc
US000328 29Aug2013 asdad
US000328 . zxczc
US000328 31Aug2013 zxcc
US000328 01Sep2013 zxc
US000328 02Sep2013 czxcxzc
US000328 03Sep2013 zxcwef
US000328 04Sep2013 x
US000328 . dassssd
US000328 06Sep2013 322234
US000328 07Sep2013 sadasd
US000328 13Jul2014 1213
US000328 14Jul2014 asdsds
US000328 15Jul2014 zxcxzc
US000328 16Jul2014 asdad
US000328 17Jul2014 zxczc
US000328 18Jul2014 zxcc
US000328 19Jul2014 zxc
US000328 20Jul2014 czxcxzc
US000328 21Jul2014 zxcwef
US000328 . sdsfsfqeww
US000328 23Jul2014 zxcc
US000328 20Sep2014 zxc
US000328 21Sep2014 czxcxzc
US000328 22Sep2014 zxcwef
US000328 23Sep2014 x
US000328 24Sep2014 dassssd
US000328 . 322234
US000328 26Sep2014 sadasd
US000328 27Sep2014 1213
US000328 28Sep2014 asdsds
US000328 29Sep2014 zxcxzc
US000328 30Sep2014 asdad
US000328 . zxczc
US000328 11Feb2015 zxcc
US000328 12Feb2015 zxc
US000328 . czxcxzc
US000328 14Feb2015 zxcwef
US000328 15Feb2015 x
US000328 16Feb2015 dassssd
US000328 17Feb2015 322234
US000328 18Feb2015 sadasd
US000328 19Feb2015 1213
US000328 . asdsds
US000328 16May2015 zxcxzc
US000328 17May2015 asdad
US000328 18May2015 zxczc
US000328 19May2015 zxcc
US000328 20May2015 zxc
US000328 21May2015 czxcxzc
US000328 22May2015 zxcwef
US000328 . x
US000328 24May2015 dassssd
US000328 25May2015 322234
US000328 26May2015 sadasd
US000328 19Jun2015 1213
US000328 20Jun2015 asdsds
US000328 21Jun2015 zxcxzc
US000328 22Jun2015 asdad
US000328 23Jun2015 zxczc
US000328 24Jun2015 zxcc
US000328 25Jun2015 zxc
US000328 26Jun2015 czxcxzc
US000328 . zxcwef
US000328 28Jun2015 x
US000328 29Jun2015 dassssd
US000328 08Jul2015 322234
US000328 . sadasd
US000328 10Jul2015 1213
US000328 11Jul2015 asdsds
US000328 12Jul2015 zxcxzc
US000328 . asdad
US000328 14Jul2015 zxczc
US000328 15Jul2015 zxcc
US000328 16Jul2015 zxc
US000328 17Jul2015 czxcxzc
US000328 . zxcwef
;

What I want is:

 

  1. Where   t1.Event = "Lat"    and     t1.Mach = t2.Mach    and      t1.Date = t2.Date
  2. Get the  Dates from t2 which are immediate previous and immediate after (when t1.Date = t2.Date as in above point 1)
  3. Combine t1 and t2 into t3 such that the result looks like something as in below:

 

MACHEVENTDATETIMEPMACHPXMACHSITERADMILVALSFDATA
US0001 01Jan2013       asdsds
US0001Lat02Jan201319:24   87626604031 
US0001 03Jan2013       asdad
US0001 07Jan2013       czxcxzc
US0001Lat12Jan20157:00    29003341 
US0001 24May2013       zxcwef
US000323 12Oct2012       zxc
US000323Lat13Oct201219:37   852.213327531 
US000323 15Oct2012       x

 

NOTE:  for US000323 after 13Oct2012 in t1, the selected date from t2 is 15Oct2012, which is 2nd next date in t2 because immediately after t1.Date=t2.Date (=13Oct2016) in t2 is a BLANK date, so we do not select that one but select the 2nd next date which is 15Oct2016.

 

Super Contributor
Posts: 251

Re: how to create a 3rd table from other two, conditioned to variable matching

I've been having a little trouble with this, because I think you may have made some typos in your examples (there are no dates in 2016, for example!). Also, there is a problem with the sample2 dataset because it doesn't contain time values - there are a couple of instances where there are multiple Lat records for a Mach record for the same date but different times (qv US000323 / 12Oct2013).

 

Note that I don't have EG, and I'm not great at using EG's query generator anyway. This is cobbled together in Studio.

 

Nevertheless this is the code I came up with:

data sample1;
  infile cards truncover expandtabs;
  input MACH $ EVENT $ DATE ?? :date. TIME ?? :time. PMACH $ PXMACH $ SITE $ RAD MILVAL SF;
  format date date8. time HHMM.;
cards;
…
run;

data sample2;
infile cards truncover expandtabs;
input Mach $ Date ?? : date9. Data $;
format date : date9.;
if not missing(date);          /* Get rid of missing dates now - makes following code easier */
cards;
…
run;

proc sort data=sample1 noequals;
by mach date time;
run;

proc sort data=sample2 noequals;
by mach date;
run;

data sample2;
set sample2;
id = monotonic();   /* Because dates aren't continuous, create a continuous numeric id */
run;

proc sql magic=103;
/*
   Get the sample1 Lat observations, along with the ID value on 
   sample2 where there's a match.
*/
create table result as
   select t1.mach,
          t1.event,
          t1.date,
          t2.id,
          t1.time,
          t1.pmach,
          t1.pxmach,
          t1.site,
          t1.rad,
          t1.milval,
          t1.sf
     from sample1 as t1
    inner join sample2 as t2
       on t1.mach = t2.mach
      and t1.date = t2.date
    where t1.event = 'Lat'
      and t1.date is not null
    order by t1.mach,
             t1.date;
/*
   Join against sample2 again for the *previous* id. Do this as two passes, because 
   the optimiser does a better job. Repeat for the next id.
*/
create table result_previous as
   select t1.mach,
          t2.date,
          t2.data
     from result as t1
      left join sample2 as t2
       on t1.mach = t2.mach
      and t1.id - 1 = t2.id
    where t2.date is not null
    order by t1.mach,
             t2.date;
create table result_next as
   select t1.mach,
          t2.date,
          t2.data
     from result as t1
      left join sample2 as t2
       on t1.mach = t2.mach
      and t1.id + 1 = t2.id
    where t2.date is not null
    order by t1.mach,
             t2.date;
quit;

data final_result;
/*
   Interleave the three datasets together by mach and date. 
*So* much easier, and faster, than doing a union join. */ retain mach event date time pmach pxmach site rad milval sf data; set result_previous result(drop=id) result_next; by mach date; run;

It will need work on your part to clean up the data quality issues.

Frequent Contributor
Posts: 87

Re: how to create a 3rd table from other two, conditioned to variable matching

Hi Thanks for your help. I really appreciate it.

 

I just want to learn if this process can be done on part by part data?

 

I have huge datasets, and it is taking many hours to run the process. and in the end if there is some error (e.g. typos etc.), all the waiting time goes for the waste Smiley Sad

 

Something like.. First 1 million rows in table 2.

Frequent Contributor
Posts: 87

Re: how to create a 3rd table from other two, conditioned to variable matching

I got a doubt here... where is 't1.id' in the above.
Super Contributor
Posts: 251

Re: how to create a 3rd table from other two, conditioned to variable matching

The id variable is populated in the modification to sample2:

data sample2;
set sample2;
id = monotonic();
run;

The monotonic function creates an incremented value for a variable. in this context it starts from one. This allows for investigating the prior and post-observations by looking at current observation +/- 1.

 

As for the process taking a long time, you might like to look at putting compress on the datasets as you create them:

data sample1(compress=yes);
…
run;

This won't necessarily make a difference, but it's a good place to start.

 

Also for small numeric variables, changing the length will reduce i/o. A numeric variable which will never go over 64k can be safely stored with a length of 4, for example (all date variables!) Also restrict the length of character variables can make a big difference. Mach can be left at its default of 8, but Event only needs to be 3. Over a million observations, you've saved 9 megabytes just with those two.

Ask a Question
Discussion stats
  • 4 replies
  • 210 views
  • 2 likes
  • 2 in conversation