BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Vishy
Obsidian | Level 7

Hi,

 

Is there any logic to get the previous month's value when a condition matches using two different data sets? Please see below data sets sample.

 

Dataset1:

Var_AVar_B
101Feb18
215Mar18
313Mar18
421Apr18
511Jul18
621Apr18
717May18

 

Dataset2:

Var_AVar_CInt Rate
101Jan185.20%
101Feb186.20%
215Dec171.50%
215Jan182.50%
215Feb183.50%
215Mar184.50%
320Mar185.50%
320Apr186.50%
320May187.50%
410Apr182.60%
410May183.60%
511Jun188.80%
511Jul189.80%
621Apr185.70%
730Oct184.60%
730Nov186.60%

 

Desired Output:

Var_AVar_BInt Rate
101Feb185.20%
215Mar183.50%
313Mar18 
421Apr18 
511Jul188.80%
621Apr185.70%
717May18 
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Why is the case for var_a=6 yielding int_rate=5.70% in you desired dataset?  Both dataset1 and datasest2 have only one date (21apr18) for var_a=6.  So it has no previous month, which by your specification means resulting int_rate should be missing, no?

 

 Assuming my speculation is correct, then this code produces what you want (edited revision):

 

data dataset1;
  input Var_A	Var_B :date7.;
  format var_b date9.;
datalines;
1	01Feb18
2	15Mar18
3	13Mar18
4	21Apr18
5	11Jul18
6	21Apr18
7	17May18
run;
 
data Dataset2;
  input Var_A	Var_C :date7.	Int_Rate  :percent5.2 ;
  format var_c date9.  int_rate percent7.2 ;
datalines;
1	01Jan18	5.20%
1	01Feb18	6.20%
2	15Dec17	1.50%
2	15Jan18	2.50%
2	15Feb18	3.50%
2	15Mar18	4.50%
3	20Mar18	5.50%
3	20Apr18	6.50%
3	20May18	7.50%
4	10Apr18	2.60%
4	10May18	3.60%
5	11Jun18	8.80%
5	11Jul18	9.80%
6	21Apr18	5.70%
7	30Oct18	4.60%
7	30Nov18	6.60%
run;

data want (keep=var_a var_b int_rate);
  merge dataset1 (in=in1) 
        dataset2 (in=in2 keep=var_a var_c rename=(var_c=var_b));
  by var_a var_b;
  if in2 then do;
     set dataset2 (keep=int_rate);
     int_rate=lag(int_rate);
  end;
  if first.var_a then int_rate=.;
  if in1;
run;

 

This code uses a conditional SET statement with a MERGE statement, which is a good technique for doing LOCF (last observation carried forward) for multiple datasets representing irregular time series.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep.  As such this is untested:

data want;
  set dataset1 (in=a) dataset2 (in=b rename=(intrate=irate));
  ds=ifn(a,1,2);
run;
proc sort data=want;
  by var_a var_b;
run;
data want;
  if ds=1 then do;
    intrate=lag(irate);
    output;
  end;
run;
Vishy
Obsidian | Level 7

@RW9,

 

Thanks for looking into this. what is ds=ifn(a,1,2); below? It's not working.

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well it should resolve to if row comes from table aliased as a, then set to 1 else, set to 2.  Maybe the ifn function doesn't like aliases, 

data want;
  set dataset1 (in=a) dataset2 (in=b rename=(intrate=irate));
  if a then ds=1;
else ds=2; run; proc sort data=want; by var_a var_b; run; data want; if ds=1 then do; intrate=lag(irate); output; end; run;

Not tested still. 

mkeintz
PROC Star

Why is the case for var_a=6 yielding int_rate=5.70% in you desired dataset?  Both dataset1 and datasest2 have only one date (21apr18) for var_a=6.  So it has no previous month, which by your specification means resulting int_rate should be missing, no?

 

 Assuming my speculation is correct, then this code produces what you want (edited revision):

 

data dataset1;
  input Var_A	Var_B :date7.;
  format var_b date9.;
datalines;
1	01Feb18
2	15Mar18
3	13Mar18
4	21Apr18
5	11Jul18
6	21Apr18
7	17May18
run;
 
data Dataset2;
  input Var_A	Var_C :date7.	Int_Rate  :percent5.2 ;
  format var_c date9.  int_rate percent7.2 ;
datalines;
1	01Jan18	5.20%
1	01Feb18	6.20%
2	15Dec17	1.50%
2	15Jan18	2.50%
2	15Feb18	3.50%
2	15Mar18	4.50%
3	20Mar18	5.50%
3	20Apr18	6.50%
3	20May18	7.50%
4	10Apr18	2.60%
4	10May18	3.60%
5	11Jun18	8.80%
5	11Jul18	9.80%
6	21Apr18	5.70%
7	30Oct18	4.60%
7	30Nov18	6.60%
run;

data want (keep=var_a var_b int_rate);
  merge dataset1 (in=in1) 
        dataset2 (in=in2 keep=var_a var_c rename=(var_c=var_b));
  by var_a var_b;
  if in2 then do;
     set dataset2 (keep=int_rate);
     int_rate=lag(int_rate);
  end;
  if first.var_a then int_rate=.;
  if in1;
run;

 

This code uses a conditional SET statement with a MERGE statement, which is a good technique for doing LOCF (last observation carried forward) for multiple datasets representing irregular time series.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
s_lassen
Meteorite | Level 14

Here is a possible solution:

data data1;

  input Var_A / Var_B date.;

  format Var_B date9.;

cards;

1

01Feb18

2

15Mar18

3

13Mar18

4

21Apr18

5

11Jul18

6

21Apr18

7

17May18

;run;

 

Data data2;

  input Var_A / Var_C date. / Int_Rate percent.;

  format Var_C date9. Int_Rate percent7.2;

cards;

1

01Jan18

5.20%

1

01Feb18

6.20%

2

15Dec17

1.50%

2

15Jan18

2.50%

2

15Feb18

3.50%

2

15Mar18

4.50%

3

20Mar18

5.50%

3

20Apr18

6.50%

3

20May18

7.50%

4

10Apr18

2.60%

4

10May18

3.60%

5

11Jun18

8.80%

5

11Jul18

9.80%

6

21Apr18

5.70%

7

30Oct18

4.60%

7

30Nov18

6.60%

;run;

 

proc sql;

  create index idx on data2(Var_A,Var_C);

quit;

data want;

  set data1;

  Var_C=intnx('month',Var_B,-1,'S');

  set data2 key=idx/unique;

  if _iorc_ then do;

    Int_Rate=.;

    _error_=0;

    end;

  drop Var_C;

run;

 

The program gives the same result as you indicated, except for Var_A=6 and Var_B=21APR18, where the data in the second table is not from the previous month, but from the same month (so Int_rate is set missing).

SuryaKiran
Meteorite | Level 14

Although you got good solution here is another way using Left join with condition.

 

proc sql;
create table want as 
select a.Var_A,a.Var_B,b.Int_Rate
	from dataset1 a
left join dataset2 b
	on (intnx('month',a.Var_B,0,'b')=intnx('month',b.Var_C,1,'b') and a.Var_A=b.Var_A)
;
quit;
Thanks,
Suryakiran
Ksharp
Super User
data dataset1;
infile cards expandtabs;
  input Var_A	Var_B :date7.;
  format var_b date9.;
datalines;
1	01Feb18
2	15Mar18
3	13Mar18
4	21Apr18
5	11Jul18
6	21Apr18
7	17May18
run;
 
data Dataset2;
infile cards expandtabs;
  input Var_A	Var_C :date7.	Int_Rate  :percent5.2 ;
  format var_c date9.  int_rate percent7.2 ;
datalines;
1	01Jan18	5.20%
1	01Feb18	6.20%
2	15Dec17	1.50%
2	15Jan18	2.50%
2	15Feb18	3.50%
2	15Mar18	4.50%
3	20Mar18	5.50%
3	20Apr18	6.50%
3	20May18	7.50%
4	10Apr18	2.60%
4	10May18	3.60%
5	11Jun18	8.80%
5	11Jul18	9.80%
6	21Apr18	5.70%
7	30Oct18	4.60%
7	30Nov18	6.60%
run;
data temp;
 set dataset1;
 var_c=intnx('month',Var_B,-1,'s');
run;
data want;
 merge temp(in=ina) dataset2;
 by  Var_A	Var_C;
 if ina;
run;
proc print noobs;run;
Vishy
Obsidian | Level 7

Thank you all for your inputs.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2368 views
  • 0 likes
  • 6 in conversation