DATA Step, Macro, Functions and more

subset a dataset based on variable obs of another dataset

Reply
Contributor
Posts: 23

subset a dataset based on variable obs of another dataset

I have 3 datasets.

 

1. Based on 3 variables (ID, date and company) in dataset 1, I want to subset a dataset 2 and create new dataset (Char) containing matched observations of common variables

 

2. Based on 2 variables (ID and date) in dataset 1, I want to subset dataset 3 following a logic that if date in dataset 1 falls at or in between the begin date and end date, then keep that row in a new dataset (Act)

Dataset 1     
IDdateCompanyTypevariable3variable4
1Jan-10ax  
1May-10ay  
2Apr-11ax  
3Dec-11ax  
3Feb-12by  

 

Dataset 2    
IDdateCompanyvariable1variable2
1Jan-10a  
1Feb-10a  
1Mar-10a  
1Apr-10a  
1May-10a  
1Jun-10a  
1Jul-10a  
1Aug-10a  
1Sep-10a  
1Oct-10a  
1Nov-10a  
1Dec-10a  
2Mar-11a  
2Apr-11a  
2May-11a  
2Jun-11a  
2Jul-11a  
2Aug-11a  
2Sep-11a  
3Feb-10a  
3Mar-10a  
3Apr-10a  
3May-10a  
3Jun-10a  
3Jul-10a  
3Aug-10a  
3Sep-10a  
3Oct-10a  
3Nov-10a  
3Dec-10a  
3Jan-11a  
3Feb-11a  
3Mar-11a  
3Apr-11a  
3May-11a  
3Jun-11a  
3Jul-11a  
3Aug-11a  
3Sep-11a  
3Oct-11a  
3Nov-11a  
3Dec-11a  
3Jan-12b  
3Feb-12b  
3Mar-12b  

 

Dataset 3     
IDStatusBegin dateEnd datevariable5variable6
1cJan-10Apr-10  
1wMay-10Nov-10  
1tDec-10   
2tMar-11May-11  
2cJun-11Sep-11  
3tFeb-10Dec-11  
3wJan-12   

 

 

New dataset-Char

IDdateCompanyvariable1variable2
1Jan-10a  
1May-10a  
2Apr-11a  
3Dec-11a  
3Feb-12b  

 

New dataset-Act

IDStatusBegin dateEnd datevariable5variable6
1cJan-10Apr-10  
1wMay-10Nov-10  
2tMar-11May-11  
3tFeb-10Dec-11  
3wJan-12   

Any suggestion?

 

Thanks.

PROC Star
Posts: 1,571

Re: subset a dataset based on variable obs of another dataset

[ Edited ]

Hi  @aled Have you tried any look up methods like hash, join, merge etc? 

 

It seems very straight forward while I am too lazy to type. There are tons of examples you will find on a google search and all you need to do is copy paste

 

Contributor
Posts: 23

Re: subset a dataset based on variable obs of another dataset

Posted in reply to novinosrin
I have done simple merges. But don't know hash or join methods. Can you point me to resource for this method?






PROC Star
Posts: 1,571

Re: subset a dataset based on variable obs of another dataset

@aled No worries. I don't think your dates are in this monyy format. 

10-Feb
10-Mar
10-Apr
10-Feb
10-Mar
10-Apr

 

So, do a favor Please provide a better representative sample with real dates to effectively test the condition

between the begin date and end date  

 

If you could, I will write the code for you.

Contributor
Posts: 23

Re: subset a dataset based on variable obs of another dataset

Posted in reply to novinosrin

dates are in month/year format. Let me provide you better sample.

 

I did try the following to create new dataset Char. I think my date variable in dataset 1 is being read as number and in dataset 2 as character. I am importing excel file and I cant get to read it as both dates.

 

proc sql noprint;

create table Char as

select distinct a.*

from dataset1 a, dataset2 b

where a.ID = b.ID

and a.date = b.date

;

quit;

 

Contributor
Posts: 23

Re: subset a dataset based on variable obs of another dataset

Posted in reply to novinosrin
dataset 1     
IDdateCompanyC DateR DateRR DateType
12017-06AW6/5/20175/16/20177/11/2017IVA
12017-10MS10/10/201710/3/201710/17/2017GA
22016-02MS2/25/20162/4/20163/7/2016PA
22017-07MS7/20/20177/11/20177/25/2017GA
32017-03AW3/22/20173/5/20175/3/2017IVA
42017-09RS9/28/20179/28/20179/28/2017MH
52017-10RS10/25/201710/25/201710/25/2017MH

 

IDdateRYMCompanyplaceMost Recent Placement Code
12016-012016-01AWMC8
12016-022016-02AWMC8
12016-032016-03AWMC1
12016-042016-04AWMC1
12016-052016-05AWMC2
12016-062016-06AWMC2
12016-072016-07AWMC2
12016-082016-08AWMC2
12016-092016-09AWMC2
12016-102016-10AWMC2
12016-112016-11AWMC2
12016-122016-12AWMC2
12017-012017-01AWMC1
12017-022017-02AWMC1
12017-032017-03AWMC1
12017-042017-04AWMC1
12017-052017-05AWMC1
12017-062017-06AWMC1
12017-072017-07AWMC1
12017-082017-08MSMC1
12017-092017-09MSMC1
12017-102017-10MSMC1
12017-112017-11MSMC1
12017-122017-12MSMC1
12018-012018-01MSMC1
12018-022018-02MSMC1
12018-032018-03MSMC1
12018-042018-04MSMC1
22016-012016-01MSMC1
22016-022016-02MSMC1
22016-032016-03MSMC8
22016-042016-04MSMC8
22016-052016-05MSMC1
22016-062016-06MSMC1
22016-072016-07MSMC1
22016-082016-08MSMC1
22016-092016-09MSMC1
22016-102016-10MSMC1
22016-112016-11MSMC1
22016-122016-12MSMC2
22017-012017-01MSMC1
22017-022017-02MSMC1
22017-032017-03MSMC1
22017-042017-04MSMC1
22017-052017-05MSMC2
22017-062017-06MSMC2
22017-072017-07MSMC2
22017-082017-08MSMC2
22017-092017-09MSMC2
22017-102017-10MSMC2
32016-012016-01AWMC1
32016-022016-02AWMC1
32016-032016-03AWMC1
32016-042016-04AWMC1
32016-052016-05AWMC1
32016-062016-06AWMC1
32016-072016-07AWMC1
32016-082016-08AWMC1
32016-092016-09AWMC1
32016-102016-10AWMC1
32016-112016-11AWMC1
32017-022017-02AWMC1
32017-032017-03AWMC1
32017-042017-04AWMC1
32017-052017-05AWMC1
32017-062017-06AWMC1
32017-072017-07AWMC1
32017-082017-08AWMC1
32017-092017-09AWMC1
32017-102017-10AWMC1
32017-112017-11AWMC1
32017-122017-12AWMC1
32018-012018-01AWMC1
32018-022018-02AWMC1
32018-032018-03AWMC13
32018-042018-04AWMC13
42016-012016-01RSMC1
42016-022016-02RSMC1
42016-032016-03RSMC1
42016-042016-04RSMC1
42016-052016-05RSMC1
42016-062016-06RSMC1
42016-072016-07RSMC1
42016-082016-08RSMC1
42016-092016-09RSMC1
42016-102016-10RSMC1
42016-112016-11RSMC1
42016-122016-12RSMC1
42017-012017-01RSMC1
42017-022017-02RSMC1
42017-032017-03RSMC1
42017-042017-04RSMC1
42017-052017-05RSMC1
42017-062017-06RSMC1
42017-072017-07RSMC1
42017-082017-08RSMC1
42017-092017-09RSMC1
42017-102017-10RSMC1
42017-112017-11RSMC1
42017-122017-12RSMC1
42018-012018-01RSMC1
42018-022018-02RSMC1
42018-032018-03RSMC1
42018-042018-04RSMC1
52016-012016-01RSMC1
52016-022016-02RSMC1
52016-032016-03RSMC1
52016-042016-04RSMC1
52016-052016-05RSMC1
52016-062016-06RSMC1
52016-072016-07RSMC1
52016-082016-08RSMC1
52016-092016-09RSMC1
52016-102016-10RSMC1
52016-112016-11RSMC1
52016-122016-12RSMC1
52017-012017-01RSMC1
52017-022017-02RSMC2
52017-032017-03RSMC2
52017-042017-04RSMC2
52017-052017-05RSMC2
52017-062017-06RSMC2
52017-072017-07RSMC2
52017-082017-08RSMC2
52017-092017-09RSMC2
52017-102017-10RSMC2
52017-112017-11RSMC1
52017-122017-12RSMC1
52018-012018-01RSMC1
52018-022018-02RSMC1
52018-032018-03RSMC1
52018-042018-04RSMC1

 

dataset 3   
IDStatusBegin DateEnd Datevariable 5
1c5/1/20176/30/2017m
1w7/1/20178/30/2017m
1t9/1/2017 m
2t3/1/20175/30/2017f
2c6/1/20179/30/2017f
3t2/1/201712/30/2017f
3w1/1/2018 f

 

 

Here are the 3 representative datasets as I have it right now in excel.

 

Thank you.

Contributor
Posts: 23

Re: subset a dataset based on variable obs of another dataset

Posted in reply to novinosrin

excel attachment:

PROC Star
Posts: 1,571

Re: subset a dataset based on variable obs of another dataset

@aled Thank you for extending that help in providing a better sample.

Please notice here carefully:

 

Condition: Based on 2 variables (ID and date) in dataset 1, I want to subset dataset 3 following a logic that if date in dataset 1 falls at or in between the begin date and end date, then keep that row in a new dataset (Act)

 

So when i read the raw file to sas dataset using 

 

data one;
infile cards truncover;
input ID date :anydtdte21. Company $ CDate :mmddyy10. RDate :mmddyy10. RRDate :mmddyy10. Type $;
format date mmddyy10.;
cards;
1 2017-06 AW 6/5/2017 5/16/2017 7/11/2017 IVA
1 2017-10 MS 10/10/2017 10/3/2017 10/17/2017 GA
2 2016-02 MS 2/25/2016 2/4/2016 3/7/2016 PA
2 2017-07 MS 7/20/2017 7/11/2017 7/25/2017 GA
3 2017-03 AW 3/22/2017 3/5/2017 5/3/2017 IVA
4 2017-09 RS 9/28/2017 9/28/2017 9/28/2017 MH
5 2017-10 RS 10/25/2017 10/25/2017 10/25/2017 MH
;

 

These year-month dates will revert to

date=06/01/2017
date=10/01/2017
date=02/01/2016
date=07/01/2017
date=03/01/2017
date=09/01/2017
date=10/01/2017

 

so when we look up dataset1 <->dataset3 using ID key and try to determine  a date value 2017-06 (read as 06/01/2017) between a complete date 

1c5/1/20176/30/2017m

will cause serious erroneous results. because we do not know what's the real date value of the year month dates like 2017-06  are?

 

This is what you need to clarify. 

 

Ask a Question
Discussion stats
  • 7 replies
  • 100 views
  • 1 like
  • 2 in conversation