BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lezgin
Obsidian | Level 7

I have the following dataset.

 

id1id2datevalue
1A20140.5
1A20140.7
1B20140.8
1C20140.1
1C20141.1
1D20141.2
1A20151.3
1A20152.4
1C20155.6
1D20150.4
1E20150.7
1F20150.3
1C20161.3
1D20162.5
1D20162
1F20164.1
1G20163.1
1H20162.1
2B20145.4


I want to compare the distinct id2s by each id1 and year group. For instance, I want to compare id2s where id1=1 and year=2014 with id2s where id1=1 and year=2015. So, in the first case(id1=1 and year=2014), the distinct values of id2 are A, B, C, D and for the second case (id1=1 and year=2015) distinct values are A, C, D, E. I can co this by using first.id1, first.year etc. After this step the dataset will look like this:

 

 

id1id2datevalue
1A20140.5
1B20140.8
1C20140.1
1D20141.2
1A20151.3
1C20155.6
1D20150.4
1E20150.7
1F20150.3
1C20161.3
1D20162.5
1F20164.1
1G20163.1
1H20162.1

 

Now, I want to compare these three groups and get the id2 that exist in first case but not exist in the second case (i.e. B exists in first id1xyear group but not in second id1xyear group). In the end, the measure I want will divide the total number of distinct id2s in first id1xyear with total number of nonexisting id2s in the second id1xyear (1/4 in the first comparison from 2014 to 2015).

 

It will repeat the same procedure with id1=2 and so on.

 

id1datemeasure
12014.
120151/4
120162/5
22014 

 

I hope I was able to convey what I need. I would appreciate any help. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

The following code is the fixed my original code. Sorry .

 

/*******For measure1*********/
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
 set have;
 by id1 date;
 if id2 in x then m+1;
end;
want=divide(k-m,k);
if id1 ne lag(id1) then call missing(want);
output;

call missing(of x{*}); /* <--*/
k=0;
do until(last.date);
 set have;
 by id1 date;
 k+1;x{k}=id2;
end;
run;

 

 

 

/******For measure2***********/

data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
 set have;
 by id1 date;
 if id2 not in x then m+1; /*********/
end;
want=divide(m,k);  /*********/
if id1 ne lag(id1) then call missing(want);
output;

call missing(of x{*}); /*<---*/
k=0;
do until(last.date);
 set have;
 by id1 date;
 k+1;x{k}=id2;
end;

View solution in original post

10 REPLIES 10
Ksharp
Super User

I think it should be

id1 date measure
1 2014 .
1 2015 1/5

 

 

Use your second dataset.

 

data have;
infile cards expandtabs ;
input id1	id2	$ date	value;
cards;
1	A	2014	0.5
1	B	2014	0.8
1	C	2014	0.1
1	D	2014	1.2
1	A	2015	1.3
1	C	2015	5.6
1	D	2015	0.4
1	E	2015	0.7
1	F	2015	0.3
1	C	2016	1.3
1	D	2016	2.5
1	F	2016	4.1
1	G	2016	3.1
1	H	2016	2.1
;
run;
data want;
array x{9999} $ 32 _temporary_;
m=0;
do n=1 by 1 until(last.date);
 set have;
 by id1 date;
 if id2 in x then m+1;
end;
want=divide(k-m,n);
if id1 ne lag(id1) then call missing(want);
output;

k=0;
do until(last.date);
 set have;
 by id1 date;
 k+1;x{k}=id2;
end;
run;
lezgin
Obsidian | Level 7

Thank you very much. I appreciate your help but I think it is not giving what I am looking for. From 2014 to 2015 A,B,C,D becomes A,C,D,E,F. From 2015 to 2016 A,C,D,E,F becomes C,D,F,G,H.

 

So what I need is from 2014 to 2015: identify that B is missing in 2015 and divide this (1 missing) with all four (A,B,C,D) in 2014=1/4

From 2015 to 2016: identify that A and E is missing (2 missing). Divide 2 by what I have for 2015 (A,C,D,E,F) by 5= 2/5.

 

It should repeat this by id1. When next id1 comes the code should compare next first id1xdate group with the next date of same id1. I hope this is more clear now. Again, thank you very much.

Ksharp
Super User

OK. I get confused with denominator.

 

data have;
infile cards expandtabs ;
input id1	id2	$ date	value;
cards;
1	A	2014	0.5
1	B	2014	0.8
1	C	2014	0.1
1	D	2014	1.2
1	A	2015	1.3
1	C	2015	5.6
1	D	2015	0.4
1	E	2015	0.7
1	F	2015	0.3
1	C	2016	1.3
1	D	2016	2.5
1	F	2016	4.1
1	G	2016	3.1
1	H	2016	2.1
;
run;
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
 set have;
 by id1 date;
 if id2 in x then m+1;
end;
want=divide(k-m,k);
if id1 ne lag(id1) then call missing(want);
output;

k=0;
do until(last.date);
 set have;
 by id1 date;
 k+1;x{k}=id2;
end;
run;
lezgin
Obsidian | Level 7

Thanks for your patience. I understand if you don't have time for this. You have already been very helpful. I appreciate that. I realized that I need to decompose the measure into two parts. Let me give you a new example of each measure. There are only subtle changes.

 

Measure1: Suppose we have A,B,C,D in id1xdate1 and A,B,C,E in id1xdate2 and C,D,E,F in id1xdate3. Measure1 finds that D from id1xdate1 is missing in id1xdate2 and divides D (1) by A,B,C,D (4) of id1xdate1 and returns 1/4. Similarly, from id1xdate2 to id1xdate3, this measure finds that A,B are missing in id1xdate2 and divides (2) by A,B,C,E (4) and returns 2/4. This measure makes the second comparison between id2s of id1xdate2 and id1xdate3. The change from id1xdate2 to id1xdate3 matters now.

 

Measure 2: Again, suppose we have A,B,C,D in id1xdate1 and A,B,C,E in id1xdate2 and C,D,E,F in id1xdate3. Measure2 finds that E is not present in id1xdate1 but is present in id1xdate2. This time measure2 divides E (1) by A,B,C,D (4) in id1xdate1 and return 1/4. From id1xdate2 to id1xdate3, the code finds that D,F are not present in id1xdate2 but are present in id1xdate3. Measure divides D,F (2) by A,B,C,E (4) and return 2/4. Similarly, this measure makes the second comparison between id2s of id1xdate2 and id1xdate3. The change from id1xdate2 to id1xdate3 matters now. 

 

Dataset for this example is below. I tried to modify your code to separate the two measures but I couldn't get the code to make the comparison between only two id1xdates. Not from the beginning.

 

id1id2datevalue
1A201405300.5
1B201405300.8
1C201405300.1
1D201405301.2
1A201408311.3
1B201408315.6
1C201408310.4
1E201408310.7
1C201411300.3
1D201411301.3
1E201411302.5
1F201411304.1
2A199901303
Ksharp
Super User

OK. the following code should work for your second measure.

the code I have already posted is for your first measure.

Note: use your second table in your first post .

 

 

data have;
infile cards expandtabs ;
input id1	id2	$ date	value;
cards;
1	A	20140530	0.5
1	B	20140530	0.8
1	C	20140530	0.1
1	D	20140530	1.2
1	A	20140831	1.3
1	B	20140831	5.6
1	C	20140831	0.4
1	E	20140831	0.7
1	C	20141130	0.3
1	D	20141130	1.3
1	E	20141130	2.5
1	F	20141130	4.1
2	A	19990130	3
;
run;
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
 set have;
 by id1 date;
 if id2 not in x then m+1; /*********/
end;
want=divide(m,k);  /*********/
if id1 ne lag(id1) then call missing(want);
output;

k=0;
do until(last.date);
 set have;
 by id1 date;
 k+1;x{k}=id2;
end;
run;

 

lezgin
Obsidian | Level 7

In the first sheet of attached excel file I included a portion of the data. The first measure mostly works fine but sometimes returns negative values where it should return 0. The second measure does not work properly and I wasn't able to detect the problem. In the second sheet of the attached excel file I included the dateset transposed to show you how it should work. I included what the measure should return and what it returns when you use the dataset in the first sheet. The discrepancies are in red. Thank you very much.

Ksharp
Super User

I feel very Sorry. My bad. Add CALL MISSING(of x{*}); 

 

 

call missing(of x{*}) ;
k=0;
do until(last.date);
 set have;
 by id1 date;
 k+1;x{k}=id2;
end;
run;
Ksharp
Super User

Hi. I rewrite it by Hash Table. Hope you like it.

 

/*****For measure1 *******/
proc import datafile='/folders/myfolders/sascommunity.xlsx' out=have dbms=xlsx replace;
run;

proc sort data=have;
 by id1 date;
run;

data want;
if _n_=1 then do;
  if 0 then set have;
  declare hash h1();
  h1.definekey('id2');
  h1.definedone();
end;

n=h1.num_items;
do until(last.date);
 set have;
 by id1 date;
 rc=h1.remove();
end;
m=h1.num_items;
want=divide(m,n);
if id1 ne lag(id1) then call missing(want);
output;
h1.clear();
do until(last.date);
 set have;
 by id1 date;
 h1.ref();
end;
keep id1 id2 date want m n;
run;

 

 

 

 

 

/******For measure2 ***********/
proc import datafile='/folders/myfolders/sascommunity.xlsx' out=have dbms=xlsx replace;
run;

proc sort data=have;
 by id1 date;
run;

data want;
if _n_=1 then do;
  if 0 then set have;
  declare hash h1();
  h1.definekey('id2');
  h1.definedone();
end;

m=0;
n=h1.num_items;
do until(last.date);
 set have;
 by id1 date;
 if h1.check() ne 0 then m+1;
end;
want=divide(m,n);
if id1 ne lag(id1) then call missing(want);
output;
h1.clear();
do until(last.date);
 set have;
 by id1 date;
 h1.ref();
end;
keep id1 id2 date want m n;
run;
Ksharp
Super User

The following code is the fixed my original code. Sorry .

 

/*******For measure1*********/
data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
 set have;
 by id1 date;
 if id2 in x then m+1;
end;
want=divide(k-m,k);
if id1 ne lag(id1) then call missing(want);
output;

call missing(of x{*}); /* <--*/
k=0;
do until(last.date);
 set have;
 by id1 date;
 k+1;x{k}=id2;
end;
run;

 

 

 

/******For measure2***********/

data want;
array x{9999} $ 32 _temporary_;
m=0;
do until(last.date);
 set have;
 by id1 date;
 if id2 not in x then m+1; /*********/
end;
want=divide(m,k);  /*********/
if id1 ne lag(id1) then call missing(want);
output;

call missing(of x{*}); /*<---*/
k=0;
do until(last.date);
 set have;
 by id1 date;
 k+1;x{k}=id2;
end;
lezgin
Obsidian | Level 7

This has been very helpful. Thank you very much for your kind and quick replies.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 916 views
  • 2 likes
  • 2 in conversation