Solved: How to Match Observations from one Variable to another Variable?

brettmullga · Posted 05-06-2014 03:57 PM

I have recently migrated over from STATA. I have a set of panel data and want to observe the change in observations for a specific variable from one period to the next. I initially used the compare procedure; however, this does not work if the two variables are not sorted the same, etc. How should I go about comparing these variables?

RafiRahi · Posted 05-15-2014 11:34 AM

Hi brettmullga,

Please find below the solution using datastep and sql. I have created a dataset based on the information you gave and tried to get the result in the final dataset. If you have more months column then the coding would need to be updated. Please let me know if you have any questions.

All the best.

Thanks.

Rafi

DATA original;

INPUT Participant Month1 Month2 Month3 Month4;

CARDS;

1 8 1 . 9

2 . . 3 15

3 4 5 . 12

4 1 . . 10

;

RUN;

* Count total for each month;

PROC SQL;

CREATE TABLE m_tot AS

SELECT *, COUNT (*) as total, COUNT(month1) AS m1_tot, COUNT(month2) AS m2_tot, COUNT(month3) AS m3_tot, COUNT(month4) AS m4_tot

FROM original

ORDER BY participant;

QUIT;

* Using datasetp - 1) flag patient movement in each month and 2) flag patient who has value in previous month(s);

DATA flag;

SET m_tot;

IF month1 NE . AND month2 NE . THEN m1tom2f = 1;

IF month2 NE . AND month3 NE . THEN m2tom3f = 1;

IF month3 NE . AND month4 NE . THEN m3tom4f = 1;

IF month2 NE . AND month1 NE . THEN m2tom1f = 1;

IF month3 NE . AND N(month1,month2) GE 1 THEN m3tom1f = 1;

IF month4 NE . AND N(month1,month2,month3) GE 1 THEN m4tom1f = 1;

RUN;

* Using proc sql - count necessary values;

PROC SQL;

CREATE TABLE finalx AS

SELECT *, COUNT(m1tom2f) AS m1tom2, COUNT(m2tom3f) AS m2tom3, COUNT(m3tom4f) AS m3tom4, COUNT(m2tom1f) AS m2tom1, COUNT(m3tom1f) AS m3tom1, COUNT(m4tom1f) AS m4tom1

FROM flag

ORDER BY participant;

QUIT;

* Final dataset where all the necessary information for outputs will be found;

DATA final;

SET finalx (DROP=participant month1 month2 month3 month4 m1tom2f m2tom3f m3tom4f m2tom1f m3tom1f m4tom1f) END=eof ;

IF eof;

LABEL total = 'Toatl participant'

m1_tot = 'Total participant in Month 1'

m2_tot = 'Total participant in Month 2'

m3_tot = 'Total participant in Month 3'

m4_tot = 'Total participant in Month 4'

m1tom2 = 'Participated in Month 1 and Month 2'

m2tom3 = 'Participated in Month 2 and Month 3'

m3tom4 = 'Participated in Month 3 and Month 4'

m2tom1 = 'Participated in Month 2 and in earlier month(s)'

m3tom1 = 'Participated in Month 3 and in earlier month(s)'

m4tom1 = 'Participated in Month 4 and in earlier month(s)'

;

RUN;

Rafi Rahi

View solution in original post

Reeza · Posted 05-06-2014 04:05 PM

What does your data look like? What type of output do you expect?

brettmullga · Posted 05-06-2014 04:10 PM

The variable I'm concerned is a list of participants. I have monthly data that goes back a decade or so. From one month to the next, I want to see how many drop out and how many new participants enter. I'm currently getting descriptive statistics as a part of a larger project.

Reeza · Posted 05-06-2014 04:15 PM

Ok...but how do your rows and columns look like. Do you have one record per participant and different variables for different time points or one record per participant per time?

It's really helpful to post some sample data and expected sample output.

brettmullga · Posted 05-14-2014 03:28 PM

Thank you, Reeza. I was confused about what I was asking for.

Participant	Month1	Month2	Month3	Month4
1	8	1	.	9
2	.	.	3	15
3	4	5	.	12
4	1	.	.	10

I want to be able to count the number of participants in each month and the change in participants each month.

Here is my first attempt at this:

proc sql;

select count(*) as Total,

count(month1) as Month_1,

count(month2) as Month_2,

count(month3) as Month_3,

count(month4) as Month_4

from work.charity;

quit;

Output:

Total Month_1 Month_2 Month_3 Month_4

4 3 2 1 4

This works to count individual columns but not the intersection of two months. The intersection will illustrate how many from one month continued on to the next month. Is it possible to specify this within the count statement?

Reeza · Posted 05-14-2014 03:49 PM

What do you want the output to look like.

brettmullga · Posted 05-15-2014 09:21 AM

Output 1:

Month1intersectMonth2 Month2intersectMonth3 Month3intersectMonth4

2 0 1

Output 2:

Month2intersectMonth1 Month3intersectunion(2,1) Month4intersectunion(3,2,1)

2 0 4

Output 1 is the number of rows that have values from one month to the next.

Output 2 is the number of rows that have non-unique entries in the preceding columns.

Note: for this to work, an if then else statement may have to be applied to turn the data set into 1s and missing.

RW9 · Posted 05-15-2014 09:59 AM

Hi,

Try:

data have;
attrib p format=$20. m1-m4 format=best.;
infile datalines delimiter=',';
input p $ m1-m4;
datalines;
1,8,1,., 9
2,.,.,3,15
3,4,5,.,12
4,1,.,.,10
;
quit;
proc sql;
create table OUTPUT1
(
    MONTH1INTERSECTMONTH2 num,
    MONTH2INTERSECTMONTH3 num,
    MONTH3INTERSECTMONTH4 num
);
create table OUTPUT2
(
    MONTH2INTERSECTMONTH1 num,
    MONTH3INTERSECTMONTH21 num,
    MONTH4INTERSECTMONTH321 num
);
insert into OUTPUT1
set MONTH1INTERSECTMONTH2=(select count(P) from WORK.HAVE where M1 is not null and M2 is not null),
      MONTH2INTERSECTMONTH3=(select count(P) from WORK.HAVE where M2 is not null and M3 is not null),
      MONTH3INTERSECTMONTH4=(select count(P) from WORK.HAVE where M3 is not null and M4 is not null);
insert into OUTPUT2
set MONTH2INTERSECTMONTH1=(select count(P) from WORK.HAVE where M2 is not null and M1 is not null),
      MONTH3INTERSECTMONTH21=(select count(P) from WORK.HAVE where M3 is not null and (M2 is not null or M1 is not null)),
      MONTH4INTERSECTMONTH321=(select count(P) from WORK.HAVE where M4 is not null and (M3 is not null or M2 is not null or M1 is not null));
quit;

overmar · Posted 05-15-2014 11:53 AM

Or you could use this after the initial read in:

data have2;

set have;

array mon (4) m1-m4;

array moni (3) mi1-mi3;

do i = 1 to 3;

if mon(i) ~=. and mon(i+1)~=. then moni(i) = 1;

else moni(i) = 0;

end;

run

proc sql;

create table want as

select distinct sum(mi1) as mi1, sum(mi2) as mi2, sum(mi3) as mi3, sum(case when m2~=. and m1 ~=. then 1 else 0 end) as m2mi1,

sum(case when m3~=. and (m1~=. or m2~=.) then 1 else 0 end) as m3mi12, sum(case when m4 ~=. and (m1~=. or m2 ~=. or m3 ~=.) then 1 else 0 end) as m4mi123

from have2;

;

RafiRahi · Posted 05-15-2014 11:34 AM

Hi brettmullga,

Please find below the solution using datastep and sql. I have created a dataset based on the information you gave and tried to get the result in the final dataset. If you have more months column then the coding would need to be updated. Please let me know if you have any questions.

All the best.

Thanks.

Rafi

DATA original;

INPUT Participant Month1 Month2 Month3 Month4;

CARDS;

1 8 1 . 9

2 . . 3 15

3 4 5 . 12

4 1 . . 10

;

RUN;

* Count total for each month;

PROC SQL;

CREATE TABLE m_tot AS

SELECT *, COUNT (*) as total, COUNT(month1) AS m1_tot, COUNT(month2) AS m2_tot, COUNT(month3) AS m3_tot, COUNT(month4) AS m4_tot

FROM original

ORDER BY participant;

QUIT;

* Using datasetp - 1) flag patient movement in each month and 2) flag patient who has value in previous month(s);

DATA flag;

SET m_tot;

IF month1 NE . AND month2 NE . THEN m1tom2f = 1;

IF month2 NE . AND month3 NE . THEN m2tom3f = 1;

IF month3 NE . AND month4 NE . THEN m3tom4f = 1;

IF month2 NE . AND month1 NE . THEN m2tom1f = 1;

IF month3 NE . AND N(month1,month2) GE 1 THEN m3tom1f = 1;

IF month4 NE . AND N(month1,month2,month3) GE 1 THEN m4tom1f = 1;

RUN;

* Using proc sql - count necessary values;

PROC SQL;

CREATE TABLE finalx AS

SELECT *, COUNT(m1tom2f) AS m1tom2, COUNT(m2tom3f) AS m2tom3, COUNT(m3tom4f) AS m3tom4, COUNT(m2tom1f) AS m2tom1, COUNT(m3tom1f) AS m3tom1, COUNT(m4tom1f) AS m4tom1

FROM flag

ORDER BY participant;

QUIT;

* Final dataset where all the necessary information for outputs will be found;

DATA final;

SET finalx (DROP=participant month1 month2 month3 month4 m1tom2f m2tom3f m3tom4f m2tom1f m3tom1f m4tom1f) END=eof ;

IF eof;

LABEL total = 'Toatl participant'

m1_tot = 'Total participant in Month 1'

m2_tot = 'Total participant in Month 2'

m3_tot = 'Total participant in Month 3'

m4_tot = 'Total participant in Month 4'

m1tom2 = 'Participated in Month 1 and Month 2'

m2tom3 = 'Participated in Month 2 and Month 3'

m3tom4 = 'Participated in Month 3 and Month 4'

m2tom1 = 'Participated in Month 2 and in earlier month(s)'

m3tom1 = 'Participated in Month 3 and in earlier month(s)'

m4tom1 = 'Participated in Month 4 and in earlier month(s)'

;

RUN;

Rafi Rahi

How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Re: How to Match Observations from one Variable to another Variable?

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away