BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
brettmullga
Calcite | Level 5

I have recently migrated over from STATA. I have a set of panel data and want to observe the change in observations for a specific variable from one period to the next. I initially used the compare procedure; however, this does not work if the two variables are not sorted the same, etc. How should I go about comparing these variables?

1 ACCEPTED SOLUTION

Accepted Solutions
RafiRahi
Fluorite | Level 6

Hi brettmullga,

Please find below the solution using datastep and sql. I have created a dataset based on the information you gave and tried to get the result in the final dataset. If you have more months column then the coding would need to be updated. Please let me know if you have any questions.

All the best.

Thanks.

Rafi

DATA original;

INPUT Participant Month1 Month2 Month3 Month4;

CARDS;

1 8 1 . 9

2 . . 3 15

3 4 5 . 12

4 1 . . 10

;

RUN;

* Count total for each month;

PROC SQL;

  CREATE TABLE m_tot AS

  SELECT *, COUNT (*) as total, COUNT(month1) AS m1_tot, COUNT(month2) AS m2_tot, COUNT(month3) AS m3_tot, COUNT(month4) AS m4_tot

  FROM original

  ORDER BY participant;

QUIT;

* Using datasetp - 1) flag patient movement in each month and 2) flag patient who has value in previous month(s);

DATA flag;

  SET m_tot;

  IF month1 NE . AND month2 NE . THEN m1tom2f = 1;

  IF month2 NE . AND month3 NE . THEN m2tom3f = 1;

  IF month3 NE . AND month4 NE . THEN m3tom4f = 1;

  

  IF month2 NE . AND month1 NE . THEN m2tom1f = 1;

  IF month3 NE . AND N(month1,month2)  GE 1 THEN m3tom1f = 1;

  IF month4 NE . AND N(month1,month2,month3) GE 1 THEN m4tom1f = 1;

RUN;

* Using proc sql - count necessary values;

PROC SQL;

  CREATE TABLE finalx AS

  SELECT *, COUNT(m1tom2f) AS m1tom2, COUNT(m2tom3f) AS m2tom3, COUNT(m3tom4f) AS m3tom4, COUNT(m2tom1f) AS m2tom1, COUNT(m3tom1f) AS m3tom1, COUNT(m4tom1f) AS m4tom1

  FROM flag

  ORDER BY participant;

QUIT;

* Final dataset where all the necessary information for outputs will be found;

DATA final;

  SET finalx (DROP=participant month1 month2 month3 month4  m1tom2f m2tom3f m3tom4f m2tom1f m3tom1f m4tom1f) END=eof ;

  IF eof;

  LABEL total = 'Toatl participant'

        m1_tot = 'Total participant in Month 1'

        m2_tot = 'Total participant in Month 2'

        m3_tot = 'Total participant in Month 3'

        m4_tot = 'Total participant in Month 4'

        m1tom2 = 'Participated in Month 1 and Month 2'

        m2tom3 = 'Participated in Month 2 and Month 3'

        m3tom4 = 'Participated in Month 3 and Month 4'

        m2tom1 = 'Participated in Month 2 and in earlier month(s)'

        m3tom1 = 'Participated in Month 3 and in earlier month(s)'

        m4tom1 = 'Participated in Month 4 and in earlier month(s)'

        ;

RUN;

Rafi Rahi

View solution in original post

9 REPLIES 9
Reeza
Super User

What does your data look like? What type of output do you expect?

brettmullga
Calcite | Level 5

The variable I'm concerned is a list of participants. I have monthly data that goes back a decade or so. From one month to the next, I want to see how many drop out and how many new participants enter. I'm currently getting descriptive statistics as a part of a larger project.

Reeza
Super User

Ok...but how do your rows and columns look like. Do you have one record per participant and different variables for different time points or one record per participant per time?

It's really helpful to post some sample data and expected sample output.

brettmullga
Calcite | Level 5

Thank you, Reeza. I was confused about what I was asking for.

ParticipantMonth1Month2Month3Month4

1

81.9
2..315
345.12
41..10

I want to be able to count the number of participants in each month and the change in participants each month.

Here is my first attempt at this:

proc sql;

     select count(*) as Total,

     count(month1) as Month_1,

     count(month2) as Month_2,

     count(month3) as Month_3,

     count(month4) as Month_4

     from work.charity;

     quit;

Output:

Total     Month_1     Month_2     Month_3     Month_4 

    4            3                 2               1               4

This works to count individual columns but not the intersection of two months. The intersection will illustrate how many from one month continued on to the next month. Is it possible to specify this within the count statement?

Reeza
Super User

What do you want the output to look like.

brettmullga
Calcite | Level 5

Output 1:

Month1intersectMonth2     Month2intersectMonth3     Month3intersectMonth4

               2                                        0                                   1

    

Output 2:

Month2intersectMonth1     Month3intersectunion(2,1)     Month4intersectunion(3,2,1)

               2                                        0                                        4

Output 1 is the number of rows that have values from one month to the next.

Output 2 is the number of rows that have non-unique entries in the preceding columns.

Note: for this to work, an if then else statement may have to be applied to turn the data set into 1s and missing.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,


Try:

data have;
  attrib p format=$20. m1-m4 format=best.;
  infile datalines delimiter=',';
  input p $ m1-m4;
datalines;
1,8,1,., 9
2,.,.,3,15
3,4,5,.,12
4,1,.,.,10
;
quit;
proc sql;
  create table OUTPUT1
  (
    MONTH1INTERSECTMONTH2 num,
    MONTH2INTERSECTMONTH3 num,
    MONTH3INTERSECTMONTH4 num
  );
  create table OUTPUT2
  (
    MONTH2INTERSECTMONTH1 num,
    MONTH3INTERSECTMONTH21 num,
    MONTH4INTERSECTMONTH321 num
  );
  insert into OUTPUT1
  set MONTH1INTERSECTMONTH2=(select count(P) from WORK.HAVE where M1 is not null and M2 is not null),
      MONTH2INTERSECTMONTH3=(select count(P) from WORK.HAVE where M2 is not null and M3 is not null),
      MONTH3INTERSECTMONTH4=(select count(P) from WORK.HAVE where M3 is not null and M4 is not null);
  insert into OUTPUT2
  set MONTH2INTERSECTMONTH1=(select count(P) from WORK.HAVE where M2 is not null and M1 is not null),
      MONTH3INTERSECTMONTH21=(select count(P) from WORK.HAVE where M3 is not null and (M2 is not null or M1 is not null)),
      MONTH4INTERSECTMONTH321=(select count(P) from WORK.HAVE where M4 is not null and (M3 is not null or M2 is not null or M1 is not null));
quit;


overmar
Obsidian | Level 7

Or you could use this after the initial read in:

data have2;

    set have;

    array mon (4) m1-m4;

    array moni (3) mi1-mi3;

    do i = 1 to 3;

    if mon(i) ~=. and mon(i+1)~=. then moni(i) = 1;

    else moni(i) = 0;

    end;

run

proc sql;

create table want as

select distinct sum(mi1) as mi1, sum(mi2) as mi2, sum(mi3) as mi3, sum(case when m2~=. and m1 ~=. then 1 else 0 end) as m2mi1,

sum(case when m3~=. and (m1~=. or m2~=.) then 1 else 0 end) as m3mi12, sum(case when m4 ~=. and (m1~=. or m2 ~=. or m3 ~=.) then 1 else 0 end) as m4mi123

from have2;

;

RafiRahi
Fluorite | Level 6

Hi brettmullga,

Please find below the solution using datastep and sql. I have created a dataset based on the information you gave and tried to get the result in the final dataset. If you have more months column then the coding would need to be updated. Please let me know if you have any questions.

All the best.

Thanks.

Rafi

DATA original;

INPUT Participant Month1 Month2 Month3 Month4;

CARDS;

1 8 1 . 9

2 . . 3 15

3 4 5 . 12

4 1 . . 10

;

RUN;

* Count total for each month;

PROC SQL;

  CREATE TABLE m_tot AS

  SELECT *, COUNT (*) as total, COUNT(month1) AS m1_tot, COUNT(month2) AS m2_tot, COUNT(month3) AS m3_tot, COUNT(month4) AS m4_tot

  FROM original

  ORDER BY participant;

QUIT;

* Using datasetp - 1) flag patient movement in each month and 2) flag patient who has value in previous month(s);

DATA flag;

  SET m_tot;

  IF month1 NE . AND month2 NE . THEN m1tom2f = 1;

  IF month2 NE . AND month3 NE . THEN m2tom3f = 1;

  IF month3 NE . AND month4 NE . THEN m3tom4f = 1;

  

  IF month2 NE . AND month1 NE . THEN m2tom1f = 1;

  IF month3 NE . AND N(month1,month2)  GE 1 THEN m3tom1f = 1;

  IF month4 NE . AND N(month1,month2,month3) GE 1 THEN m4tom1f = 1;

RUN;

* Using proc sql - count necessary values;

PROC SQL;

  CREATE TABLE finalx AS

  SELECT *, COUNT(m1tom2f) AS m1tom2, COUNT(m2tom3f) AS m2tom3, COUNT(m3tom4f) AS m3tom4, COUNT(m2tom1f) AS m2tom1, COUNT(m3tom1f) AS m3tom1, COUNT(m4tom1f) AS m4tom1

  FROM flag

  ORDER BY participant;

QUIT;

* Final dataset where all the necessary information for outputs will be found;

DATA final;

  SET finalx (DROP=participant month1 month2 month3 month4  m1tom2f m2tom3f m3tom4f m2tom1f m3tom1f m4tom1f) END=eof ;

  IF eof;

  LABEL total = 'Toatl participant'

        m1_tot = 'Total participant in Month 1'

        m2_tot = 'Total participant in Month 2'

        m3_tot = 'Total participant in Month 3'

        m4_tot = 'Total participant in Month 4'

        m1tom2 = 'Participated in Month 1 and Month 2'

        m2tom3 = 'Participated in Month 2 and Month 3'

        m3tom4 = 'Participated in Month 3 and Month 4'

        m2tom1 = 'Participated in Month 2 and in earlier month(s)'

        m3tom1 = 'Participated in Month 3 and in earlier month(s)'

        m4tom1 = 'Participated in Month 4 and in earlier month(s)'

        ;

RUN;

Rafi Rahi

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1753 views
  • 8 likes
  • 5 in conversation