Help using Base SAS procedures

Count and Group By issue

Reply
Regular Contributor
Regular Contributor
Posts: 238

Count and Group By issue

I have a table that I am trying to find a way to do this smart. The table has over 8 million rows and when I groupby my taxID's there are a little over 80k. I want to be able to groupby my TAXID, the procedure and the place of service. I know I can do a case statement, but that becomes a long process because I would have a ton of case statements and think there has to be a better way. Maybe not. Here is an example of the table.

taxidsrvc_dtperfechopettteteecabgstemipl of srvc idpl of srvc desc
11/1/20101 1office
112/1/2011 1 4lab
111/4/2009 1 8inpatient
15/25/2011 1 12outpatient
13/2/2012 1 15hospice
1 1
1 1
1 1
11/15/212 1 20rehab
11/31/2012 125other
22/5/2012 11office
26/5/2012 1 4lab
26/7/2012 1 8inpatient
2 1
2 1
27/1/2012 1 15hospice
27/15/2012 1 20rehab
21/1/20131 25other

I tried doing a case statement but with having so many rows it seems to be kind of messy to do it like that. Just wondering if there is a better way. My code possible code would look something like this:

proc sql;
create table temp as
(select *,
case when pl_of_srvc_id in (1) and perf is not null and srvc_dt is not null then 1 end as perf_1,
case when pl_of_srvc_id in (4) and perf is not null and srvc_dt is not null then 1 end as perf_4,
case when pl_of_srvc_id in (8) and perf is not null and srvc_dt is not null then 1 end as perf_8,
case when pl_of_srvc_id in (12) and perf is not null and srvc_dt is not null then 1 end as perf_12,
case when pl_of_srvc_id in (15) and perf is not null and srvc_dt is not null then 1 end as perf_15,
case when pl_of_srvc_id in (20) and perf is not null and srvc_dt is not null then 1 end as perf_20,
case when pl_of_srvc_id in (25) and perf is not null and srvc_dt is not null then 1 end as perf_25

Note, I am only using perf but hope you get the picture. I have to code each of the procedures and each of the places of services because a perf will be in more than just the office which is 1. In the 8 million row table, all procedures show up in the above places of services so a case statement is required for each so all are counted. So, I end up with 7 case statements for every procedure with a total of 56 case statements. Then I would have to take and group by the TIN and then do another proc sql to sum it all up in another. Just hoping there is faster way of doing this.

Super Contributor
Posts: 333

Re: Count and Group By issue

Not knowing your ultimate needs for the dataset this may or may not work. But could you not sum over your procedures and group by TID and PID? You might then have to transpose it if you really want a variable combination of place of service and procedure. But I might be missing something in my read of your description.

Proc sql;

     create table out as

     select taxid, pl of srvc id, sum(perf) as perf, sum(echo) as echo .....

     from datain

     group by taxis, pl of srvc id

     ;

quit;

Someone else might have to help with the transpose code since Im not exactly sure what you need.

EJ

Regular Contributor
Regular Contributor
Posts: 238

Re: Count and Group By issue


I thought about that but did not try it. I will try that and see how it comes out.

Regular Contributor
Regular Contributor
Posts: 238

Re: Count and Group By issue

well it kind of worked but my proc transpose did not work like i thought it should.

Super Contributor
Posts: 339

Re: Count and Group By issue

If you use Esjackso1's approach, in order to "transpose" to a dataset like taxid perf_1 perf_4 perf_8 ... stemi_1 stemi_4...

you cannot (at least not as far as I know) do so in a single transpose.

A way around that is to transpose your 56 variables back in a single tall vertical dataset like

taxid      plofsrvcid            oldvar           value

1            1                           perf                250

1            4                           perf                110         

etc. and then simply concatenate oldvar with plofsrvcid as the name of your new var and transpose back. It takes 3 step or possibly 2 if you use multiple ids in the second transpose (plotsofsrvcid and oldvar but I am not sure if it trims the strings properly).

Vincent

Super User
Posts: 9,691

Re: Count and Group By issue

You can generate a macro variable to hold these variables.

data have;
infile cards  truncover expandtabs ;
input taxid     srvc_dt     :$20. perf     echo     pet     tte     tee     cabg     stemi     pl_of_srvc_id     pl_of_srvc_desc : $20.;
cards;
1     1/1/2010     1     .     .     .     .     .     .     1     office
1     12/1/2011     .     1     .     .     .     .     .     4     lab
1     11/4/2009     .     .     1     .     .     .     .     8     inpatient
1     5/25/2011     .     .     .     1     .     .     .     12     outpatient
1     3/2/2012     .     .     .     .     1     .     .     15     hospice
1          .          .     .     1     .     .     .     . . .
1          .          1     .     .     .     .     .     . . .
1          .     1     .     .     .     .     .     .     . . .
1     1/15/212     .     .     .     .     .     1     .     20     rehab
1     1/31/2012     .     .     .     .     .     .     1     25     other
2     2/5/2012     .     .     .     .     .     .     1     1     office
2     6/5/2012     .     .     .     .     .     1     .     4     lab
2     6/7/2012     .     .     .     1     .     .     .     8     inpatient
2          .     1     .     .     .     .     .     .     . . .
2          .     .     .     1     .     .     .     .     . . .
2     7/1/2012     .     .     .     .     1     .     .      15     hospice
2     7/15/2012     .     1     .     .     .     .     .     20     rehab
2     1/1/2013     1     .     .     .     .     .     .     25     other
;
run;
proc sql ;
create table x as
 select distinct pl_of_srvc_id from have where pl_of_srvc_id is not missing order by pl_of_srvc_id;

select  cats('perf_',pl_of_srvc_id) into : list separated by ' ' from x;
quit;
%put &list ;
data want(drop=i);
 set have;
 array p{*} &list ;
 do i=1 to dim(p);
  if pl_of_srvc_id=input(scan(vname(p{i}),-1,'_'),best8.) and not missing(perf) and not missing(srvc_dt) then do;p{i}=1;leave;end;
 end;
run;


Ksharp

Super Contributor
Posts: 297

Re: Count and Group By issue

Does the attached do what you are looking to achieve?

OPTIONS MPRINTNEST MLOGIC SYMBOLGEN;

DATA HAVE;

INFILE CARDS  TRUNCOVER EXPANDTABS ;

INPUT TAXID     SRVC_DT     :$20. PERF     ECHO     PET     TTE     TEE     CABG     STEMI     PL_OF_SRVC_ID     PL_OF_SRVC_DESC : $20.;

CARDS;

1     1/1/2010     1     .     .     .     .     .     .     1     OFFICE

1     12/1/2011     .     1     .     .     .     .     .     4     LAB

1     11/4/2009     .     .     1     .     .     .     .     8     INPATIENT

1     5/25/2011     .     .     .     1     .     .     .     12     OUTPATIENT

1     3/2/2012     .     .     .     .     1     .     .     15     HOSPICE

1          .          .     .     1     .     .     .     . . .

1          .          1     .     .     .     .     .     . . .

1          .     1     .     .     .     .     .     .     . . .

1     1/15/212     .     .     .     .     .     1     .     20     REHAB

1     1/31/2012     .     .     .     .     .     .     1     25     OTHER

2     2/5/2012     .     .     .     .     .     .     1     1     OFFICE

2     6/5/2012     .     .     .     .     .     1     .     4     LAB

2     6/7/2012     .     .     .     1     .     .     .     8     INPATIENT

2          .     1     .     .     .     .     .     .     . . .

2          .     .     .     1     .     .     .     .     . . .

2     7/1/2012     .     .     .     .     1     .     .      15     HOSPICE

2     7/15/2012     .     1     .     .     .     .     .     20     REHAB

2     1/1/2013     1     .     .     .     .     .     .     25     OTHER

;

RUN;

PROC FREQ DATA = HAVE NOPRINT;

  TABLES PL_OF_SRVC_ID /OUT = HAVE1 (WHERE = (PL_OF_SRVC_ID) DROP=COUNT PERCENT);

RUN;

%LET PERFLIST  =;

%LET ECHOLIST  =;

%LET PETLIST   =;

%LET TTELIST   =;

%LET TEELIST   =;

%LET CABGLIST  =;

%LET STEMILIST =;

%LET DSID  = %SYSFUNC(OPEN(HAVE1,I));

%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));

%GLOBAL PL_OF_SRVC_ID;

%SYSCALL SET(DSID);

%MACRO LISTVARS;

%DO I = 1 %TO &TOTAL.;

  %LET RC     = %SYSFUNC(FETCHOBS(&DSID.,&I.));

  %LET PERFLIST   = &PERFLIST. PERF_&PL_OF_SRVC_ID.;

  %LET ECHOLIST   = &ECHOLIST. ECHO_&PL_OF_SRVC_ID.;

  %LET PETLIST   = &PETLIST. PET_&PL_OF_SRVC_ID.;

  %LET TTELIST   = &TTELIST. TTE_&PL_OF_SRVC_ID.;

  %LET TEELIST   = &TEELIST. TEE_&PL_OF_SRVC_ID.;

  %LET CABGLIST   = &CABGLIST. CABG_&PL_OF_SRVC_ID.;

  %LET STEMILIST  = &STEMILIST. STEMI_&PL_OF_SRVC_ID.;

%END;

%LET RC = %SYSFUNC(CLOSE(&DSID.));

%MEND;

%LISTVARS;

DATA WANT(DROP=I);

  SET HAVE;

  ARRAY INN {*} PERF ECHO PET TTE TEE CABG STEMI;

  ARRAY OUTT{*} &PERFLIST. &ECHOLIST. &PETLIST. &TTELIST. &TEELIST. &CABGLIST. &STEMILIST.;

  DO I = 1 TO DIM(INN);

  IF NOT MISSING(INN{I}) AND NOT MISSING(SRVC_DT) THEN DO;

  DO O = 1 TO DIM(OUTT);

   

  IF PL_OF_SRVC_ID = INPUT(SCAN(VNAME(OUTT{O}),-1,'_'),BEST8.)

  AND VNAME(INN{I}) = SUBSTR(VNAME(OUTT{O}),1,INDEX(VNAME(OUTT{O}),"_")-1) THEN DO;

  OUTT{O}=INN{I};

  LEAVE;

   END;

  END;

  END;

  END;

RUN;

Super Contributor
Posts: 297

Re: Count and Group By issue

You could also attempt this method. It is worth noting that the output is slightly different as a result of the summary.  I am not sure which will be more efficient based on the amount of data you have, so give both a try and test it out.

DATA HAVE;

INFILE CARDS  TRUNCOVER EXPANDTABS ;

INPUT TAXID     SRVC_DT     :$20. PERF     ECHO     PET     TTE     TEE     CABG     STEMI     PL_OF_SRVC_ID     PL_OF_SRVC_DESC : $20.;

CARDS;

1     1/1/2010     1     .     .     .     .     .     .     1     OFFICE

1     12/1/2011     .     1     .     .     .     .     .     4     LAB

1     11/4/2009     .     .     1     .     .     .     .     8     INPATIENT

1     5/25/2011     .     .     .     1     .     .     .     12     OUTPATIENT

1     3/2/2012     .     .     .     .     1     .     .     15     HOSPICE

1          .          .     .     1     .     .     .     . . .

1          .          1     .     .     .     .     .     . . .

1          .     1     .     .     .     .     .     .     . . .

1     1/15/212     .     .     .     .     .     1     .     20     REHAB

1     1/31/2012     .     .     .     .     .     .     1     25     OTHER

2     2/5/2012     .     .     .     .     .     .     1     1     OFFICE

2     6/5/2012     .     .     .     .     .     1     .     4     LAB

2     6/7/2012     .     .     .     1     .     .     .     8     INPATIENT

2          .     1     .     .     .     .     .     .     . . .

2          .     .     .     1     .     .     .     .     . . .

2     7/1/2012     .     .     .     .     1     .     .      15     HOSPICE

2     7/15/2012     .     1     .     .     .     .     .     20     REHAB

2     1/1/2013     1     .     .     .     .     .     .     25     OTHER

;

RUN;

PROC SUMMARY DATA=HAVE NWAY MISSING;

  CLASS PL_OF_SRVC_ID PL_OF_SRVC_DESC TAXID SRVC_DT;

  VAR PERF ECHO PET TTE TEE CABG STEMI;

  OUTPUT OUT = HAVE1  (DROP = _Smiley Happy SUM=;

RUN;

PROC TRANSPOSE DATA=HAVE1 OUT=FLOP;

  BY PL_OF_SRVC_ID PL_OF_SRVC_DESC TAXID SRVC_DT;

RUN;

PROC TRANSPOSE DATA=FLOP OUT=WANT DELIMITER=_;

  ID  _NAME_ PL_OF_SRVC_ID;

  BY  PL_OF_SRVC_DESC TAXID SRVC_DT NOTSORTED;

RUN;

Ask a Question
Discussion stats
  • 7 replies
  • 296 views
  • 3 likes
  • 5 in conversation