BookmarkSubscribeRSS Feed
tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

I have a table that I am trying to find a way to do this smart. The table has over 8 million rows and when I groupby my taxID's there are a little over 80k. I want to be able to groupby my TAXID, the procedure and the place of service. I know I can do a case statement, but that becomes a long process because I would have a ton of case statements and think there has to be a better way. Maybe not. Here is an example of the table.

taxidsrvc_dtperfechopettteteecabgstemipl of srvc idpl of srvc desc
11/1/20101 1office
112/1/2011 1 4lab
111/4/2009 1 8inpatient
15/25/2011 1 12outpatient
13/2/2012 1 15hospice
1 1
1 1
1 1
11/15/212 1 20rehab
11/31/2012 125other
22/5/2012 11office
26/5/2012 1 4lab
26/7/2012 1 8inpatient
2 1
2 1
27/1/2012 1 15hospice
27/15/2012 1 20rehab
21/1/20131 25other

I tried doing a case statement but with having so many rows it seems to be kind of messy to do it like that. Just wondering if there is a better way. My code possible code would look something like this:

proc sql;
create table temp as
(select *,
case when pl_of_srvc_id in (1) and perf is not null and srvc_dt is not null then 1 end as perf_1,
case when pl_of_srvc_id in (4) and perf is not null and srvc_dt is not null then 1 end as perf_4,
case when pl_of_srvc_id in (8) and perf is not null and srvc_dt is not null then 1 end as perf_8,
case when pl_of_srvc_id in (12) and perf is not null and srvc_dt is not null then 1 end as perf_12,
case when pl_of_srvc_id in (15) and perf is not null and srvc_dt is not null then 1 end as perf_15,
case when pl_of_srvc_id in (20) and perf is not null and srvc_dt is not null then 1 end as perf_20,
case when pl_of_srvc_id in (25) and perf is not null and srvc_dt is not null then 1 end as perf_25

Note, I am only using perf but hope you get the picture. I have to code each of the procedures and each of the places of services because a perf will be in more than just the office which is 1. In the 8 million row table, all procedures show up in the above places of services so a case statement is required for each so all are counted. So, I end up with 7 case statements for every procedure with a total of 56 case statements. Then I would have to take and group by the TIN and then do another proc sql to sum it all up in another. Just hoping there is faster way of doing this.

7 REPLIES 7
esjackso
Quartz | Level 8

Not knowing your ultimate needs for the dataset this may or may not work. But could you not sum over your procedures and group by TID and PID? You might then have to transpose it if you really want a variable combination of place of service and procedure. But I might be missing something in my read of your description.

Proc sql;

     create table out as

     select taxid, pl of srvc id, sum(perf) as perf, sum(echo) as echo .....

     from datain

     group by taxis, pl of srvc id

     ;

quit;

Someone else might have to help with the transpose code since Im not exactly sure what you need.

EJ

tmm
Fluorite | Level 6 tmm
Fluorite | Level 6


I thought about that but did not try it. I will try that and see how it comes out.

tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

well it kind of worked but my proc transpose did not work like i thought it should.

Vince28_Statcan
Quartz | Level 8

If you use Esjackso1's approach, in order to "transpose" to a dataset like taxid perf_1 perf_4 perf_8 ... stemi_1 stemi_4...

you cannot (at least not as far as I know) do so in a single transpose.

A way around that is to transpose your 56 variables back in a single tall vertical dataset like

taxid      plofsrvcid            oldvar           value

1            1                           perf                250

1            4                           perf                110         

etc. and then simply concatenate oldvar with plofsrvcid as the name of your new var and transpose back. It takes 3 step or possibly 2 if you use multiple ids in the second transpose (plotsofsrvcid and oldvar but I am not sure if it trims the strings properly).

Vincent

Ksharp
Super User

You can generate a macro variable to hold these variables.

data have;
infile cards  truncover expandtabs ;
input taxid     srvc_dt     :$20. perf     echo     pet     tte     tee     cabg     stemi     pl_of_srvc_id     pl_of_srvc_desc : $20.;
cards;
1     1/1/2010     1     .     .     .     .     .     .     1     office
1     12/1/2011     .     1     .     .     .     .     .     4     lab
1     11/4/2009     .     .     1     .     .     .     .     8     inpatient
1     5/25/2011     .     .     .     1     .     .     .     12     outpatient
1     3/2/2012     .     .     .     .     1     .     .     15     hospice
1          .          .     .     1     .     .     .     . . .
1          .          1     .     .     .     .     .     . . .
1          .     1     .     .     .     .     .     .     . . .
1     1/15/212     .     .     .     .     .     1     .     20     rehab
1     1/31/2012     .     .     .     .     .     .     1     25     other
2     2/5/2012     .     .     .     .     .     .     1     1     office
2     6/5/2012     .     .     .     .     .     1     .     4     lab
2     6/7/2012     .     .     .     1     .     .     .     8     inpatient
2          .     1     .     .     .     .     .     .     . . .
2          .     .     .     1     .     .     .     .     . . .
2     7/1/2012     .     .     .     .     1     .     .      15     hospice
2     7/15/2012     .     1     .     .     .     .     .     20     rehab
2     1/1/2013     1     .     .     .     .     .     .     25     other
;
run;
proc sql ;
create table x as
 select distinct pl_of_srvc_id from have where pl_of_srvc_id is not missing order by pl_of_srvc_id;

select  cats('perf_',pl_of_srvc_id) into : list separated by ' ' from x;
quit;
%put &list ;
data want(drop=i);
 set have;
 array p{*} &list ;
 do i=1 to dim(p);
  if pl_of_srvc_id=input(scan(vname(p{i}),-1,'_'),best8.) and not missing(perf) and not missing(srvc_dt) then do;p{i}=1;leave;end;
 end;
run;


Ksharp

Scott_Mitchell
Quartz | Level 8

Does the attached do what you are looking to achieve?

OPTIONS MPRINTNEST MLOGIC SYMBOLGEN;

DATA HAVE;

INFILE CARDS  TRUNCOVER EXPANDTABS ;

INPUT TAXID     SRVC_DT     :$20. PERF     ECHO     PET     TTE     TEE     CABG     STEMI     PL_OF_SRVC_ID     PL_OF_SRVC_DESC : $20.;

CARDS;

1     1/1/2010     1     .     .     .     .     .     .     1     OFFICE

1     12/1/2011     .     1     .     .     .     .     .     4     LAB

1     11/4/2009     .     .     1     .     .     .     .     8     INPATIENT

1     5/25/2011     .     .     .     1     .     .     .     12     OUTPATIENT

1     3/2/2012     .     .     .     .     1     .     .     15     HOSPICE

1          .          .     .     1     .     .     .     . . .

1          .          1     .     .     .     .     .     . . .

1          .     1     .     .     .     .     .     .     . . .

1     1/15/212     .     .     .     .     .     1     .     20     REHAB

1     1/31/2012     .     .     .     .     .     .     1     25     OTHER

2     2/5/2012     .     .     .     .     .     .     1     1     OFFICE

2     6/5/2012     .     .     .     .     .     1     .     4     LAB

2     6/7/2012     .     .     .     1     .     .     .     8     INPATIENT

2          .     1     .     .     .     .     .     .     . . .

2          .     .     .     1     .     .     .     .     . . .

2     7/1/2012     .     .     .     .     1     .     .      15     HOSPICE

2     7/15/2012     .     1     .     .     .     .     .     20     REHAB

2     1/1/2013     1     .     .     .     .     .     .     25     OTHER

;

RUN;

PROC FREQ DATA = HAVE NOPRINT;

  TABLES PL_OF_SRVC_ID /OUT = HAVE1 (WHERE = (PL_OF_SRVC_ID) DROP=COUNT PERCENT);

RUN;

%LET PERFLIST  =;

%LET ECHOLIST  =;

%LET PETLIST   =;

%LET TTELIST   =;

%LET TEELIST   =;

%LET CABGLIST  =;

%LET STEMILIST =;

%LET DSID  = %SYSFUNC(OPEN(HAVE1,I));

%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));

%GLOBAL PL_OF_SRVC_ID;

%SYSCALL SET(DSID);

%MACRO LISTVARS;

%DO I = 1 %TO &TOTAL.;

  %LET RC     = %SYSFUNC(FETCHOBS(&DSID.,&I.));

  %LET PERFLIST   = &PERFLIST. PERF_&PL_OF_SRVC_ID.;

  %LET ECHOLIST   = &ECHOLIST. ECHO_&PL_OF_SRVC_ID.;

  %LET PETLIST   = &PETLIST. PET_&PL_OF_SRVC_ID.;

  %LET TTELIST   = &TTELIST. TTE_&PL_OF_SRVC_ID.;

  %LET TEELIST   = &TEELIST. TEE_&PL_OF_SRVC_ID.;

  %LET CABGLIST   = &CABGLIST. CABG_&PL_OF_SRVC_ID.;

  %LET STEMILIST  = &STEMILIST. STEMI_&PL_OF_SRVC_ID.;

%END;

%LET RC = %SYSFUNC(CLOSE(&DSID.));

%MEND;

%LISTVARS;

DATA WANT(DROP=I);

  SET HAVE;

  ARRAY INN {*} PERF ECHO PET TTE TEE CABG STEMI;

  ARRAY OUTT{*} &PERFLIST. &ECHOLIST. &PETLIST. &TTELIST. &TEELIST. &CABGLIST. &STEMILIST.;

  DO I = 1 TO DIM(INN);

  IF NOT MISSING(INN{I}) AND NOT MISSING(SRVC_DT) THEN DO;

  DO O = 1 TO DIM(OUTT);

   

  IF PL_OF_SRVC_ID = INPUT(SCAN(VNAME(OUTT{O}),-1,'_'),BEST8.)

  AND VNAME(INN{I}) = SUBSTR(VNAME(OUTT{O}),1,INDEX(VNAME(OUTT{O}),"_")-1) THEN DO;

  OUTT{O}=INN{I};

  LEAVE;

   END;

  END;

  END;

  END;

RUN;

Scott_Mitchell
Quartz | Level 8

You could also attempt this method. It is worth noting that the output is slightly different as a result of the summary.  I am not sure which will be more efficient based on the amount of data you have, so give both a try and test it out.

DATA HAVE;

INFILE CARDS  TRUNCOVER EXPANDTABS ;

INPUT TAXID     SRVC_DT     :$20. PERF     ECHO     PET     TTE     TEE     CABG     STEMI     PL_OF_SRVC_ID     PL_OF_SRVC_DESC : $20.;

CARDS;

1     1/1/2010     1     .     .     .     .     .     .     1     OFFICE

1     12/1/2011     .     1     .     .     .     .     .     4     LAB

1     11/4/2009     .     .     1     .     .     .     .     8     INPATIENT

1     5/25/2011     .     .     .     1     .     .     .     12     OUTPATIENT

1     3/2/2012     .     .     .     .     1     .     .     15     HOSPICE

1          .          .     .     1     .     .     .     . . .

1          .          1     .     .     .     .     .     . . .

1          .     1     .     .     .     .     .     .     . . .

1     1/15/212     .     .     .     .     .     1     .     20     REHAB

1     1/31/2012     .     .     .     .     .     .     1     25     OTHER

2     2/5/2012     .     .     .     .     .     .     1     1     OFFICE

2     6/5/2012     .     .     .     .     .     1     .     4     LAB

2     6/7/2012     .     .     .     1     .     .     .     8     INPATIENT

2          .     1     .     .     .     .     .     .     . . .

2          .     .     .     1     .     .     .     .     . . .

2     7/1/2012     .     .     .     .     1     .     .      15     HOSPICE

2     7/15/2012     .     1     .     .     .     .     .     20     REHAB

2     1/1/2013     1     .     .     .     .     .     .     25     OTHER

;

RUN;

PROC SUMMARY DATA=HAVE NWAY MISSING;

  CLASS PL_OF_SRVC_ID PL_OF_SRVC_DESC TAXID SRVC_DT;

  VAR PERF ECHO PET TTE TEE CABG STEMI;

  OUTPUT OUT = HAVE1  (DROP = _:) SUM=;

RUN;

PROC TRANSPOSE DATA=HAVE1 OUT=FLOP;

  BY PL_OF_SRVC_ID PL_OF_SRVC_DESC TAXID SRVC_DT;

RUN;

PROC TRANSPOSE DATA=FLOP OUT=WANT DELIMITER=_;

  ID  _NAME_ PL_OF_SRVC_ID;

  BY  PL_OF_SRVC_DESC TAXID SRVC_DT NOTSORTED;

RUN;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1033 views
  • 3 likes
  • 5 in conversation