BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
boricuanalyst
Calcite | Level 5

SAS community (1st time messaging)

I am stuck trying to determine the last part of my code.  Following are made up medical data were patients sought specialty care in the surrounding community (city, state).  Desired output should array data by CLINIC, PRODUCT_LINE, and VISITS.  I want to list only (by-CLINIC and by PRODUCT_LINE) the top three rows including VISITS (descending order); thereafter, row four should sum the remaining PRODUCT_LINE VISITS.

 

DATA CARE                    ;

 

      INPUT PRODUCT_LINE$12. CITY$12. ST$3. CLINIC$5. VISITS 4.0;

      LENGTH CITYST $20. ;

      CITYST=CITY||'_'||ST  ;

 

      CARDS  ;

CARDIOLOGY   TEMPLE          TX 0110 200

CARDIOLOGY   TEMPLE          TX 0110 300

CARDIOLOGY   LUBBOCK        TX 0110  50

CARDIOLOGY   BUDA               TX 0110 300

CARDIOLOGY   BANDERA        TX 0110 200

CARDIOLOGY   SAN ANTONIO TX 0110  20

CARDIOLOGY   TEMPLE           TX 0110 100

CARDIOLOGY   SAN ANTONIO TX 0110  50

CARDIOLOGY   SAN ANTONIO TX 0110  10

CARDIOLOGY   TEMPLE           TX 0110 100

CARDIOLOGY   TEMPLE           TX 0110 100

;

RUN  ;

 

PROC SUMMARY data=CARE nway  ;

   var VISITS ;

   class CLINIC PRODUCT_LINE CITYST  / missing ;

   output out=CARE_TEST3 (drop=_TYPE_ _FREQ_) sum=VISITS   ;

run  ;

 

PROC SORT DATA=CARE_TEST3  ;

      BY CLINIC  ;

RUN  ;

 

PROC RANK DATA=CARE_TEST3 OUT=V_TEST DESCENDING  ;

      BY CLINIC PRODUCT_LINE ;

      VAR VISITS  ;

      RANKS R_VISITS  ;

 RUN  ;

 

PROC SORT DATA=V_TEST  ;

      BY CLINIC PRODUCT_LINE R_VISITS CITYST   ;

      WHERE R_VISITS <=3  ;

RUN  ;

 

SORT V_TEST RAW OUTPUT

Obs CLINIC  PRODUCT_LINE CITYST                VISITS   R_VISITS

  1    0110    CARDIOLOGY      TEMPLE      _TX   800        1  

  2    0110    CARDIOLOGY      BUDA          _TX    300        2  

  3    0110    CARDIOLOGY      BANDERA   _TX    200       3  

 

 

DESIRED CARDIOLOGY OUTPUT:

Obs CLINIC PRODUCT_LINE  CITYST                     VISITS    R_VISITS

  1    0110    CARDIOLOGY      TEMPLE        _TX      800         1  

  2    0110    CARDIOLOGY      BUDA             _TX      300         2  

  3    0110    CARDIOLOGY      BANDERA      _TX      200         3  

  3    0110    CARDIOLOGY      CARE_ELSEWHERE 130         4  

 

Thanks in advance for any help!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Edit note:  I've combined the original answer plus the follow-up, so that the response marked as "solution" makes sense:

Posted in reply to boricuanalyst

I don't think you need proc rank.  And I also suggest you let proc summary not only provide the 3-way (clinic*product_line*cityst) total VISITS (with _type_=7), but also the other marginal totals - in particular _TYPE_=6  (marginal totals for clinic*product_line).  Then you can

  1.  Read in the _type_=6 to get total_visits, 
  2.  Then read (in descending VISIT order) the individual VISIT counts, continuously decrementing total_visits
  3.  When you get to r_visit=4 copy the remaining total_visits to visits:
DATA CARE                    ;
  INPUT PRODUCT_LINE$12. CITY$12. ST$3. CLINIC$5. VISITS 4.0;
  LENGTH CITYST $20. ;
  CITYST=CITY||'_'||ST  ;
CARDS  ;
CARDIOLOGY   TEMPLE      TX 0110 200let 
CARDIOLOGY   TEMPLE      TX 0110 300
CARDIOLOGY   LUBBOCK     TX 0110  50
CARDIOLOGY   BUDA        TX 0110 300
CARDIOLOGY   BANDERA     TX 0110 200
CARDIOLOGY   SAN ANTONIO TX 0110  20
CARDIOLOGY   TEMPLE      TX 0110 100
CARDIOLOGY   SAN ANTONIO TX 0110  50
CARDIOLOGY   SAN ANTONIO TX 0110  10
CARDIOLOGY   TEMPLE      TX 0110 100
CARDIOLOGY   TEMPLE      TX 0110 100
RUN  ;
 
PROC SUMMARY data=CARE  ;
  var VISITS ;
  class CLINIC PRODUCT_LINE CITYST  / missing ;
  output out=CARE_TEST3 (drop=_FREQ_) sum=VISITS   ;
run  ;

proc sort data=care_test3 (where=(_type_=6 or _type_=7));
  by clinic product_line _type_ descending visits;
run;

data want (drop=_:);
  set care_test3 ;
  retain _total_visits r_visits;
  if _type_=6 then do;
    _total_visits=visits;
    r_visits=0;
  end;
  if _type_=7;
  r_visits+1;
  if r_visits<=3 then do;
    output;
    _total_visits=_total_visits-visits;
  end;
  else if r_visits=4 then do;
    visits=_total_visits;
    cityst='CARE_ELSEWHERE';
    output;
  end;
run;

Note the PROC SORT reads in only the _TYPE_=6 and _TYPE_=7 records produced by PROC SUMMARY.  And the sort sorts the data by CLINIC/PRODUCT_LINE.  Within each CLINIC/PRODUCT_LINE there will be only one _TYPE_=6 record, followed by the _TYPE_=7 records, which are sorted by descending visit.

 

The sort output if read, and every time there is a _TYPE_=6 record, _TOTAL_VISITS is reset and r_visit is set to 0.  For each _type_=7 record, r_visit is incremented, the data is output (for r_visit<=3), and _TOTAL_VISITS is decremented by VISITS.  So when r_visit=4 is encountered it just outputs the updated _TOTAL_VISTS (copied to VISITS).

 

 

 

Response to "Since VISITS can/will reach in the thousands for multiple CITYST throughout the US, if CITYST is >20+ will the aforementioned code still work?"

 

The code will work no matter how many levels of CITYST (or CLINIC or PRODUCTLINE) is in your data.  The "_TYPE_" variables only designate the "type" of aggregation, not the number of levels a class variable might have.

 

Let's say  you have 3 CLASS variables in proc summary, A, B, and C  (with Na, Nb, and Nc levels - including missing in your case).  Then you can have 2**3=8 "types" of totals/means/other stats.  For instance if you have 

CLASS A B C, then the _type_ levels would be

 

 

  •  _type_=0    Global Totals    (1 observation)
  •  _type_=1    C one-way subtotals  (Nc observations)
  •  _type_=2    B one-way subtotals (Nb obs)
  •  _type_=3   B*C two-way subtotals (Nb*Nc)
  •  _type_=4   A one-way subtotals (Na)
  •  _type_=5  A*C two-way subtotals (Na*Nc)
  • _type_=6    A*B     (Na*Nb obs)
  • _type_=7    A*B*C  (Na*Nb*Nc obs)
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
mkeintz
PROC Star

I don't think you need proc rank.  And I also suggest you let proc summary not only provide the 3-way (clinic*product_line*cityst) total VISITS (with _type_=7), but also the other marginal totals - in particular _TYPE_=6  (marginal totals for clinic*product_line).  Then you can

  1.  Read in the _type_=6 to get total_visits, 
  2.  Then read (in descending VISIT order) the individual VISIT counts, continuously decrementing total_visits
  3.  When you get to r_visit=4 copy the remaining total_visits to visits:
DATA CARE                    ;
  INPUT PRODUCT_LINE$12. CITY$12. ST$3. CLINIC$5. VISITS 4.0;
  LENGTH CITYST $20. ;
  CITYST=CITY||'_'||ST  ;
CARDS  ;
CARDIOLOGY   TEMPLE      TX 0110 200let 
CARDIOLOGY   TEMPLE      TX 0110 300
CARDIOLOGY   LUBBOCK     TX 0110  50
CARDIOLOGY   BUDA        TX 0110 300
CARDIOLOGY   BANDERA     TX 0110 200
CARDIOLOGY   SAN ANTONIO TX 0110  20
CARDIOLOGY   TEMPLE      TX 0110 100
CARDIOLOGY   SAN ANTONIO TX 0110  50
CARDIOLOGY   SAN ANTONIO TX 0110  10
CARDIOLOGY   TEMPLE      TX 0110 100
CARDIOLOGY   TEMPLE      TX 0110 100
RUN  ;
 
PROC SUMMARY data=CARE  ;
  var VISITS ;
  class CLINIC PRODUCT_LINE CITYST  / missing ;
  output out=CARE_TEST3 (drop=_FREQ_) sum=VISITS   ;
run  ;

proc sort data=care_test3 (where=(_type_=6 or _type_=7));
  by clinic product_line _type_ descending visits;
run;

data want (drop=_:);
  set care_test3 ;
  retain _total_visits r_visits;
  if _type_=6 then do;
    _total_visits=visits;
    r_visits=0;
  end;
  if _type_=7;
  r_visits+1;
  if r_visits<=3 then do;
    output;
    _total_visits=_total_visits-visits;
  end;
  else if r_visits=4 then do;
    visits=_total_visits;
    cityst='CARE_ELSEWHERE';
    output;
  end;
run;

Note the PROC SORT reads in only the _TYPE_=6 and _TYPE_=7 records produced by PROC SUMMARY.  And the sort sorts the data by CLINIC/PRODUCT_LINE.  Within each CLINIC/PRODUCT_LINE there will be only one _TYPE_=6 record, followed by the _TYPE_=7 records, which are sorted by descending visit.

 

The sort output if read, and every time there is a _TYPE_=6 record, _TOTAL_VISITS is reset and r_visit is set to 0.  For each _type_=7 record, r_visit is incremented, the data is output (for r_visit<=3), and _TOTAL_VISITS is decremented by VISITS.  So when r_visit=4 is encountered it just outputs the updated _TOTAL_VISTS (copied to VISITS).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
boricuanalyst
Calcite | Level 5

mkeintz, thank you for the prompt response.

The code worked exactly as desired ... beautifully.  Question!  I am trying to understand the '(where=(_type_6 or _type_7))' code.  Since VISITS can/will reach in the thousands for multiple CITYST throughout the US, if CITYST is >20+ will the aforementioned code still work?
Thanks again.
mkeintz
PROC Star

Edit note:  I've combined the original answer plus the follow-up, so that the response marked as "solution" makes sense:

Posted in reply to boricuanalyst

I don't think you need proc rank.  And I also suggest you let proc summary not only provide the 3-way (clinic*product_line*cityst) total VISITS (with _type_=7), but also the other marginal totals - in particular _TYPE_=6  (marginal totals for clinic*product_line).  Then you can

  1.  Read in the _type_=6 to get total_visits, 
  2.  Then read (in descending VISIT order) the individual VISIT counts, continuously decrementing total_visits
  3.  When you get to r_visit=4 copy the remaining total_visits to visits:
DATA CARE                    ;
  INPUT PRODUCT_LINE$12. CITY$12. ST$3. CLINIC$5. VISITS 4.0;
  LENGTH CITYST $20. ;
  CITYST=CITY||'_'||ST  ;
CARDS  ;
CARDIOLOGY   TEMPLE      TX 0110 200let 
CARDIOLOGY   TEMPLE      TX 0110 300
CARDIOLOGY   LUBBOCK     TX 0110  50
CARDIOLOGY   BUDA        TX 0110 300
CARDIOLOGY   BANDERA     TX 0110 200
CARDIOLOGY   SAN ANTONIO TX 0110  20
CARDIOLOGY   TEMPLE      TX 0110 100
CARDIOLOGY   SAN ANTONIO TX 0110  50
CARDIOLOGY   SAN ANTONIO TX 0110  10
CARDIOLOGY   TEMPLE      TX 0110 100
CARDIOLOGY   TEMPLE      TX 0110 100
RUN  ;
 
PROC SUMMARY data=CARE  ;
  var VISITS ;
  class CLINIC PRODUCT_LINE CITYST  / missing ;
  output out=CARE_TEST3 (drop=_FREQ_) sum=VISITS   ;
run  ;

proc sort data=care_test3 (where=(_type_=6 or _type_=7));
  by clinic product_line _type_ descending visits;
run;

data want (drop=_:);
  set care_test3 ;
  retain _total_visits r_visits;
  if _type_=6 then do;
    _total_visits=visits;
    r_visits=0;
  end;
  if _type_=7;
  r_visits+1;
  if r_visits<=3 then do;
    output;
    _total_visits=_total_visits-visits;
  end;
  else if r_visits=4 then do;
    visits=_total_visits;
    cityst='CARE_ELSEWHERE';
    output;
  end;
run;

Note the PROC SORT reads in only the _TYPE_=6 and _TYPE_=7 records produced by PROC SUMMARY.  And the sort sorts the data by CLINIC/PRODUCT_LINE.  Within each CLINIC/PRODUCT_LINE there will be only one _TYPE_=6 record, followed by the _TYPE_=7 records, which are sorted by descending visit.

 

The sort output if read, and every time there is a _TYPE_=6 record, _TOTAL_VISITS is reset and r_visit is set to 0.  For each _type_=7 record, r_visit is incremented, the data is output (for r_visit<=3), and _TOTAL_VISITS is decremented by VISITS.  So when r_visit=4 is encountered it just outputs the updated _TOTAL_VISTS (copied to VISITS).

 

 

 

Response to "Since VISITS can/will reach in the thousands for multiple CITYST throughout the US, if CITYST is >20+ will the aforementioned code still work?"

 

The code will work no matter how many levels of CITYST (or CLINIC or PRODUCTLINE) is in your data.  The "_TYPE_" variables only designate the "type" of aggregation, not the number of levels a class variable might have.

 

Let's say  you have 3 CLASS variables in proc summary, A, B, and C  (with Na, Nb, and Nc levels - including missing in your case).  Then you can have 2**3=8 "types" of totals/means/other stats.  For instance if you have 

CLASS A B C, then the _type_ levels would be

 

 

  •  _type_=0    Global Totals    (1 observation)
  •  _type_=1    C one-way subtotals  (Nc observations)
  •  _type_=2    B one-way subtotals (Nb obs)
  •  _type_=3   B*C two-way subtotals (Nb*Nc)
  •  _type_=4   A one-way subtotals (Na)
  •  _type_=5  A*C two-way subtotals (Na*Nc)
  • _type_=6    A*B     (Na*Nb obs)
  • _type_=7    A*B*C  (Na*Nb*Nc obs)
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
boricuanalyst
Calcite | Level 5

I need to add two more measure to my input statement (weight & cost).  How can I incorporate the two additional measures into the 'nth sum by-group' code below? The main descending dimension should continue to be  'visits'.

 

DATA CARE                    ;
  INPUT PRODUCT_LINE$12. CITY$12. ST$3. CLINIC$5. VISITS 4.0 WEIGHT 4.0 COST 10.2 ;
  LENGTH CITYST $20. ;
  CITYST=CITY||'_'||ST  ;
CARDS  ;
CARDIOLOGY   TEMPLE      TX 0110 200 1.56 225.00
CARDIOLOGY   TEMPLE      TX 0110 300 1.56 225.00
CARDIOLOGY   LUBBOCK     TX 0110  50 2.00 275.00
CARDIOLOGY   BUDA        TX 0110 300 1.23 175.00
CARDIOLOGY   BANDERA     TX 0110 200 2.01 250.00
CARDIOLOGY   SAN ANTONIO TX 0110  20 2.89 375.00
CARDIOLOGY   TEMPLE      TX 0110 100 1.56 225.00
CARDIOLOGY   SAN ANTONIO TX 0110  50 2.89 375.00
CARDIOLOGY   SAN ANTONIO TX 0110  10 2.89 375.00
CARDIOLOGY   TEMPLE      TX 0110 100 1.56 225.00
CARDIOLOGY   TEMPLE      TX 0110 100 1.56 225.00
RUN  ;

 

boricuanalyst
Calcite | Level 5
I need to add two more measure to my input statement (weight & cost). How can I incorporate the two additional measures into the 'nth sum by-group' code? The main descending dimension should continue to be 'visits'.

DATA CARE ;
INPUT PRODUCT_LINE$12. CITY$12. ST$3. CLINIC$5. VISITS 4.0 WEIGHT 4.0 COST 10.2 ;
LENGTH CITYST $20. ;
CITYST=CITY||'_'||ST ;
CARDS ;
CARDIOLOGY TEMPLE TX 0110 200 1.56 225.00
CARDIOLOGY TEMPLE TX 0110 300 1.56 225.00
CARDIOLOGY LUBBOCK TX 0110 50 2.00 275.00
CARDIOLOGY BUDA TX 0110 300 1.23 175.00
CARDIOLOGY BANDERA TX 0110 200 2.01 250.00
CARDIOLOGY SAN ANTONIO TX 0110 20 2.89 375.00
CARDIOLOGY TEMPLE TX 0110 100 1.56 225.00
CARDIOLOGY SAN ANTONIO TX 0110 50 2.89 375.00
CARDIOLOGY SAN ANTONIO TX 0110 10 2.89 375.00
CARDIOLOGY TEMPLE TX 0110 100 1.56 225.00
CARDIOLOGY TEMPLE TX 0110 100 1.56 225.00
RUN ;

PROC SUMMARY data=CARE ;
var VISITS ;
class CLINIC PRODUCT_LINE CITYST / missing ;
output out=CARE_TEST3 (drop=_FREQ_) sum=VISITS ;
run ;

proc sort data=care_test3 (where=(_type_=6 or _type_=7));
by clinic product_line _type_ descending visits;
run;

data want (drop=_:);
set care_test3 ;
retain _total_visits r_visits;
if _type_=6 then do;
_total_visits=visits;
r_visits=0;
end;
if _type_=7;
r_visits+1;
if r_visits<=3 then do;
output;
_total_visits=_total_visits-visits;
end;
else if r_visits=4 then do;
visits=_total_visits;
cityst='CARE_ELSEWHERE';
output;
end;
run;
mkeintz
PROC Star

By "incorporate the two additional measures into the 'nth sum by-group' code", do you mean that you want the sum of the WEIGHT and COST for each of the locations with the most visits?

 

If so, why not add the WEIGHT and COST to the VAR statement in proc summary, and also adjust the OUTPUT statement?  You can review the documentation on the OUTPUT statement in PROC SUMMARY to see how to get WEIGHT and COST to hold totals.

 

Then in the DATA WANT step, create and manage the variables _total_weight and _total_cost the same way as you manage _total_visits.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
boricuanalyst
Calcite | Level 5

Apologies for my lack of specificity.  I did tried several versions of adding the new measure 'PAID' (same as COST) to the SAS code (VAR and OUTPUT statement) but the summation of PAID appears to be inaccurate.  In PROC SORT 2 (see below results) amount PAID for Cardiology CARE_ELSEWHERE should be 1400, not 1125.  First, I tried sum=VISITS sum=PAID ; run ; ... but PAID seemed to copy over exact values as VISITS.  Secondly, I tried sum= ; run; ... this version worked best but PAID summation for CARE_ELSEWHERE is suspect.  What am I doing wrong?  Thanks in advance!

 

DATA HOLLYWOODSQ ;

INPUT PRODUCT_LINE$12. CITY$12. ST$3. CLINIC$5. VISITS 4.0 PAID 10.2 ;
LENGTH CITYST $20. ;

CITYST=CATS(CITY,'_',ST) ;

 

CARDS ;
CARDIOLOGY TEMPLE              TX 0110 200 225.00
DERMATOLOGY AUSTIN            TX 0110 200 300.00
INTERNAL MED SAN ANTONIO TX 0110 200 375.00
CARDIOLOGY TEMPLE               TX 0110 300 225.00
DERMATOLOGY AUSTIN            TX 0110 200 300.00
INTERNAL MED AUSTIN             TX 0110 500 300.00
CARDIOLOGY LUBBOCK            TX 0110   50 275.00
DERMATOLOGY SAN ANTONIO TX 0110 400 375.00
INTERNAL MED LUBBOCK         TX 0110 100 275.00
CARDIOLOGY BUDA                   TX 0110 300 175.00
DERMATOLOGY AUSTIN            TX 0110 200 300.00
INTERNAL MED SAN ANTONIO TX 0110 200 375.00
CARDIOLOGY BANDERA            TX 0110 200 250.00
DERMATOLOGY CIBOLO            TX 0110 200 250.00
INTERNAL MED SAN JOSE        TX 0110 200 150.00
CARDIOLOGY SAN ANTONIO    TX 0110   20 375.00
DERMATOLOGY AUSTIN            TX 0110 200 300.00
INTERNAL MED SAN ANTONIO TX 0110   20 375.00
CARDIOLOGY TEMPLE               TX 0110 100 225.00
DERMATOLOGY TEMPLE           TX 0110 100 225.00
INTERNAL MED TEMPLE            TX 0110 100 225.00
CARDIOLOGY SAN ANTONIO     TX 0110   50 375.00
CARDIOLOGY SAN ANTONIO     TX 0110   10 375.00
INTERNAL MED AUSTIN             TX 0110   50 300.00
INTERNAL MED AUSTIN             TX 0110 200 300.00
DERMATOLOGY AUSTIN            TX 0110 100 300.00
DERMATOLOGY AUSTIN            TX 0110 100 300.00
CARDIOLOGY TEMPLE              TX 0110 100 300.00
CARDIOLOGY TEMPLE              TX 0110 100 225.00
;
RUN ;

PROC SUMMARY data=HOLLYWOODSQ ;
var VISITS PAID;
class CLINIC PRODUCT_LINE CITYST / missing ;
output out=CARE_TEST3 (drop=_FREQ_) sum= ;
run ;

proc sort data=care_test3 (where=(_type_=6 or _type_=7));
by clinic product_line _type_ descending visits;
run;

PROC PRINT DATA=CARE_TEST3 ;
TITLE 'PROC SORT 1' ;
RUN ;

data want (drop=_:);
set care_test3 ;
retain _total_visits r_visits;
if _type_=6 then do;
_total_visits=visits;
r_visits=0;
end;
if _type_=7;
r_visits+1;
if r_visits<=3 then do;
output;
_total_visits=_total_visits-visits;
end;
else if r_visits=4 then do;
visits=_total_visits;
cityst='CARE_ELSEWHERE';
output;
end;
run;

PROC PRINT DATA=WANT ;
TITLE 'PROC SORT 2' ;
RUN ;

 

 

PROC SORT 1

Obs CLINIC    PRODUCT_LINE    CITYST            _TYPE_    VISITS    PAID

 

  1     0110     CARDIOLOGY                                          6       1430    3025

  2     0110     CARDIOLOGY      TEMPLE_TX               7        800     1200

  3     0110     CARDIOLOGY      BUDA_TX                   7        300       175

  4     0110     CARDIOLOGY      BANDERA_TX            7        200       250

  5     0110     CARDIOLOGY      SAN ANTONIO_TX     7         80      1125

  6     0110     CARDIOLOGY      LUBBOCK_TX            7         50        275

  7     0110     DERMATOLOGY                                      6       1700     2650

  8     0110     DERMATOLOGY     AUSTIN_TX             7       1000     1800

  9     0110     DERMATOLOGY     SAN ANTONIO_TX  7        400        375

 10     0110     DERMATOLOGY     CIBOLO_TX            7        200        250

 11     0110     DERMATOLOGY     TEMPLE_TX            7        100        225

 12     0110     INTERNAL MED                                      6       1570     2675

 13     0110     INTERNAL MED    AUSTIN_TX              7        750        900

 14     0110     INTERNAL MED    SAN ANTONIO_TX   7        420      1125

 15     0110     INTERNAL MED    SAN JOSE_TX          7        200        150

 16     0110     INTERNAL MED    LUBBOCK_TX          7        100        275

 17     0110     INTERNAL MED    TEMPLE_TX             7        100        225

 

PROC SORT 2

 

Obs    CLINIC    PRODUCT_LINE    CITYST            VISITS    PAID    r_visits

 

  1     0110     CARDIOLOGY      TEMPLE_TX                 800     1200        1  

  2     0110     CARDIOLOGY      BUDA_TX                     300       175        2  

  3     0110     CARDIOLOGY      BANDERA_TX              200       250        3  

  4     0110     CARDIOLOGY      CARE_ELSEWHERE   130      1125        4  

  5     0110     DERMATOLOGY     AUSTIN_TX              1000     1800        1  

  6     0110     DERMATOLOGY     SAN ANTONIO_TX    400       375        2  

  7     0110     DERMATOLOGY     CIBOLO_TX               200       250        3  

  8     0110     DERMATOLOGY     CARE_ELSEWHERE 100       225        4  

  9     0110     INTERNAL MED    AUSTIN_TX                  750      900        1  

 10     0110     INTERNAL MED    SAN ANTONIO_TX      420    1125        2  

 11     0110     INTERNAL MED    SAN JOSE_TX             200      150        3  

 12     0110     INTERNAL MED    CARE_ELSEWHERE   200      275        4  

 

mkeintz
PROC Star

I think you overlooked this part of my response:

 

Then in the DATA WANT step, create and manage the variables _total_weight and _total_cost the same way as you manage _total_visits.

Don't you intend to do the same for WEIGHT and COST as you did for VISITs - namely take a total and then keep subtracting each subtotal maintain a residual, to be assigned when r_visits=4?

 

Also, I can't get your initial DATA step to work - perhaps you've collapsed tab characters in the data lines, which defeats the field alignments implied by the input formats.  I'm too lazy to fix each data line one at a time.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
boricuanalyst
Calcite | Level 5

Thank you, I am finally able to find time to get back to testing/coding my program (too much distraction at work).  As recommended, I added the necessary 'measure' (PAID) code and the program worked like a charm ..thank you!  I have one last tweak however!  I must add one more 'variable' (LOCATION) because I realized I need to process 'PCwant' calculation only for those healthcare records that occur outside Market-A. To flag outside records, I added the necessary code (LOCATION/PSA); however, I get back no observations for PCwant when I execute my code.  What am I missing? … attached is my program.

boricuanalyst
Calcite | Level 5

OMG, I finally figure out the issue.  I went back and ready your notes dated 10/14/2019.  It's working now, please disregard need for further assistance.

 

thanks!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1305 views
  • 1 like
  • 2 in conversation