BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
km0927
Obsidian | Level 7

Hello. it seems like there was a misunderstanding, so i have detailed the previous question.

(https://communities.sas.com/t5/SAS-Programming/creating-time-dependent-variables/m-p/498060#M132241)

 

the raw data is here.


id timeorder clinic
1 1 a
1 2 a
1 3 b
1 4 b
1 5 a
1 6 c

 

and with 'retain function', 'by variables' and 'square', i made cumulative sum and square.
(i used proc sort by id and clinic, and re-sort by timeorder)

 

id timeorder clinic total totalsquare
1 1 a 1 1
1 2 a 2 4
1 3 b 1 1
1 4 b 2 4
1 5 a 3 9
1 6 c 1 1

 

and i want to make sum of squares of the largest values for each group like this.

 

id timeorder clinic total totalsquare sum
1 1 a 1 1 1
1 2 a 2 4 4
1 3 b 1 1 5 *1(b)+4(a)*
1 4 b 2 4 8 *4(b)+4(a)*
1 5 a 3 9 13 *4(b)+9(a)*
1 6 c 1 1 14 *1(c)+4(b)+9(a)*

 

I tried 'retain totalsquare', but it turned out...

 

1 1 a 1 1 1
1 2 a 2 4 4
1 3 b 1 1 6 *1+1+4*
1 4 b 2 4 10 *1+1+4+4+4*
1 5 a 3 9 19 *1+1+4+4+4+9*
1 6 c 1 1 20 *1+1+4+4+4+9+1*

 

how can i replace existing value in cumulative sum?

 

*one more thing, i thought of creating virtual column. here is example.

 

id timeorder clinic count sqaure sum a b c

1 1 a 1 1 1 1 0 0
1 2 a 2 4 4 4 0 0
1 3 b 1 1 5 4 1 0
1 4 b 2 4 8 4 4 0
1 5 a 3 9 13 9 4 0
1 6 c 1 1 14 9 4 1

 

'a', 'b', 'c' column means the largest value of each clinic by far. however, if the number of clinic becomes very high, then it is difficult to make virtual columns one by one. do i have to use 'array' function?

1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16
data have;
input id timeorder clinic:$ total totalsquare;
cards;
1 1 a 1 1
1 2 a 2 4
1 3 b 1 1
1 4 b 2 4
1 5 a 3 9
1 6 c 1 1
;

data want;
retain a b c;
set have;
by id notsorted;
array alp(3) a b c;
array alpc(3)$ ('a' 'b' 'c');
do i = 1 to 3;
if first.id then alp(i)=.;
if totalsquare ne . and clinic=alpc(i) then alp(i)=totalsquare;
end;
sum=sum(of alp(*));
run;
Thanks,
Jag

View solution in original post

17 REPLIES 17
Jagadishkatam
Amethyst | Level 16
data have;
input id timeorder clinic:$ total totalsquare;
cards;
1 1 a 1 1
1 2 a 2 4
1 3 b 1 1
1 4 b 2 4
1 5 a 3 9
1 6 c 1 1
;

data want;
retain a b c;
set have;
by id notsorted;
array alp(3) a b c;
array alpc(3)$ ('a' 'b' 'c');
do i = 1 to 3;
if first.id then alp(i)=.;
if totalsquare ne . and clinic=alpc(i) then alp(i)=totalsquare;
end;
sum=sum(of alp(*));
run;
Thanks,
Jag
km0927
Obsidian | Level 7

i really appreciate your help.

 

if number the clinics  becomes very high(like a, b, c, d, e, f, g, ....), or types of clinics becomes very complex(like a, b, d, g, h, x....)

 

is there another way to code array?

 

if the raw data is

 

id clinics

 

1 a

1 a

1 a

1 b

1 b

2 d

2 e

2 k

3 a

3 z

 

maybe, it is possible to code

 

array{*} _character_

do i= 1 to dim(alp)

 

is it right?

 

Jagadishkatam
Amethyst | Level 16

please try the robust approach with macro variables

 

%put &clinic1; %put &clinic2;

 

data have;
input id timeorder clinic:$ total totalsquare;
cards;
1 1 a 1 1
1 2 a 2 4
1 3 b 1 1
1 4 b 2 4
1 5 a 3 9
1 6 c 1 1
;

proc sql noprint;
select distinct clinic into: clinic1 separated by ' ' from have;
select distinct cats("'",clinic,"'") into: clinic2 separated by ' ' from have;
quit;
%put &clinic1;
%put &clinic2;

data want;
retain a b c;
set have;
by id notsorted;
array alp(*) &clinic1;
array alpc(&sqlobs)$ (&clinic2.);
do i = 1 to dim(alp);
if first.id then alp(i)=.;
if totalsquare ne . and clinic=alpc(i) then alp(i)=totalsquare;
end;
sum=sum(of alp(*));
run;
Thanks,
Jag
km0927
Obsidian | Level 7

I'm sorry, but I'll ask you one more question.

 

in my first question, clinic was character variables.

 

however, when clinic is numeric variable, it doesn't work.

 

data have;
input id timeorder clinic:$ total totalsquare;
cards;
1 1 1 1 1
1 2 1 2 4
1 3 2 1 1
1 4 2 2 4
1 5 1 3 9
1 6 3 1 1
;

(and the number of clinics in original data is more than 20 thousand)

 

i used

 

data want;

retain &clinic1;

 

the next is same as your code.

 

when clinic is character variable, it worked.

 

but when clinic is numeric variable, it doesn't work.

 

i think SAS doesn't recognize variable when it's name is numeric.

 

maybe i have to modify original data's clinic name into character. 

 

i would appreciate if you let me know if i was right.

PaigeMiller
Diamond | Level 26

The idea of finding maximums in a group using and then doing cumulative sums all in one data step really makes for some complicated programming, and possibility of error.

 

If you take the mindset that you can do this in two steps (which of course you can), the whole process gets easier by a lot.

 

PROC SUMMARY can easily find the largest value in each group. Then a data step can easily give you cumulative sums.

--
Paige Miller
km0927
Obsidian | Level 7

i tried PROC SUMMARY, but it doesn't work cumulatively. it only finds 'maximum' value of group.

 

is there a way to use 'PROC SUMMARY' cumulatively?

PaigeMiller
Diamond | Level 26

@km0927 wrote:

i tried PROC SUMMARY, but it doesn't work cumulatively. it only finds 'maximum' value of group.

 

is there a way to use 'PROC SUMMARY' cumulatively?


That's not what I said. I said PROC SUMMARY will find the max, and then a data step will find the cumulative sum. And this will be more readable and maintainable code than the solutions which involve only data steps.

 

I look at the other solutions posted here, and I hope I will never have to read that code or maintain it, those are difficult to understand without excessive commenting, even if they do produce the correct result.

 

Sometimes you need to stop thinking of having the entire solution in one SAS step.

--
Paige Miller
novinosrin
Tourmaline | Level 20

hi @km0927 Do you want a 20 thousand variables wide dataset in your output? Can you post the output you require?

novinosrin
Tourmaline | Level 20

20 unique clinic numeric names in a wide dataset doesn't make sense, however the sum does though.

 

data have;
input id timeorder clinic  total totalsquare;
cards;
1 1 1 1 1
1 2 1 2 4
1 3 2 1 1
1 4 2 2 4
1 5 1 3 9
1 6 3 1 1
;

data _null_;
call symputx('n',nobs);
if 0 then set have nobs=nobs;
stop;
run;

%put &n;

data want;
do until(lr);
set have end=lr;
array t(&n) _temporary_;
if not missing(clinic) and clinic not in t  then do;
n+1;
t(n)=clinic;
end;
end;
lr=0;
do until(lr);
set have end=lr;
array j(&n) _temporary_;
j(whichn(clinic,of t(*)))=totalsquare;
sum=sum(of j(*));
output;
end;
drop n;
run;

 

Notes:

1. Always better to have long dataset than a wide dataset, at least in SAS for most computations

2. Macros are not robust to populate wide strings aka var names in incremental dataset. 

 

km0927
Obsidian | Level 7

i really appreciate your help. i will provide further explanation of original data.

 

the original data is patient's billing data. and it looks like...(i converted 'date' to time order in the first question)

 

id date clinic

1 20020408 12135

1 20020715 12135

1 20030124 15109

1 20030308 75423

1 20030505 75423

1 20030607 75423

1 20030815 12135

2 20020615 25875

2 20020717 25875

2 20020824 55415

2 20020907 25875

2 20021215 25875

2 20030108 25875

 

clinic variable is five-digit numeric variable, and in original dataset, it is more than 20000 kinds of clinic.

 

I also think 'more than 20000 columns' doesn't make sense, too. it is too wide.

 

so i convert each clinic's kind in numerical order by id.

 

id date clinic 

1 20020408 1

1 20020715 1

1 20030124 2

1 20030308 3

1 20030505 3

1 20030607 3

1 20030815 1

2 20020615 1

2 20020717 1

2 20020824 2

2 20020907 3

2 20021215 3

2 20030108 3

 

with your code and 'by id' statement, i think i can make it possible to create cumulative sum in that data. thank you.

novinosrin
Tourmaline | Level 20

And can you post a sample of you WANT (desired output) for the latest input sample

km0927
Obsidian | Level 7

id date clinic count countsquare sum


1 20020408 12135 1 1 1
1 20020715 12135 2 4 4
1 20030124 15109 1 1 5 /*4+1*/
1 20030308 75423 1 1 6 /*4+1+1*/
1 20030505 75423 2 4 9 /*4+1+4*/
1 20030607 75423 3 9 14 /*4+1+9*/
1 20030815 12135 3 9 19 /*9+1+9*/
2 20020615 25875 1 1 1
2 20020717 25875 2 4 4
2 20020824 55415 1 1 5 /*4+1*/
2 20020907 25875 3 9 10 /*9+1*/
2 20021215 25875 4 16 17 /*16+1*/
2 20030108 25875 5 25 26 /*25+1*/

 

count means visiting number of each clinic.

 

and sum means largest value of each clinic's square.

 

for example, in third row

 

4 comes from clinic 12135's countsquare.
1 comes from clinic 15109's countsquare.

 

in seventh row


9 comes from clinic 12135's countsquare.
1 comes from clinic 15109's countsquare.
9 comes from clinic 75423's countsquare.

 

i really appreciate your help. thank a lot.

Jagadishkatam
Amethyst | Level 16

I updated my previous code to the new data 

 

data have;
input id timeorder clinic total totalsquare;
cards;
1 20020408 12135 1 1 
1 20020715 12135 2 4 
1 20030124 15109 1 1 
1 20030308 75423 1 1 
1 20030505 75423 2 4 
1 20030607 75423 3 9 
1 20030815 12135 3 9 
2 20020615 25875 1 1 
2 20020717 25875 2 4 
2 20020824 55415 1 1 
2 20020907 25875 3 9 
2 20021215 25875 4 16
2 20030108 25875 5 25
;

proc sql noprint;
select distinct cats('_',clinic) into: clinic1 separated by ' ' from have;
select distinct cats(clinic) into: clinic2 separated by ' ' from have;
quit;
%put &clinic1;
%put &clinic2;

options symbolgen;
data want;
retain &clinic1.;
set have;
by id notsorted;
array alp(*) &clinic1.;
array alpc(&sqlobs) (&clinic2.);
do i = 1 to dim(alp);
if first.id then alp(i)=.;
if totalsquare ne . and clinic=alpc(i) then alp(i)=totalsquare;
end;
sum=sum(of alp(*));
run;
Thanks,
Jag
novinosrin
Tourmaline | Level 20

@km0927 Thank you. I'm just acknowledging your post as my night passes by with no access to sas software at home.. It looks like you have another solution to play with and test to see how well that works. 

 

So let know how the other latest solution works for you and should it not meet your requirement and if you still haven't received any workable solution by the morning(my morning), I will post mine. Good night!  & take care!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2864 views
  • 6 likes
  • 5 in conversation