## By group processing

Solved
Occasional Contributor
Posts: 7

# By group processing

Hi all,

I am working on the left data set (code used to create this is attached to bottom). My objective is to pick one MG, one inc and one CIS only using the logic below:

- I want the to pick the CIS with the highest inc (income) within MG

- If CIS is missing, then pick the first record with a non-missing CIS in the MG

- else if CIS is present, pick the first record which has a CIS.

If the logic is applied correctly , I should end up with a data set on the right.

To accomplish this, I tried using the code below:

proc sort data=nn;
by MG Descending Inc CIS;
run;

data nnn1;
set work.nn;
by MG descending inc CIS;
retain identifier;
if first.MG then identifier=0;
if identifier=0 and not missing(CIS) then do;
output;
identifier=1;
end;
run;

But what I got back was this table

Please can anyone help?

Thanks.

Code to create data:

data nn;
input MG Inc CIS;
lines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
2 350 .
4 . 23
4 330 23
5 190 23
5 456 .
6 838 .
6 345 .
;

Accepted Solutions
Solution
‎04-05-2018 05:27 PM
PROC Star
Posts: 1,833

## Re: By group processing

Posted in reply to novinosrin

With an assumption a minor adjustment while I wait for your response:

``````data nn;
input MG Inc CIS;
datalines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
3 350 .
4 . 23
4 -1 23
5 0 23
5 2 .
6 838 .
6 345 .
;

data want;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if first.mg then call missing(_max);
if not missing(cis) then _max=max(_max,_inc);
end;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if  not missing(_max) and _max=_inc then output;
else if missing(_max) and first.mg then output;
end;
drop _:;
run;``````

Results:

 MG Inc CIS 1 45 12 2 123 2 3 129 10 4 23 5 0 23 6 838

All Replies
Occasional Contributor
Posts: 7

## By group processing

Posted in reply to ClareRobin1

Hi all,

I am working on the left data set (code used to create this is attached to bottom). My objective is to pick one MG, one inc and one CIS only using the logic below:

- I want the to pick the CIS with the highest inc (income) within MG

- If CIS is missing, then pick the first record with a non-missing CIS in the MG

- else if CIS is present, pick the first record which has a CIS.

If the logic is applied correctly , I should end up with a data set on the right.

Please can anyone help?

Thanks.

Code to create data:

data nn;
input MG Inc CIS;
lines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
2 350 .
4 . 23
4 330 23
5 190 23
5 456 .
6 838 .
6 345 .
;

PROC Star
Posts: 1,833

## Re: By group processing

Posted in reply to ClareRobin1
``````data nn;
input MG Inc CIS;
datalines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
3 350 .
4 . 23
4 330 23
5 190 23
5 456 .
6 838 .
6 345 .
;
data want;
do until(last.MG);
set nn;
by MG;
if first.mg then call missing(max);
if not missing(cis) then max=max(max,inc);
end;
do until(last.MG);
set nn;
by MG;
if  not missing(max) and max=inc then output;
else if missing(max) and first.mg then output;
end;
drop max;
run;``````
Occasional Contributor
Posts: 7

## Re: By group processing

Posted in reply to novinosrin

Thanks Novinosrin. That does the job. Is there a book or paper you could point me to that will help explain this further please?

PROC Star
Posts: 1,833

## Re: By group processing

Posted in reply to ClareRobin1

Google SAS tutorials, Youtube, SAS sugi papers, training materials, SAS with examples book by author @Ron_Cody is very helpful. All the best!

Occasional Contributor
Posts: 7

## Re: By group processing

Posted in reply to novinosrin

Hi

PROC Star
Posts: 1,833

## Re: By group processing

Posted in reply to ClareRobin1

Hi @ClareRobin1 Can you please post me your expected output sample for your latest input sample so that I can modify my code and test properly? Thanks

Solution
‎04-05-2018 05:27 PM
PROC Star
Posts: 1,833

## Re: By group processing

Posted in reply to novinosrin

With an assumption a minor adjustment while I wait for your response:

``````data nn;
input MG Inc CIS;
datalines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
3 350 .
4 . 23
4 -1 23
5 0 23
5 2 .
6 838 .
6 345 .
;

data want;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if first.mg then call missing(_max);
if not missing(cis) then _max=max(_max,_inc);
end;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if  not missing(_max) and _max=_inc then output;
else if missing(_max) and first.mg then output;
end;
drop _:;
run;``````

Results:

 MG Inc CIS 1 45 12 2 123 2 3 129 10 4 23 5 0 23 6 838
Occasional Contributor
Posts: 7

## Re: By group processing

Posted in reply to novinosrin

Thanks

PROC Star
Posts: 1,833

## Re: By group processing

Posted in reply to ClareRobin1

My pleasure. However, I will continue to be open to change/edit/modification should you require any because on the forums we only play with samples and thus there may be unwanted surprises. Feel free to have any follow up questions if you may have.  Have a good one! Cheers

Occasional Contributor
Posts: 7

## Re: By group processing

Posted in reply to ClareRobin1

Please can you put dont just a one line not on what the function - call missing (_Max) does?

I have broken the code into pieces to try and understand that but still cant figure out what it does. Since I am going to use this code at work, it might be worth me knowing what that function does

PROC Star
Posts: 1,833

## Re: By group processing

Posted in reply to ClareRobin1

Here are the notes:

1. The solution loops through the entire dataset twice processing one by group at a time

2. On the 1st pass, we calculate the max for each by group. I guess this is self explanatory

3. The call missing assigns a missing value to erase the max of the previous by group that is retained in memory. This helps in avoiding the mess while processing the next by group

4. Once we have completed the 1st pass, the 2nd pass basically filters the records using the max value that was obtained in the first pass and with your stated conditions

5. The explicit output statement overrides the implicit output as it is very important to control for the reason the processing of the dataset by groups using Dow loop happens in memory and implicit output by default happens only when sas processes the last observation of the by group.

Hope this helps

PS Do get book by Ron cody/ Art Carpenter without fail if you really want to speed up your learning

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
• 11 replies
• 201 views
• 1 like
• 2 in conversation