Hi all,
I am working on the left data set (code used to create this is attached to bottom). My objective is to pick one MG, one inc and one CIS only using the logic below:
- I want the to pick the CIS with the highest inc (income) within MG
- If CIS is missing, then pick the first record with a non-missing CIS in the MG
- else if CIS is present, pick the first record which has a CIS.
If the logic is applied correctly , I should end up with a data set on the right.
To accomplish this, I tried using the code below:
proc sort data=nn;
by MG Descending Inc CIS;
run;
data nnn1;
set work.nn;
by MG descending inc CIS;
retain identifier;
if first.MG then identifier=0;
if identifier=0 and not missing(CIS) then do;
output;
identifier=1;
end;
run;
But what I got back was this table
Please can anyone help?
Thanks.
Code to create data:
data nn;
input MG Inc CIS;
lines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
2 350 .
4 . 23
4 330 23
5 190 23
5 456 .
6 838 .
6 345 .
;
With an assumption a minor adjustment while I wait for your response:
data nn;
input MG Inc CIS;
datalines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
3 350 .
4 . 23
4 -1 23
5 0 23
5 2 .
6 838 .
6 345 .
;
data want;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if first.mg then call missing(_max);
if not missing(cis) then _max=max(_max,_inc);
end;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if not missing(_max) and _max=_inc then output;
else if missing(_max) and first.mg then output;
end;
drop _:;
run;
Results:
MG | Inc | CIS |
1 | 45 | 12 |
2 | 123 | 2 |
3 | 129 | 10 |
4 | 23 | |
5 | 0 | 23 |
6 | 838 |
Hi all,
I am working on the left data set (code used to create this is attached to bottom). My objective is to pick one MG, one inc and one CIS only using the logic below:
- I want the to pick the CIS with the highest inc (income) within MG
- If CIS is missing, then pick the first record with a non-missing CIS in the MG
- else if CIS is present, pick the first record which has a CIS.
If the logic is applied correctly , I should end up with a data set on the right.
Please can anyone help?
Thanks.
Code to create data:
data nn;
input MG Inc CIS;
lines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
2 350 .
4 . 23
4 330 23
5 190 23
5 456 .
6 838 .
6 345 .
;
data nn;
input MG Inc CIS;
datalines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
3 350 .
4 . 23
4 330 23
5 190 23
5 456 .
6 838 .
6 345 .
;
data want;
do until(last.MG);
set nn;
by MG;
if first.mg then call missing(max);
if not missing(cis) then max=max(max,inc);
end;
do until(last.MG);
set nn;
by MG;
if not missing(max) and max=inc then output;
else if missing(max) and first.mg then output;
end;
drop max;
run;
Thanks Novinosrin. That does the job. Is there a book or paper you could point me to that will help explain this further please?
Google SAS tutorials, Youtube, SAS sugi papers, training materials, SAS with examples book by author @Ron_Cody is very helpful. All the best!
Hi novinosrin,
Thanks for the suggestions.
I have been able to flaw you code. If negative income income value for MG 4 (data step to generate code provided) and the code falls into pieces. My guess is that your logic works correctly when every income value is positive
data nn;
input MG Inc CIS;
lines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
2 350 .
4 . 23
4 -1 23
5 0 23
5 2 .
6 838 .
6 345 .
;
Hi @ClareRobin1 Can you please post me your expected output sample for your latest input sample so that I can modify my code and test properly? Thanks
With an assumption a minor adjustment while I wait for your response:
data nn;
input MG Inc CIS;
datalines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
3 350 .
4 . 23
4 -1 23
5 0 23
5 2 .
6 838 .
6 345 .
;
data want;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if first.mg then call missing(_max);
if not missing(cis) then _max=max(_max,_inc);
end;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if not missing(_max) and _max=_inc then output;
else if missing(_max) and first.mg then output;
end;
drop _:;
run;
Results:
MG | Inc | CIS |
1 | 45 | 12 |
2 | 123 | 2 |
3 | 129 | 10 |
4 | 23 | |
5 | 0 | 23 |
6 | 838 |
Thanks novinosrin. You've cracked the puzzle. Thanks a million
My pleasure. However, I will continue to be open to change/edit/modification should you require any because on the forums we only play with samples and thus there may be unwanted surprises. Feel free to have any follow up questions if you may have. Have a good one! Cheers
Please can you put dont just a one line not on what the function - call missing (_Max) does?
I have broken the code into pieces to try and understand that but still cant figure out what it does. Since I am going to use this code at work, it might be worth me knowing what that function does
Here are the notes:
1. The solution loops through the entire dataset twice processing one by group at a time
2. On the 1st pass, we calculate the max for each by group. I guess this is self explanatory
3. The call missing assigns a missing value to erase the max of the previous by group that is retained in memory. This helps in avoiding the mess while processing the next by group
4. Once we have completed the 1st pass, the 2nd pass basically filters the records using the max value that was obtained in the first pass and with your stated conditions
5. The explicit output statement overrides the implicit output as it is very important to control for the reason the processing of the dataset by groups using Dow loop happens in memory and implicit output by default happens only when sas processes the last observation of the by group.
Hope this helps
PS Do get book by Ron cody/ Art Carpenter without fail if you really want to speed up your learning
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.