BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ClareRobin1
Calcite | Level 5

Hi all,

 

I am working on the left data set (code used to create this is attached to bottom). My objective is to pick one MG, one inc and one CIS only using the logic below:

- I want the to pick the CIS with the highest inc (income) within MG

- If CIS is missing, then pick the first record with a non-missing CIS in the MG

- else if CIS is present, pick the first record which has a CIS.

Screen Shot 2018-04-04 at 19.26.55.pngScreen Shot 2018-04-04 at 19.35.48.png

If the logic is applied correctly , I should end up with a data set on the right.

 

To accomplish this, I tried using the code below:

 

proc sort data=nn;
by MG Descending Inc CIS;
run;

 

data nnn1;
set work.nn;
by MG descending inc CIS;
retain identifier;
if first.MG then identifier=0;
if identifier=0 and not missing(CIS) then do;
output;
identifier=1;
end;
run;

 

But what I got back was this table

 

Screen Shot 2018-04-04 at 19.42.30.png

 

 

Please can anyone help?

 

Thanks.

 

Code to create data:

 

data nn;
input MG Inc CIS;
lines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
2 350 .
4 . 23
4 330 23
5 190 23
5 456 .
6 838 .
6 345 .
;

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

With an assumption a minor adjustment while I wait for your response:

 

data nn;
input MG Inc CIS;
datalines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
3 350 .
4 . 23
4 -1 23
5 0 23
5 2 .
6 838 .
6 345 .
;

data want;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if first.mg then call missing(_max);
if not missing(cis) then _max=max(_max,_inc);
end;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if  not missing(_max) and _max=_inc then output;
else if missing(_max) and first.mg then output;
end;
drop _:;
run;

Results:

 

MGIncCIS
14512
21232
312910
4 23
5023
6838 

View solution in original post

11 REPLIES 11
ClareRobin1
Calcite | Level 5

Hi all,

 

I am working on the left data set (code used to create this is attached to bottom). My objective is to pick one MG, one inc and one CIS only using the logic below:

- I want the to pick the CIS with the highest inc (income) within MG

- If CIS is missing, then pick the first record with a non-missing CIS in the MG

- else if CIS is present, pick the first record which has a CIS.

Screen Shot 2018-04-04 at 19.26.55.pngScreen Shot 2018-04-04 at 19.35.48.png

If the logic is applied correctly , I should end up with a data set on the right.

 

Please can anyone help?

 

Thanks.

 

Code to create data:

 

data nn;
input MG Inc CIS;
lines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
2 350 .
4 . 23
4 330 23
5 190 23
5 456 .
6 838 .
6 345 .
;

 

novinosrin
Tourmaline | Level 20
data nn;
input MG Inc CIS;
datalines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
3 350 .
4 . 23
4 330 23
5 190 23
5 456 .
6 838 .
6 345 .
;
data want;
do until(last.MG);
set nn;
by MG;
if first.mg then call missing(max);
if not missing(cis) then max=max(max,inc);
end;
do until(last.MG);
set nn;
by MG;
if  not missing(max) and max=inc then output;
else if missing(max) and first.mg then output;
end;
drop max;
run;
ClareRobin1
Calcite | Level 5

Thanks Novinosrin. That does the job. Is there a book or paper you could point me to that will help explain this further please?

novinosrin
Tourmaline | Level 20

Google SAS tutorials, Youtube, SAS sugi papers, training materials, SAS with examples book by author @Ron_Cody is very helpful. All the best!

ClareRobin1
Calcite | Level 5

Hi 

 

 

 

novinosrin
Tourmaline | Level 20

Hi @ClareRobin1 Can you please post me your expected output sample for your latest input sample so that I can modify my code and test properly? Thanks

novinosrin
Tourmaline | Level 20

With an assumption a minor adjustment while I wait for your response:

 

data nn;
input MG Inc CIS;
datalines;
1 2345 .
1 45 12
1 23 12
2 33 23
2 500 .
2 300 .
2 123 2
3 . 10
3 129 10
3 350 .
4 . 23
4 -1 23
5 0 23
5 2 .
6 838 .
6 345 .
;

data want;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if first.mg then call missing(_max);
if not missing(cis) then _max=max(_max,_inc);
end;
do until(last.MG);
set nn;
by MG;
_inc=ifn(inc=.,0,inc);
if  not missing(_max) and _max=_inc then output;
else if missing(_max) and first.mg then output;
end;
drop _:;
run;

Results:

 

MGIncCIS
14512
21232
312910
4 23
5023
6838 
ClareRobin1
Calcite | Level 5

Thanks 

novinosrin
Tourmaline | Level 20

My pleasure. However, I will continue to be open to change/edit/modification should you require any because on the forums we only play with samples and thus there may be unwanted surprises. Feel free to have any follow up questions if you may have.  Have a good one! Cheers

ClareRobin1
Calcite | Level 5

Please can you put dont just a one line not on what the function - call missing (_Max) does?

 

I have broken the code into pieces to try and understand that but still cant figure out what it does. Since I am going to use this code at work, it might be worth me knowing what that function does

novinosrin
Tourmaline | Level 20

Here are the notes:

1. The solution loops through the entire dataset twice processing one by group at a time

2. On the 1st pass, we calculate the max for each by group. I guess this is self explanatory 

3. The call missing assigns a missing value to erase the max of the previous by group that is retained in memory. This helps in avoiding the mess while processing the next by group

4. Once we have completed the 1st pass, the 2nd pass basically filters the records using the max value that was obtained in the first pass and with your stated conditions

5. The explicit output statement overrides the implicit output as it is very important to control for the reason the processing of the dataset by groups using Dow loop happens in memory and implicit output by default happens only when sas processes the last observation of the by group. 

 

Hope this helps

 

PS Do get book by Ron cody/ Art Carpenter without fail if you really want to speed up your learning

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1597 views
  • 1 like
  • 2 in conversation