Re: how to group the obseravations depend on one value

Ravikumarkummari · Posted 02-12-2015 05:12 AM

Hi all,

data is available in pdf file. how to group the variables depend on one value. Thanks in advance all.

Ravikumarkummari · Posted 02-12-2015 06:42 AM

pdf file all the information is available and desired result also available

so if you know the what is sufficient process for this result please post your valid comments.

Thanks in advance.

KachiM · Posted 02-12-2015 08:51 AM

Your constraints seems to be not enough. I am getting a solution that is fluke.

data want;

retain Tot 0 numemp 0 ;

set have end = eof;

flag = 0;

Tot + salaries;

numemp + 1;

if Tot > 20000 then do;

if Tot > 24000 and ^eof then do;

temp = tot - salaries;

Tot = temp;

flag = 1;

put Tot = salaries = temp =;

end;

output;

Tot = ifN(flag, Salaries, 0);

numemp = 0;

end;

keep Tot numemp;

run;

I missed to revise the number of employees when adjustment is made.

The revised program takes care of this.

data want;

retain Tot 0 numemp 0 ;

set have end = eof;

flag = 0;

Tot + salaries;

numemp + 1;

if Tot > 20000 then do;

if Tot > 24000 and ^eof then do;

temp = tot - salaries;

Tot = temp;

numemp +- 1;

flag = 1;

put Tot = salaries = temp =;

end;

output;

Tot = ifN(flag, Salaries, 0);

numemp = ifN(flag, 1, 0);

end;

keep Tot numemp;

run;

Ravikumarkummari · Posted 02-12-2015 10:49 PM

It was some different from my output. first sort the data in ascending order in this case.

i need only 10 groups. i am not understanding how to divide the groups. in this case only 10 groups. But some cases its only five depending on value in this case 50000. So if you know this process please post. how to divide the groups based on one value.

Ravikumarkummari · Posted 02-13-2015 12:17 AM

sir your code is working but last two records are missing in this case. its skipping last 2 records

KachiM · Posted 02-13-2015 03:17 AM

Your posts have been mainly telling to find 10 Groups by a kind of plus or minus approach.

Here is an approach which may meet your needs.

The Algorithm used:

[1] Use the Proc univariate and get the Percentiles for Deciles(10 Groups.

[2] Load the Deciles in an Array, and determine the i-th group where Salaries falls.

[3] Have an another Array to accumulate the number of records that falls in the group.

[4] At the end reading the data set, Use the arrays to output Group, End_Salary, and Count.

The following PROC finds the ten Percentils(P_10, P_20, ....P_100).

proc univariate data = have noprint;

var salaries;

output pctlpre = P_ pctlpts = 10 to 100 by 10;

run;

The Percentiles are:

P_10 P_20 P_30 P_40 P_50 P_60 P_70 P_80 P_90 P_100

1800 2950 3600 4850 6000 6500 7300 8050 9050 9600

This helps to determine the Group.

For instance, Salaries <= 1800 belongs to First Group. Salaries <= 6000 belongs to 5-th Group.

The program below with explanatory comments will get the output.

data want;

if _n_ = 1 then do;

set data1;

end;

/* Load P_10 to P_100 in array k[ ] */

array k

P_:;

/* Have array m[ ] to hold the counts of obsevations falling in each Group */

array m[10] _temporary_;

do until(eof);

set have end = eof;

/* Check the correct i and increment the counter(in m). Then leave the i-loop

to process the next record */

do i = 1 to 9;

if salaries <= k then do;

m + 1;

leave;

end;

/* If Salaries > P_90 then increment the counter for 10-th Group */

else if salaries > k[9] then do;

m[10] + 1;

leave;

end;

/* Use the Array m[ ] for COUNT and k[ ] for END_SALARY */

do group = 1 to 10;

end_salary = k[group];

count = m[group];

output;

end;

stop;

keep group end_salary count;

run;

The output is:

end_

group salary count

1 1800 4

2 2950 4

3 3600 4

4 4850 4

5 6000 5

6 6500 4

7 7300 3

8 8050 4

9 9050 4

10 9600 4

how to group the obseravations depend on one value