12-09-2017 08:14 AM

Dear all,

I'm writing code in SAS University Edition (9.4) to create a dataset with columns i, j, X, and Y. The code is below:

```
data work.Rand;
  %macro Normal_Simulation;
    call streaminit(567);
    do i=1 to 50;
      X=Rand("normal",0.02*j-1,1.0);
      IF X>3 OR X<-3 THEN DO
        Y=X;
        X=0;
        i=i-1;
      END;
      ELSE Y=0;
      /*u=Rand("uniform");*/
      output;
    end;
  %mend;
  do j=0 to 99;
    %Normal_Simulation;
  end;
run;
PROC SQL;
  CREATE TABLE work.query AS
  SELECT j , i , X , Y FROM work.rand;
  /*WHERE J=&j AND Y=0;*/
RUN;
QUIT;
```

Can I use an SQL expression in the WHERE clause here to get only the top 10 percent of X values (those above the 90th percentile) for each j in this table? Thank you.

MKW

Accepted Solutions

Posted in reply to Michaelcwang2

12-10-2017 07:59 PM

Dear Tom,

For some reason, it still doesn't work this way:

```
proc sql ;
create table want as
select X.*
from rand1 X , percentile1 P_n
where X.j= percentile1.j
and X.x > P_n.p95
```

But I did solve my problem by using PROC MEANS to create another dataset with the P90 of each j, merging it with the original dataset "Rand" by j, and deleting every observation where X < P90. Anyway, thank you for all your support!

MKW
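A note on the query in the post above: the second table is given the alias `P_n`, but the WHERE clause refers to `percentile1.j`. PROC SQL most likely rejects that reference, because once a table has an alias its columns are expected to be qualified through the alias (`P_n.j`); the pasted query also stops before the closing semicolon and QUIT, which may just be how it was copied from the log. Below is a minimal sketch of the PROC MEANS + MERGE approach described in that post, followed by an equivalent SQL join with the alias used consistently. It assumes the simulated data lives in `work.rand` (regenerated at the top so the sketch runs on its own) and that only the Y = 0 rows should count, as in the PROC MEANS step elsewhere in this thread; the names `work.p90`, `work.top10` and `work.top10_sql` are hypothetical.

```sas
/* Regenerate the simulated data: same logic as the corrected DATA step
   in this thread, with CALL STREAMINIT called once before the loops */
data work.rand;
  call streaminit(567);
  do j = 0 to 99;
    do i = 1 to 50;
      X = rand("normal", 0.02*j - 1, 1.0);
      if X > 3 or X < -3 then do;
        /* out-of-range draw: store it in Y, set X to 0, step i back to redraw */
        Y = X;
        X = 0;
        i = i - 1;
      end;
      else Y = 0;
      output;
    end;
  end;
run;

/* Step 1: the 90th percentile of X for each j, using only the Y = 0 rows */
proc means data=work.rand noprint nway;
  where Y = 0;
  class j;
  var X;
  output out=work.p90 p90=P90;
run;

/* Step 2: attach each j's cutoff to its rows and delete X below it.
   Both inputs are already in j order, so MERGE ... BY j needs no sort. */
data work.top10;
  merge work.rand(where=(Y = 0)) work.p90(keep=j P90);
  by j;
  if X < P90 then delete;
run;

/* The same filter as a PROC SQL join: columns are qualified by the
   aliases r and p, never by the table names */
proc sql;
  create table work.top10_sql as
  select r.*, p.P90
  from work.rand as r, work.p90 as p
  where r.j = p.j
    and r.Y = 0
    and r.X >= p.P90;
quit;
```

Because the DATA step deletes rows with X < P90, it keeps X >= P90; the SQL version uses the same comparison, so both tables should contain the same rows.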

All Replies

Posted in reply to Michaelcwang2

12-09-2017 11:50 AM

1. Never define a macro inside a data step

2. Show what you have (Data) and what you want

3. Explain the actual problem you’re trying to solve

4. Don’t be loopy.

Before making a macro, you should also start with working code. Can you show what your solution looks like before it’s a macro?

Posted in reply to Reeza

12-09-2017 04:02 PM

Hi Reeza,

By running the macro, I get 200 normal distributions with a moving mean, each populated by 50 (or a few more) values of X, with an index Y that flags values beyond a limit. I can collect these X values (50*200) and then analyze their descriptive statistics with PROC MEANS by j.

Now I'm especially interested in the largest few values of each distribution, for example those above the nth percentile, so I would like to slice them out of this dataset and run the same analysis with a similar PROC MEANS step.

```
proc means data=work.query chartype NWAY
    mean std min max n vardef=df skew SKEWNESS KURT KURTOSIS median;
  var X;
  output out=work.skewtemp
    skew=Distskew KURT=DISKURT max=DISmax median=DISmedian min=DISmin;
  where (j between 0 and 199) and Y=0;
  class J;
run;
```

Hopefully it helps to clarify my problem. Thank you!

MKW

Posted in reply to Michaelcwang2

12-09-2017 12:57 PM

After removing the unnecessary macro and correcting errors (like the missing semicolon after THEN DO), this is your code:

```
data work.Rand;
  do j = 0 to 99;
    call streaminit(567);
    do i = 1 to 50;
      X = Rand("normal",0.02*j-1,1.0);
      if X > 3 or X < -3
      then do;
        Y = X;
        X = 0;
        i = i - 1;
      end;
      else Y = 0;
      /*u=Rand("uniform");*/
      output;
    end;
  end;
run;
proc sql;
  create table work.query as
  select j, i, X, Y
  from work.rand
  /*where J = &j and Y = 0*/
  ;
quit;
```

From where would you get &j?


Posted in reply to Michaelcwang2

12-09-2017 08:06 PM

If I use PROC UNIVARIATE to create a dataset with the nth percentile of X for each j, is there a way to use SQL to get the rows of the original dataset where X exceeds that value for their j? Thank you.

Posted in reply to Michaelcwang2

12-09-2017 09:27 PM

If you have one dataset, HAVE, with J and many X values and another dataset, MEANS, with J and a cutoff value, say P95, then just join them.

```
proc sql ;
create table want as
select a.*
from have a , means b
where a.j= b.j
and a.x > b.p95
;
quit;
```
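To produce a `MEANS` dataset of the shape this join expects from PROC UNIVARIATE, as asked above, its OUTPUT statement can write one row per BY group: PCTLPTS= picks the percentile and PCTLPRE= sets the variable-name prefix, so `pctlpts=95 pctlpre=P` yields a variable named `P95`. This is a sketch under the same assumptions as the rest of the thread: the simulated data is regenerated as `work.rand`, only Y = 0 rows are used, and `work.means` and `work.want` are illustrative names.

```sas
/* Regenerate the simulated data (same logic as the corrected DATA step) */
data work.rand;
  call streaminit(567);
  do j = 0 to 99;
    do i = 1 to 50;
      X = rand("normal", 0.02*j - 1, 1.0);
      if X > 3 or X < -3 then do;
        /* out-of-range draw: store it in Y, set X to 0, step i back to redraw */
        Y = X;
        X = 0;
        i = i - 1;
      end;
      else Y = 0;
      output;
    end;
  end;
run;

/* One row per j holding the 95th percentile of X in variable P95.
   BY j works without a sort because work.rand is already in j order. */
proc univariate data=work.rand noprint;
  where Y = 0;
  by j;
  var X;
  output out=work.means pctlpts=95 pctlpre=P;
run;

/* Tom's join, pointed at those datasets */
proc sql;
  create table work.want as
  select a.*
  from work.rand as a, work.means as b
  where a.j = b.j
    and a.Y = 0
    and a.X > b.P95;
quit;
```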

Posted in reply to Michaelcwang2

12-09-2017 09:55 PM

Thank you, Tom. I'll check whether it solves the problem.

Posted in reply to Michaelcwang2

12-10-2017 11:13 PM

If your goal is to find the values above the 95th percentile, I would use PROC RANK instead and then filter on the rank directly.
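A sketch of the PROC RANK route, assuming the same simulated `work.rand` (regenerated here) and again using only the Y = 0 rows. With GROUPS=10, PROC RANK replaces each value by its decile group, 0 through 9, within each BY group, so keeping group 9 gives roughly the top 10 percent of X for each j, matching the P90 cutoff used in the accepted solution; GROUPS=20 with group 19 would target the top 5 percent instead. The names `X_grp`, `work.ranked` and `work.top_decile` are illustrative.

```sas
/* Regenerate the simulated data (same logic as the corrected DATA step) */
data work.rand;
  call streaminit(567);
  do j = 0 to 99;
    do i = 1 to 50;
      X = rand("normal", 0.02*j - 1, 1.0);
      if X > 3 or X < -3 then do;
        /* out-of-range draw: store it in Y, set X to 0, step i back to redraw */
        Y = X;
        X = 0;
        i = i - 1;
      end;
      else Y = 0;
      output;
    end;
  end;
run;

/* Decile group of X (0-9) within each j; BY j works because work.rand is in j order */
proc rank data=work.rand(where=(Y = 0)) out=work.ranked groups=10;
  by j;
  var X;
  ranks X_grp;
run;

/* Keep only the top decile of each j */
data work.top_decile;
  set work.ranked;
  where X_grp = 9;
run;
```

Unlike a percentile cutoff, this needs no second dataset or join. With only 50 values per j the groups are coarse: PROC RANK assigns groups as FLOOR(rank*k/(n+1)), so here the top group holds the 5 largest values of each j.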