Michaelcwang2
Obsidian | Level 7

Dear all,

I'm writing code in SAS University Edition (9.4) to produce a dataset with columns i, j, X, Y. The code is below:

data work.Rand;

%macro Normal_Simulation;

call streaminit(567);
do i=1 to 50;
	X=Rand("normal",0.02*j-1,1.0);
	IF X>3 OR X<-3 THEN DO
	Y=X;
	X=0;
	i=i-1;
	END;
	
	ELSE Y=0;
	/*u=Rand("uniform");*/
output;
end;

%mend;

do j=0 to 99;
%Normal_Simulation;
end;
run;

PROC SQL;
   CREATE TABLE work.query AS
   SELECT j , i , X , Y FROM work.rand; 
   /*WHERE J=&j AND Y=0;*/
   RUN;
   QUIT;

Can I use an SQL expression in the WHERE clause here to keep only the rows in the top 10 percent of X for each j in this table? Thank you.

 

MKW

 

Reeza
Super User

1. Never define a macro inside a data step

2. Show what you have (Data) and what you want

3. Explain the actual problem you’re trying to solve

4. Don’t be loopy. 

 

Before making a macro, you should also start with working code. Can you show what your solution looks like before it’s a macro?

Michaelcwang2
Obsidian | Level 7

Hi Reeza,

Running the macro gives me 200 normal distributions with a shifting mean, each populated by 50 (or a few more) values of X, with an index Y of 0 or 1 depending on whether the value is beyond a limit. I can collect these X values (50*200) and then analyze their descriptive statistics with PROC MEANS by each j.

Now I am especially interested in the largest values, such as those above the nth percentile of each distribution, so I would like to slice them out of this dataset and run the same analysis with a similar PROC MEANS.

proc means 
   data=work.query 
     chartype NWAY 
     mean std min max n vardef=df skew SKEWNESS KURT KURTOSIS median;
   var X;
   output 
   out=work.skewtemp 
     skew=Distskew KURT=DISKURT max=DISmax median=DISmedian min=DISmin; 
     where (j between 0 and 199) and Y=0;       
     class J;	
   run;

Hopefully it helps to clarify my problem. Thank you!

 

MKW

Kurt_Bremser
Super User

After removing the unnecessary macro and correcting errors (like the missing semicolon after then do), this is your code.

data work.Rand;
do j = 0 to 99;
  call streaminit(567);
  do i = 1 to 50;
    X = Rand("normal",0.02*j-1,1.0);
    if X > 3 or X < -3
    then do;
      Y = X;
      X = 0;
      i = i - 1;
    end;
    else Y = 0;
    /*u=Rand("uniform");*/
    output;
  end;
end;
run;

proc sql;
create table work.query as
select j, i, X, Y
from work.rand 
/*where J = &j and Y = 0*/
;
quit;

From where would you get &j?

Michaelcwang2
Obsidian | Level 7
If I use PROC UNIVARIATE to produce a dataset with the nth percentile of X by j, is there a way in SQL to select the rows where X is greater than these values for each j from the original dataset? Thank you.
Tom
Super User

If you have one dataset, HAVE, with J and many X values and another dataset, MEANS, with J and a cutoff value, say P95, then just join them.

proc sql ;
create table want as
select a.*
from have a , means b
where a.j= b.j 
  and a.x > b.p95
;
quit;
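
For illustration, here is one way the cutoff dataset could be built before the join. This is only a sketch: HAVE, MEANS and P95 are the placeholder names from the post above, the simulated input is a made-up stand-in, and using PROC UNIVARIATE with PCTLPTS=/PCTLPRE= is an assumption about how the percentiles are produced.

```sas
/* Hypothetical stand-in for HAVE: 5 groups of 200 normal draws each. */
data have;
  call streaminit(567);
  do j = 0 to 4;
    do i = 1 to 200;
      x = rand("normal", 0.02*j - 1, 1.0);
      output;
    end;
  end;
run;

/* One row per j with the 95th percentile of x.
   PCTLPRE=p combined with PCTLPTS=95 names the output variable p95.
   BY j relies on HAVE already being in j order, as it is here. */
proc univariate data=have noprint;
  by j;
  var x;
  output out=means pctlpts=95 pctlpre=p;
run;

/* Join each observation to the cutoff of its own j and keep the upper tail. */
proc sql;
  create table want as
  select a.*
  from have a, means b
  where a.j = b.j
    and a.x > b.p95;
quit;
```

If the real data are not already ordered by j, a PROC SORT with BY j would be needed before the PROC UNIVARIATE step.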

 

Michaelcwang2
Obsidian | Level 7
Thank you, Tom. I will check whether it solves the problem.
Michaelcwang2
Obsidian | Level 7

Dear Tom,

For some reason, it still doesn't work this way.

proc sql ;
create table want as
select X.*
from rand1 X , percentile1 P_n
where X.j= percentile1.j
  and X.x > P_n.p95
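
A likely cause of the failure is that the WHERE clause refers to percentile1.j even though the FROM clause gives that table the alias P_n; once an alias is assigned, PROC SQL generally expects the alias in column references. The sketch below uses the alias consistently. The two small datasets are made-up toy data standing in for rand1 and percentile1, which are not shown in this thread.

```sas
/* Hypothetical toy data standing in for rand1 (j, x). */
data rand1;
  input j x;
  datalines;
0 -1.2
0 0.4
0 2.1
1 -0.3
1 1.8
1 2.6
;
run;

/* Hypothetical toy data standing in for percentile1 (j, p95). */
data percentile1;
  input j p95;
  datalines;
0 2.0
1 2.5
;
run;

proc sql;
  create table want as
  select X.*
  from rand1 X, percentile1 P_n
  where X.j = P_n.j        /* alias P_n, not the table name percentile1 */
    and X.x > P_n.p95;     /* keeps (0, 2.1) and (1, 2.6) in this toy data */
quit;
```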

But I did solve my problem by using PROC MEANS to create another dataset with the P90 of each j, merging it with the original dataset "Rand" by j, and deleting observations where X < P90. Anyway, thank you for all your support!
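
The approach described above might look roughly like the following. It is a sketch, not the exact code that was run: work.rand is assumed to exist as created earlier in the thread, and the names p90_by_j and top10 are hypothetical.

```sas
/* One row per j holding the 90th percentile of X.
   NWAY keeps only the per-j rows; KEEP= drops _TYPE_ and _FREQ_. */
proc means data=work.rand noprint nway;
  class j;
  var x;
  output out=work.p90_by_j (keep=j p90) p90=p90;
run;

/* MERGE with BY needs both inputs in j order. */
proc sort data=work.rand;
  by j;
run;

/* Attach the cutoff to every observation of the same j,
   then drop everything below it. */
data work.top10;
  merge work.rand work.p90_by_j;
  by j;
  if x < p90 then delete;
run;
```

The resulting dataset can then be passed to the same PROC MEANS analysis with a CLASS j statement.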

 

MKW 

Reeza
Super User

If your goal is to figure out what's higher than the 95th percentile, I would use the RANK proc instead and then filter that out directly.
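
A minimal sketch of that idea, assuming work.rand exists as created earlier in the thread; the names rand_sorted, ranked, x_group and top5 are hypothetical. With GROUPS=20, PROC RANK assigns group numbers 0 to 19 within each j, so group 19 holds roughly the top 5 percent. With only about 50 values per j the group sizes are approximate, so the result may differ slightly from filtering on X > P95.

```sas
/* BY-group processing in PROC RANK needs the data in j order. */
proc sort data=work.rand out=work.rand_sorted;
  by j;
run;

/* Split X into 20 rank groups (0..19) separately for each j. */
proc rank data=work.rand_sorted out=work.ranked groups=20;
  by j;
  var x;
  ranks x_group;
run;

/* Keep only the highest group of each j. */
data work.top5;
  set work.ranked;
  where x_group = 19;
run;
```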

 

 
