Beartato
Fluorite | Level 6

Howdy folks,

 

I am running both Pearson and Spearman correlations on a large dataset (approx. 7,000 sets of data), and am wondering whether there is a way to make PROC CORR run the analysis only up to the first instance of "0" for each participant. Every set of data begins at a non-zero number and theoretically drops to zero. However, the point at which a set reaches zero differs between participants, so I can't simply delete or replace all columns following the first-occurring zero. For example, see the following three example sets of data (each cell indicates the number of items purchased at that price):

 

              Price
Participant   $1   $2   $3   $4   $5   $6   $7   $8   $9   $10
1             10    8    6    4    2    0    0    0    0    0
2             25   25   25   20   18   15   10    0    0    0
3              2    0    0    0    0    0    0    0    0    0


 
Notice that as price goes up, the number of purchases decreases. What I need PROC CORR to do is read each set of observations and analyze them only through the first zero (for participant 1 above, that is the zero at $6), ignoring every zero after the first. This needs to work for both the basic Pearson PROC CORR and PROC CORR with the SPEARMAN option.

Theoretically, I could simply work through the code and delete (or replace) all zeroes after the first-occurring zero, but it would be difficult to do so for 7,000 sets of observations, and I believe that SAS is capable of executing this task. 

 

I had previously used an ARRAY statement to replace all instances of zero with "." to mark them as missing. However, the analysis requires that the first-occurring zero be included, and the ARRAY code I was using would also remove the first-occurring zero.

 

Any advice? Many thanks for your time!

1 ACCEPTED SOLUTION

Ksharp
Super User

Which two variables are you going to use to calculate the correlation coefficient? Two participants, or two prices?

And why not just set the zeros that follow the first zero to missing values?

 

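/* Sample data from the question: one row per participant, _1-_10 = purchases at $1-$10 */
data have;
infile cards expandtabs truncover;
input Participant	_1-_10;
cards;
1	10	8	6	4	2	0	0	0	0	0
2	25	25	25	20	18	15	10	0	0	0
3	2	0	0	0	0	0	0	0	0	0
;
run;

/* Set every value after the first zero to missing; the first zero itself is kept */
data want;
 set have;
 array x{*} _1-_10;
 do i=1 to dim(x);
  if found then x{i}=.;   /* a zero was already seen earlier in this row */
  if x{i}=0 then found=1; /* remember that the first zero has been reached */
 end;
 drop i found;            /* found is not retained, so it starts as missing on every row */
run;

proc print;run;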
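
If the goal is to correlate price with number of purchases within each participant (the thread leaves this open), the WANT dataset above could be reshaped and passed to PROC CORR roughly as follows. This is only a sketch: the names long, price_var, price, and purchases are made up for illustration, and it assumes the column names _1-_10 stand for prices $1-$10. PROC CORR leaves out missing values by default, so the values blanked out after the first zero are ignored.

```sas
/* Assumes WANT from the step above, already in Participant order */
proc transpose data=want out=long(rename=(col1=purchases)) name=price_var;
 by Participant;
 var _1-_10;
run;

data long;
 set long;
 /* price_var holds the old column name, e.g. "_3"; drop the underscore to recover the price */
 price = input(substr(price_var, 2), 8.);
 drop price_var;
run;

/* Pearson and Spearman coefficients, one set per participant */
proc corr data=long pearson spearman;
 by Participant;
 var price;
 with purchases;
run;
```

A participant such as number 3, who has only two usable points, will still produce a coefficient but little else of use, so results for very short series probably deserve a second look.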


9 REPLIES
PGStats
Opal | Level 21

What are you correlating? Price with number of purchases, or number of purchases between participants?

PG
stat_sas
Ammonite | Level 13

One way to do this is to stack all the variables' names and values, then use BY processing in PROC CORR on values greater than 0. Something like this:

 

data have;
input Price Participant1 Participant2 Participant3;
datalines;
1 10 25 2
2 8 25 0
3 6 25 0
4 4 20 0
5 2 18 0
6 0 15 0
7 0 10 0
8 0 0 0
9 0 0 0
10 0 0 0
;

 

data want(keep=variable price value);
set have;
array p(*) Participant:;
do i=1 to dim(p);
value=p(i);
variable=vname(p(i));
output;
end;
run;

 

proc sort data=want;
by variable;
run;

 

proc corr data=want(where=(value>0));
by variable;
var price;
with value;
run;

Beartato
Fluorite | Level 6
Thanks for the reply. I see that I should have been clearer in the original post - I need to include the first instance of zero (for each participant) in the proc corr. Would this syntax maintain the first instance of zero?
stat_sas
Ammonite | Level 13

Please try the following syntax that will maintain the first instance of zero for proc corr:

 

data want(keep=variable price value);
set have;
array p(*) Participant:;
do i=1 to dim(p);
value=p(i);
variable=vname(p(i));
output;
end;
run;

 

proc sort data=want;
by variable;
run;

 

data corr(drop=flag);
do until(last.variable);
set want;
by variable;
if not flag then output;
if value=0 then flag = 1;
end;
run;

 

proc corr data=corr;
by variable;
var price;
with value;
run;

Beartato
Fluorite | Level 6
As an addendum, is there any SAS code that could modify the dataset with which I'm working so as to simply *delete* or mark as missing all values after the first 0, rather than simply teaching the PROC CORR to only read up to the first zero? Having the data sets pruned to this point would help quite a bit with future analyses.
PGStats
Opal | Level 21

Make your data long instead of wide:

 

data long;
set wide;
array A n1-n10;
do price = 1 to dim(A) until(A{price}=0);
	number = A{price};
	output;
	end;
keep participant price number;
run;
PG
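
To try the long-format step above on the sample data from the original post, something like the following sketch should work. The dataset name wide and the variables n1-n10 follow PGStats's code; the data values come from the question; and treating the array index as the price assumes the columns run from $1 to $10 in order. Requesting PEARSON and SPEARMAN together reflects the two analyses asked for in the question.

```sas
/* Sample data from the original post; names follow the step above */
data wide;
 input participant n1-n10;
 datalines;
1 10 8 6 4 2 0 0 0 0 0
2 25 25 25 20 18 15 10 0 0 0
3 2 0 0 0 0 0 0 0 0 0
;
run;

/* Wide to long: write rows up to and including the first zero, then stop */
data long;
 set wide;
 array A n1-n10;
 do price = 1 to dim(A) until(A{price}=0);
  number = A{price};
  output;
 end;
 keep participant price number;
run;

/* Pearson and Spearman coefficients, one set per participant */
proc corr data=long pearson spearman;
 by participant;
 var price;
 with number;
run;
```

Because the UNTIL condition is checked at the bottom of each pass, the row holding the first zero is written before the loop ends, which is what keeps that zero in the analysis.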

Beartato
Fluorite | Level 6


This looks like exactly what I'm looking for! Thanks for sending this. Will this syntax work with data libraries that have already been imported into SAS? I.e., can I just replace the highlighted text below with my libname.refname?

 

data want;
 set have;
 array x{*} _1-_10;
 do i=1 to dim(x);
  if found then x{i}=.;
  if x{i}=0 then found=1;
 end;
 drop i found;
run;

proc print;run;

 

Ksharp
Super User

Yes. You can use that as long as your data has been turned into a SAS dataset.

 

data yourlib.want;
 set yourlib.have;
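
Spelled out in full, that might look like the sketch below. The library name yourlib comes from the snippet above, but the path is only a placeholder, and _1-_10 would need to be replaced with whatever the purchase columns are actually called in the imported data.

```sas
/* Placeholder path: point this at the folder that holds your SAS datasets */
libname yourlib "/path/to/your/sas/data";

/* Read yourlib.have and write a pruned copy to yourlib.want */
data yourlib.want;
 set yourlib.have;
 array x{*} _1-_10;       /* assumed column names; adjust to your data */
 do i=1 to dim(x);
  if found then x{i}=.;   /* blank out everything after the first zero */
  if x{i}=0 then found=1; /* the first zero itself is kept */
 end;
 drop i found;
run;
```

Writing to a new dataset name rather than overwriting yourlib.have keeps the original values available for any later analysis that needs the full series.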

 
