BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Rou
Obsidian | Level 7 Rou
Obsidian | Level 7

Hi all,
I have two dataset that I merged and proceed to array and do loop to create new variables but each step is too long to run and I am running out of space on my computer. I posted a sample code and would appreciate any input.

Thanks!

 

proc sort data= BL.data;
	by pid;
run;

proc sort data= BL.fudata;
	by pid;
run;
/*Merging both data*/
data merged_data;
	merge BL.data BL.fudata;
	by pid;
run;
/*creating new variables*/
data mergedclean1;
 set mergeddata_fmt;
 array ablpromis {4} promis_pa_scale1  promis_pa_scale3  promis_pa_scale4  promis_pa_scale5;
 array afupromis {4} promis_pa_scale1_fu promis_pa_scale3_fu promis_pa_scale4_fu promis_pa_scale5_fu;
 do i=1 to 4;
		blpromis= ablpromis {i};
		fupromis= afupromis {i};
		if blpromis= 99 then blpromis= .;
		if blpromis= '' then blpromis= .;
		if fupromis= 99 then fupromis= .;
		if fupromis= '' then fupromis= .;
        output;
 end;
 drop i  promis_pa_scale1  promis_pa_scale3  promis_pa_scale4  promis_pa_scale5 promis_pa_scale1_fu promis_pa_scale3_fu promis_pa_scale4_fu promis_pa_scale5_fu;
run;

proc print data= mergedclean1;
run;

data mergedclean2;
	set mergedclean1;
	array ablefficacy {8} c_paselfefficacy1 c_paselfefficacy2 c_paselfefficacy3 c_paselfefficacy4 c_paselfefficacy5 c_paselfefficacy6 c_paselfefficacy7 c_paselfefficacy8;
	array afuefficacy {8} c_paselfefficacy1_fu c_paselfefficacy2_fu  c_paselfefficacy3_fu  c_paselfefficacy4_fu  c_paselfefficacy5_fu  c_paselfefficacy6_fu  c_paselfefficacy7_fu  c_paselfefficacy8_fu;
	do i=1 to 8;
			blefficacy= ablefficacy {i};
			fueffecicay= afuefficacy {i};
			if blefficacy= 99 then blefficacy= .;
			if blefficacy= '' then blefficacy= .;
			if fuefficacy= 99 then fuefficacy= .;
			if fuefficacy= '' then fuefficacy= .;
       		output;
    end;
	drop i c_paselfefficacy1 c_paselfefficacy2 c_paselfefficacy3 c_paselfefficacy4 c_paselfefficacy5 c_paselfefficacy6 c_paselfefficacy7 c_paselfefficacy8 c_paselfefficacy1_fu c_paselfefficacy2_fu  c_paselfefficacy3_fu  c_paselfefficacy4_fu  c_paselfefficacy5_fu  c_paselfefficacy6_fu  c_paselfefficacy7_fu  c_paselfefficacy8_fu;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

A comment about this type of coding that may be impacting your performance slightly:

		if blpromis= 99 then blpromis= .;
		if blpromis= '' then blpromis= .;
		if fupromis= 99 then fupromis= .;
		if fupromis= '' then fupromis= .;

The variables blpromis and fupromis are either character or numeric. If the value is character then SAS has to do a conversion here to make 99 character for comparison with blpromis or fupromis. Minor hit but you are doing this repeatedly. Plus the result will not be a numeric missing, i.e. the dot but an actual character.

351  data example;
352     /* character valued variable*/
353     x='99';
354     if x=99 then x=.;
355  run;

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      354:7
NOTE: Numeric values have been converted to character values at the places given by:
      (Line):(Column).
      354:19

When you see those notes about "converted to" then SAS is wasting cycles. Also the conversions can use values you do not expect because of the default rules for the conversions.

 

In any case you are likely doing extra comparisons if the first one is true. If the value were 99 then there is no reason to test for missing because you just set it to missing. In a generic sense to reduce wasted time you would code something more like

		if blpromis= 99 then blpromis= .;
		ELSE if blpromis= <a different value goes here> then blpromis= .;

In this case the second comparison is not executed when the first one is true.

 

If the value is actually numeric then this forces a different numeric/character conversion: if blpromis= '' AND wasting a comparison as the value of a missing numeric is already . So the comparison would not even be needed.

Make sure you know your variable type. Also instead of "if blpromis= '' " you can use " If missing(blpromis) " which will do the correct comparison for missing regardless of character type.

 

So in the first merge you are likely doing 4(array size)*2 (number of arrays) =8 for each record doing unneeded comparisons for missing, and similarly 16 in the second.

 

Your MergedClean1 data set will make 4 copies of every single variable except for the variables in the arrays and i. So the data set goes get larger.

And then the MergedClean2 makes 8 copies of every single variable except the ones in the arrays and i. So if you have other variables you have made 32 copies of them.

 

You don't show how Mergeddata_fmt is made.

View solution in original post

4 REPLIES 4
Tom
Super User Tom
Super User

What do the NOTEs in the SAS log tell you?  How long is each step taking?  

How many observations are in each dataset? How many variables.

 

The flow of you program does not seem to make much sense. 

You sort and merge two datasets to create a third.  

Then your next step is referencing some other previously never mentioned fourth dataset to create a fifth.  Why?  If you are not going to use the results of the merge why bother doing it?

 

Why do you have PROC PRINT step?  How large is this data?  Are you really going to look at potentially hundreds of pages of printout?

 

Rou
Obsidian | Level 7 Rou
Obsidian | Level 7
Hi Tom,
Thanks for the reply and sorry for the confusion as I am still trying to my understand sas. The two dataset that I merged are pre and post data. My primary and secondary outcomes have multiples variables hence why I was using array to combine them into one single variable before doing any kind of analysis.
ballardw
Super User

A comment about this type of coding that may be impacting your performance slightly:

		if blpromis= 99 then blpromis= .;
		if blpromis= '' then blpromis= .;
		if fupromis= 99 then fupromis= .;
		if fupromis= '' then fupromis= .;

The variables blpromis and fupromis are either character or numeric. If the value is character then SAS has to do a conversion here to make 99 character for comparison with blpromis or fupromis. Minor hit but you are doing this repeatedly. Plus the result will not be a numeric missing, i.e. the dot but an actual character.

351  data example;
352     /* character valued variable*/
353     x='99';
354     if x=99 then x=.;
355  run;

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      354:7
NOTE: Numeric values have been converted to character values at the places given by:
      (Line):(Column).
      354:19

When you see those notes about "converted to" then SAS is wasting cycles. Also the conversions can use values you do not expect because of the default rules for the conversions.

 

In any case you are likely doing extra comparisons if the first one is true. If the value were 99 then there is no reason to test for missing because you just set it to missing. In a generic sense to reduce wasted time you would code something more like

		if blpromis= 99 then blpromis= .;
		ELSE if blpromis= <a different value goes here> then blpromis= .;

In this case the second comparison is not executed when the first one is true.

 

If the value is actually numeric then this forces a different numeric/character conversion: if blpromis= '' AND wasting a comparison as the value of a missing numeric is already . So the comparison would not even be needed.

Make sure you know your variable type. Also instead of "if blpromis= '' " you can use " If missing(blpromis) " which will do the correct comparison for missing regardless of character type.

 

So in the first merge you are likely doing 4(array size)*2 (number of arrays) =8 for each record doing unneeded comparisons for missing, and similarly 16 in the second.

 

Your MergedClean1 data set will make 4 copies of every single variable except for the variables in the arrays and i. So the data set goes get larger.

And then the MergedClean2 makes 8 copies of every single variable except the ones in the arrays and i. So if you have other variables you have made 32 copies of them.

 

You don't show how Mergeddata_fmt is made.

Rou
Obsidian | Level 7 Rou
Obsidian | Level 7
Thanks for the thorough explanation, makes sense!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 951 views
  • 3 likes
  • 3 in conversation