want to understand some code writing

Frequent Contributor
Posts: 79

I saw the following code today, but I could not figure out the purpose of some of it.

One question is why 'visitnum=.;' is used here. With this sample code it makes no difference, but with the original data the results differ depending on whether 'visitnum=.;' is included. Has anyone run into this before?

The other is why '(rename=(city=city2))' and 'city=city2;' are used. I think they are not necessary.

 

 

data children;
input id city$ age day$ symptom @@;
output;
datalines;
1 steelcity 8 day0 1
2 steelcity 8 day2 1
3 steelcity 8 day2 1
4 greenhills 8 day0 0
5 steelcity 8 day0 0
6 greenhills 8 . 1 
7 steelcity 8 . 1
8 greenhills 8 day1 0
9 greenhills 8 day2 1
;
run;
data try;
  set children (rename=(city=city2));
  length city $50.;
  city=city2;
  visitnum=.;
  if day='day0' then visitnum=0;
  if day='day1' then visitnum=1;
  if day='day2' then visitnum=2;
  keep id city i age day symptom visitnum;
run;

Accepted Solutions
Solution
2 weeks ago
Valued Guide
Posts: 631

Re: want to understand some code writing

[ Edited ]
Posted in reply to xiangpang

The reason for using rename and length to change the length of "city" is to maintain the order of variables. Order matters if, for example, proc export is used later in the program and the order of variables in the exported file must not change.

 

The variable "i" should be removed from the keep statement, because it is not used in the DATA step at all.

 

"visitnum=.;" would serve a purpose if the author had tried to extract the number from day but wanted visitnum to be numeric without using the proper conversion function. Just guessing ;-)

 

If more than three values are possible for visitnum, replacing the if-then logic with a single assignment helps keep the code short:

visitnum = input(substr(day, 4), best.);

This assumes that the value of "day" always starts with the word "day".
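
If that assumption is in doubt, one possible variant (a sketch only, not what the original code does) is to keep just the digits of "day" with COMPRESS and its 'kd' modifiers before converting. The data set name TRY_DIGITS is made up for illustration:

```sas
/* Sketch: derive visitnum from whatever digits appear in DAY.    */
/* TRY_DIGITS is a hypothetical name, used only for this example. */
data try_digits;
  set children;
  /* 'k' keeps the listed characters, 'd' adds digits to the list */
  visitnum = input(compress(day, , 'kd'), best.);
run;
```

Rows where "day" is blank or contains no digits end up with a missing visitnum, the same as with the if-then version.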

 

 



All Replies
Super User
Posts: 6,935

Re: want to understand some code writing

Posted in reply to xiangpang

You're right about:

 

visitnum=.;

 

It serves no purpose here.  It might serve a purpose if VISITNUM were part of the incoming CHILDREN data set.  But we can see from the program that VISITNUM gets created within TRY.
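
To illustrate that case, here is a small sketch. CHILDREN2 and the value 99 are made up purely for the example; the point is only that the incoming data already carries a VISITNUM:

```sas
/* Hypothetical input: same as CHILDREN, but VISITNUM already exists. */
data children2;
  set children;
  visitnum = 99;   /* placeholder value, for illustration only */
run;

/* Without the reset, rows with a blank DAY keep the incoming 99. */
data without_reset;
  set children2;
  if day='day0' then visitnum=0;
  if day='day1' then visitnum=1;
  if day='day2' then visitnum=2;
run;

/* With the reset, rows with a blank DAY end up missing instead. */
data with_reset;
  set children2;
  visitnum=.;
  if day='day0' then visitnum=0;
  if day='day1' then visitnum=1;
  if day='day2' then visitnum=2;
run;
```

For ids 6 and 7, where DAY is blank, WITHOUT_RESET keeps 99 while WITH_RESET has a missing value. A difference like that in the original data would suggest VISITNUM is coming in from somewhere.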

 

The renaming and copying of CITY is designed to give CITY a longer length.  In CHILDREN, CITY has a length of 8.  It would be simpler to forget about the renaming and copying, and just add this statement instead:

 

length city $ 50;

 

That statement could go either before the INPUT statement that creates CHILDREN, or before the SET statement that creates TRY.  Most likely, the programmer tried to include it where it now appears, after the SET statement.  That does not work: once the SET statement has assigned a length to a character variable, a later LENGTH statement cannot change it, and SAS issues a warning instead.  So the combination that you see was all that the programmer could figure out as a way to change the length of CITY.  But a properly placed LENGTH statement could have done the trick.
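
As a rough sketch of the second placement (before SET), using the data set names from the question and dropping the unused "i" and the redundant reset:

```sas
/* Sketch: declare the longer length before SET so it takes effect. */
data try;
  length city $ 50;      /* must precede SET to define CITY's length */
  set children;
  if day='day0' then visitnum=0;
  else if day='day1' then visitnum=1;
  else if day='day2' then visitnum=2;
  keep id city age day symptom visitnum;
run;

/* List the variables in creation order to check length and position. */
proc contents data=try varnum;
run;
```

Two caveats: this makes CITY the first variable in TRY, and any values already truncated to 8 characters in CHILDREN stay truncated. To keep the full city names, the LENGTH statement would have to go before the INPUT statement that creates CHILDREN.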

Frequent Contributor
Posts: 79

Re: want to understand some code writing

Posted in reply to Astounding

Thanks. The answer about length changing makes sense. I did not realize it before. 

 

Although visitnum is not in the CHILDREN data set, 'visitnum=.' must be meaningful here, since I see a difference with the original code (whose input data also does not include visitnum).

 

Again, thanks for answering the question.

Super User
Posts: 6,935

Re: want to understand some code writing

Posted in reply to xiangpang

No, visitnum=. does nothing here.  If the program is actually more complex, and contains additional statements that you have removed, then it may serve a purpose.
