BookmarkSubscribeRSS Feed
geneshackman
Pyrite | Level 9

I'm importing two data sets, then combining, but i need to change the length and format of some variables (to combine them with other data sets later). The  problem is i can't seem to get the variables in the right order in the output data set. In particular, in the following, I want to get "year" to be length 20, and format $9., and order in the format statement below. Suggestions?

 

PROC IMPORT DATAFILE= "\\folder\pa1_demo.xls"
out=pa1_demo dbms=xls replace;
sheet='pa1_demo';
RUN;

PROC IMPORT DATAFILE= "\\folder\pa1_zip.xls"
out=pa1_zip dbms=xls replace;
sheet='pa1_zip';
RUN;

data pa1_00;
set pa1_zip pa1_demo;
rename Indicator=Ind_id_0 Demographic=Group Category_Description=Characteristics Rate__Percentage=rate_percent year=year_0 ;
label Indicator=Ind_id_0 Demographic=Group Category_Description=Characteristics Rate__Percentage=rate_percent year=year_0;
run;

* bunch of blank lines. Deleting them;
data pa1_0;
set pa1_00;
if Ind_id_0 = "pa1";
run;


data pa1;
format
Ind_id $8. Title $200. Year $9. Group $30. Characteristics $50. Rate_percent 8.1 Low 8.1 High 8.1 Axis $32. Objective 6.1
rate_percent_ci $char18. rate_percent_tx $char5. rev_color 8. Data_Source $200. Footnotes1 $300. Footnotes2 $210.
Header2 $32. PA_2024_Objective 6.1;
set pa1_0;
if ind_id_0 eq "pa1" then ind_id = "pa1_0";

year = put (year_0,9.);
drop year_0 Rate__Percentage_Text numerator denominator numerator_text denominator_text category ind_id_0;

run;

14 REPLIES 14
PaigeMiller
Diamond | Level 26

I want to get "year" to be length 20, and format $9., and order in the format statement below. Suggestions?

 

What do you get? Are there ERRORs or WARNINGs in the LOG?

 

What happens if you add

 

length year $ 20;

Why do you have year as a character variable?

--
Paige Miller
geneshackman
Pyrite | Level 9
Hi Paige. Year is character because i'm combining this with another data set later, which has year as character.
I tried using length year $20; the problem is where, in the program, do I put that in? If i put it right before the format statement, then "year" gets put before all the other variables, but I want it in the order in the format statement, with year as the third variable.
Thanks
PaigeMiller
Diamond | Level 26

I never really understood why someone cares what the order of the variables in the SAS data set is. Reporting procedures such as PROC PRINT, PROC TABULATE and PROC REPORT allow you to specify the order there, so if you are preparing an output you can then get the order you want. So why do you need variables in a specific order, what is the benefit, what would be the value of you spending time and effort to achieve this order inside a SAS data set?

 

Year is character because i'm combining this with another data set later, which has year as character.


This doesn't answer anything, why is would the other data set have year as character?

 

 

--
Paige Miller
geneshackman
Pyrite | Level 9
I want to specify the order of the variables in this data set because i want them to be in the same order as the data set I will combine it with later. That data set also has year as character. So I want this data set to be the same as the later data set.
PaigeMiller
Diamond | Level 26

@geneshackman wrote:
I want to specify the order of the variables in this data set because i want them to be in the same order as the data set I will combine it with later. That data set also has year as character. So I want this data set to be the same as the later data set.

"Combine"?

 

Does this mean a SET statement, or a PROC APPEND? Neither of these depend on the order of the variables in either data set, you can have randomly ordered variables in the data set, and a SET statement or a PROC APPEND will work properly.

 

May I recommend, if you have any control over the situation, that you not make YEAR a character variable.

--
Paige Miller
Tom
Super User Tom
Super User

@geneshackman wrote:
Hi Paige. Year is character because i'm combining this with another data set later, which has year as character.
I tried using length year $20; the problem is where, in the program, do I put that in? If i put it right before the format statement, then "year" gets put before all the other variables, but I want it in the order in the format statement, with year as the third variable.
Thanks

Use the LENGTH statement to define the variables. Having the first place the data step sees the variable be the FORMAT statement will have a side effect of forcing SAS to guess how to define the variables at that point, and since the only information is has is the width in the format specification it will define the length to by $9. 

 

data pa1;
  length Ind_id $8 Title $200 Year $20 
    Group $30 Characteristics $50 Rate_percent 8 Low 8 High 8
    Axis $32 Objective 8
    rate_percent_ci $18 rate_percent_tx $5 rev_color 8 
    Data_Source $200 Footnotes1 $300 Footnotes2 $210
    Header2 $32 PA_2024_Objective  8
  ;
  set pa1_0;

Once the variables are defined you can add or remove formats with FORMAT statements any where in the step.  Personally I find it dangerous to attach $ formats to character variable, exactly because of what you are planning to do. If you attach the $ format with a different width than the length of the variable you are potentially going to see truncation of the values.

 So just attach the formats that are needed.

  format Year $9. Rate_percent 8.1 Low 8.1 High 8.1 Objective 6.1
    rate_percent_ci $char18. rate_percent_tx $char5. rev_color 8. 
    PA_2024_Objective 6.1
  ;

PS If you really want to stick with using side effects of the FORMAT statement to define your variables there is nothing preventing you from adding a second format statement to change the format later.  So this will define YEAR as length $20 and attach the $9. format to it.

data test;
  format year $20. ;
  format year $9. ;
run;
PaigeMiller
Diamond | Level 26

Certainly @Tom makes correct statements. I have never seen him do otherwise.

 

But I feel his comments re-enforces doing un-necessary steps. Now there might be very rare reasons why the order of variables inside a SAS data set does make a difference, but usually it makes not the tiniest difference.

--
Paige Miller
Tom
Super User Tom
Super User

@PaigeMiller wrote:

Certainly @Tom makes correct statements. I have never seen him do otherwise.

 

But I feel his comments re-enforces doing un-necessary steps. Now there might be very rare reasons why the order of variables inside a SAS data set does make a difference, but usually it makes not the tiniest difference.


Perhaps you never use browse data interactively? I don't know about you but it hate to browse a dataset where someone has placed some rarely used comment variable with a length $200 as the first variable.  

ballardw
Super User

@Tom wrote:

@PaigeMiller wrote:

Certainly @Tom makes correct statements. I have never seen him do otherwise.

 

But I feel his comments re-enforces doing un-necessary steps. Now there might be very rare reasons why the order of variables inside a SAS data set does make a difference, but usually it makes not the tiniest difference.


Perhaps you never use browse data interactively? I don't know about you but it hate to browse a dataset where someone has placed some rarely used comment variable with a length $200 as the first variable.  


Which I tend to avoid by not relying on proc import and set the variable attributes prior to the Input statement reading a text file to set the column order for viewing.

 

One suspects that removing the Proc Import would solve a number of other issues the OP hasn't encountered yet.

PhilC
Rhodochrosite | Level 12

My two cents,  I find my self needing to do the same thing sometimes.  I think its best to set the length and labels and formats in one data step, and then copy the data to the final data set using the pattern I setup below.  The set statements set up the program data vector (PDV).  Note that the if-then statement never runs during execution, but, nevertheless, before execution these statements are used by SAS to define the data-step's PDV. I have no idea what order your files are in but I would write the set statements to ensure the correct ordering.  You could write a set statement for each variable, I try to minimize the number of number of set statements I code in this if-then block.  It is up to you and how you choose to apply Occam's Razor to such things. 

 

data pa1_0;
  set pa1_00;
  if Ind_id_0 = "pa1";

  format
    Ind_id $8. Title $200. Year $9. Group $30. Characteristics $50. Rate_percent 8.1 Low 8.1 High 8.1 Axis $32. Objective 6.1
    rate_percent_ci $char18. rate_percent_tx $char5. rev_color 8. Data_Source $200. Footnotes1 $300. Footnotes2 $210.
    Header2 $32. PA_2024_Objective 6.1;
  Label  
      Ind_id = "Ind_id something.."
      Title  = "something else"
          /*..and so on..*/
     ;
run;

data pa1;
  if 0 then do;* setting the order of the SAS Program Data Vector (PDV);
    set pa1_0 (Keep= Ind_id Title Year);
    set pa1_0 (Keep= Group Characteristics Rate_percent);
    set pa1_0 (Keep= Low High  Axis Objective);
    set pa1_0 (Keep= rate_percent_ci rate_percent_tx rev_color  Data_Source);
    set pa1_0 (Keep= Footnotes1 Footnotes2 Header2);
    set pa1_0 (Keep= PA_2024_Objective);
  end;  set pa1_0;
  if ind_id_0 eq "pa1" then ind_id = "pa1_0";

  year = put (year_0,9.);
  drop year_0 Rate__Percentage_Text numerator denominator numerator_text denominator_text category ind_id_0;
run;

I did not fix your "year" variables, so this is just a demonstration on how I order variables in the program data vector. 

geneshackman
Pyrite | Level 9
Phil,
This seems interesting. What is the purpose of having multiple set statements?
PhilC
Rhodochrosite | Level 12

say if the PDV on pa1_0 was organized like this:

 

Year Group Characteristics Rate_percent  Ind_id Title

 

And that I wanted pa1's PDV to be in this order:

 

Rate_percent  Title Year Group Characteristics Ind_id

 

Then I would write my final step like this:

 

data pa1;
  if 0 then do;* setting the order of the SAS Program Data Vector (PDV);
    set pa1_0 (Keep= Rate_percent Title );
    set pa1_0 (Keep= Ind_id Year Group Characteristics );
  end;  
  set pa1_0;
run;

The order inside the Keep= is not important.  The two variables in the first set statement get added to the new PDV first, in the order they appear in pa1_0 .  The next four variables get added to the new PDV -- just the same --  in the order they appear in pa1_0 .  but, in doing this, we made Rate_percent and Title go first in the dataset pa1.

 

Also note: all formats labels and length metadata and properties for the variables are transferred to the new PDV, i.e. the new dataset,  from the old dataset.

 

 

geneshackman
Pyrite | Level 9
Two reasons to have variables in the same specific order in all the data sets:
1. it helps me keep track of them all, to make sure I didn't forget any.
2. If I want to visually compare data sets, I can easily find the variables because i know in which columns to look.
Kurt_Bremser
Super User

Your pain starts with the use of Excel files, which forces you to use PROC IMPORT, which then forces you to clean up the mess caused by the guessing of said procedure.

Save the two spreadsheets to csv files, and read those in a single data step, where all attributes are defined correctly once and for all.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3589 views
  • 3 likes
  • 6 in conversation