BookmarkSubscribeRSS Feed
Schueppi
Calcite | Level 5

Dear community,

 

I am a SAS beginner and new on SAS communities but I am currently looking for a programming code I could not find a solution, neither on the internet nor here in SAS communities. Probably it is since I cannot phrase it any better, but I hope you can help me out with. Thank you in advance!

 

Let us assume I have a couple of data sets, e.g. one is called DS01, another DS02, etc. The number of data sets varies and should play a role in the code. However the names of the datasets should be independent and can be e.g. AB01, XX05, GHF912 or whatever.

Each data set has a couple of variables, each a different number of variables e.g. DS01 has got variables VAR01, VAR02, VAR03, VAR4 then DS02 has got VAR01, VAR02, VAR05, VAR06 etc. The same applies here: The total number and names of variables differ. It is just for clarification.

 

Now to my point: Some (not all) variables across datasets are numerics and have certain formats. Let us assume the variables have the same format names as their variable names e.g. VAR03 has format VAR03. and VAR05 has got VAR05., VAR06 has got VAR06. However, also the format names can be different than in my examples. The remaining variables have either simple numeric formats e.g. BEST32. or any text formats e.g. $2. or $40. I am only interested in those that have defined formats.

 

My question is: I want to have a macro/macros that look up each dataset for variables that have these defined formats. Each time when there is a variable detected that has such a defined format the variable should be copied in the same dataset but with a slightly different naming e.g. "LBL" at the ending ie. VAR03 is copied to VAR03LBL, VAR05 copied to VAR05LBL etc. The original variable should remain unchanged and the copied should have the labelled content of the formatted variable ie. If VAR03=1 (1=Yes, 0=No) Then VAR03LBL=Yes, If VAR03=0 Then VAR03LBL=No etc.

In the end I want to have each formatted variable as a pair: one contains the value, the other is the label. Best would be when the copied variable is put directly next to the formatted variable ie. VAR03, VAR03LBL, VAR04 etc., but that is not a requirement.

All should be done independently from how many datasets or variables are available, and independently from namings. I want to refer as less variables as necessary, the macro(s) should do the work almost "alone".

 

I found a lot on do loops using %sysfunc e.g. %sysfunc(varfmt(...)), but no paper or solution fit my question. Even if I am just a beginner, I am willing to learn it. Please help me! 

 

Best regards,

Schueppi

6 REPLIES 6
smantha
Lapis Lazuli | Level 10

Let us break this problem into steps:

1. Identify the library and all datasets present in it (sashelp.vtables)

2. Identify all columns and their formats (sashelp.vcolumns)

3. Group Identical column names for which you want to put the label description

4. After all such columns and datasets  have been identified 

for each dataset:

    data dataset;

    set dataset;

    for each column:

    <column_newname> = <column label value>

   end for;

 run;

end for;

5. As to the order of the data sets sas builds the order in which it encounters the variables as it builds the codes

 

for each dataset:

    data dataset;

    format columnlist ordered as you want displayed;

    set dataset;

    for each column:

    <column_newname> = <column label value>

   end for;

 run;

end for;

Hope you are able to build based on this template

for each can be replaced by macro loops for e.g

%let list_ds_to_process = ds1 ds2 ds3 ... dsn; separated by some delimiter

%let n_ds_to_process = %sysfunc(countw(&list_ds_to_process));

%do i = 1 %to &n_ds_to_process;

 %let ds_to_process = %scan(&list_ds_to_process,&i.);

%end;

Tom
Super User Tom
Super User

Before even attempting to make a macro figure out how to do the steps.  

So if you have identified that dataset X has variable A, B, C that you want to generate the decoded values for then you might run a step like this:

data X_decoded ;
  set X;
  length a_lbl $20 b_lbl $10 c_lbl $50 ;
  a_lbl=vvalue(a);
  b_lbl=vvalue(b);
  c_lbl=vvalue(c);
run;

But you might also want to add 

format a b c;

to remove the formats attached to the coded variables. Otherwise when you look at it both variables will show the same value, the decoded value.

 

So the only difficult thing here is deciding what length to use when defining the new variables.  You could simplify things by just setting them all to some fixed length.

 

Getting the list of dataset/variable combinations should be pretty simple (although there is some decisions you need to make. For example do you want to include DATE values since the raw (coded) date values are useless for human review.

One way is to merge with the format names from SASHELP.VFORMAT.

proc contents data=MYLIB._ALL_  noprint
  out=contents(keep=libname memname varnum name format)
;
run;

proc sql ;
  create table custom as 
  select c.* 
  from contents c inner join dictionary.formats f
    on f.source='C' and c.format=f.fmtname and f.fmttype='F'
  ;
quit;

Now you can generate the code from that data.

Tom
Super User Tom
Super User

No need for macro variables.

Here is a more complete solution that includes the code generation step.  If you want to include decoded values for variables with system supplied formats, like DATE or DOLLAR, then update the logic of the SQL join to pick the variables with those formats attached also.

proc contents data=MYLIB._ALL_  noprint
  out=contents(keep=libname memname varnum name format)
;
run;

proc sql ;
  create table custom as 
  select c.* 
  from contents c inner join dictionary.formats f
    on f.source='C' and c.format=f.fmtname and f.fmttype='F'
  order by c.libname,c.memname,c.varnum
  ;
quit;

filename code temp;
data _null_;
  file code ;
  set custom ;
  by libname memname;
  if first.memname then put
     'data work.' memname +(-1) '_cd;'
    /'  set ' libname +(-1) '.' memname ';'
  ;
  put 'length ' name +(-1) '_cd $200.;'
    / name +(-1) '_cd=vvalue(' name ');'
  ;
  if last.memname then put 'run;' ;
run;
%include code / source2;

You can turn it into a macro if you want by replace MYLIB with a macro variable reference.

%macro decode(libref);
proc contents data=&libref.._all_ .....
%mend;

 If you want to remove the format from the original variable you will need to do that in another step, since you need the format attached in the step that is using it to get the formatted value into the new character variable.  Add this step before the %INCLUDE.

data _null_;
  file code mod ;
  set custom end=eof;
  by libname memname ;
   if _n_=1 then put 'proc datasets nolist lib=work;' ;
   if first.memname then put 'modify ' memname +(-1)  '_cd;' / 'format ' @;
   put name @;
   if last.memname then put ';'/'run;';
   if eof then put 'quit;' ;
run;
andreas_lds
Jade | Level 19

An interesting problem. Some things to check, before i can suggest code:

  • You are interested in variables having a format attached with the same name as the variable, right?
  • The name of the dataset does not matter, nor does the library.
  • All variables have names short enough, though that after appending LBL the name is still a valid name.
Schueppi
Calcite | Level 5

Hi andreas,

 

please find my answers below:

  • You are interested in variables having a format attached with the same name as the variable, right?
    Schueppi: No, the formats can have whatever name. They must not be the same as the variable names. Any names.
  • The name of the dataset does not matter, nor does the library. 
    Schueppi: Correct. I do not want to name particular datasets, variables or libraries. This should all be done using (macro) variables.
  • All variables have names short enough, though that after appending LBL the name is still a valid name. 
    Schueppi: Yes, the variable names do not exceed 32-3 characters. The names are short enough to get amended by "LBL" or "CD" or whatever short term.

What I might add is that the added label variable can have any length. A fixed length of $200. should do it best. I mean the added variables are not required to have a dynamical length according to the stored values maximum length.

 

Thank you!

andreas_lds
Jade | Level 19

@Schueppi wrote:

Hi andreas,

 

please find my answers below:

  • You are interested in variables having a format attached with the same name as the variable, right?
    Schueppi: No, the formats can have whatever name. They must not be the same as the variable names. Any names.

Well, how do you want to distinguish between variables that have to be processed and those that can be ignored? You will need a list of the formats or variables that are relevant or define a logic to choose the variables.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 4021 views
  • 0 likes
  • 4 in conversation