BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_Kh
Barite | Level 11

Dear community, 

 

I have a dataset (imported from xlsx each time when run the program) that has predefined LENGTH for variables to fit all their values. 

eg: 
SUBJID           RELREC

A100101         Yes

A100102         No

A100103         Unable to adjudicate due to insufficient information

 

When imported into SAS SUBJID has a length of $27 while RELREC has $57. 
Per my client's data standards (specifications), export dataset must have following LENGTH and FORMAT for above variables: 

VARIABLE                  LENGTH             FORMAT              Expected VALUE

RELREC                     $2                       $NYU.                   "0" ("No")

                                                                                             "1" ("Yes")

                                                                                             "2" ("Unable to Adjudicate due to....")

 

My challenge is when applying format and length in DATA step RELREC value is getting truncated and not properly formatted due to data truncation. eg:

RELREC (formatted)

0

Ye

Un

 

To my understanding LENGTH cannot be shorter than the variable's actual value. I tried to assign permanent format at the first data step, then assign length (to a new variable for renaming it later as RELREC) in a different data step. But still getting the same outcome. 

 

I hope someone from this community had encountered the same issue/or has clear understanding on the behavior of LENGTH vs FORMAT statements and could give some guidance in resolving this issue. 

 

Many thanks in advance!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

I see this gives required attributes as format and length, not "the formatted value" which is "0", "1" and "2". 

You want a length of $2. Therefore the unformatted value can only be 0, 1, 2 and the formatted value is the text.

View solution in original post

8 REPLIES 8
Tom
Super User Tom
Super User

Sounds like you need to convert the text in the XLSX file into the values that the format expects.

 

Note that you cannot change the LENGTH of a character variable once it is defined.  You probably need to create a NEW variable that will have the shorter length.  You can always rename it to have the desired name.

data want;
  set have;
  length new_relrec $2;
  if relrec='No' then new_relrec='0';
  else if relrec='Yes' then new_relrec='1';
  else if relrec=:'Unable to' then new_relrec='2';
  drop relrec;
  rename new_relrec = relrec;
  format new_relres $nyu.;
run;

 

In SAS formats are used to convert values to text and informats are used to convert text to values.  Since your values are text you could create either an INFORMAT or a format.  So you might create an informat named $NYUI or a format named $NYUR and then replace the IF/THEN/ELSE block with simple assignment statement.  Like:

new_relec=input(relrec,$nyuI.);

or

new_relrec=put(relrec,$nyuR.);

 

Note that in this case since the first two characters of the decoded values you are starting with is enough to uniquely identify the codes you want to map to could skip the renaming part.

data want;
  length relrec $2;
  set have;
  if relrec='No' then relrec='0';
  else if relrec='Ye' then relrec='1';
  else if relrec='Un' then relrec='2';
  format relres $nyu.;
run;  

 

 

A_Kh
Barite | Level 11

Hi @Tom , 

 

Thank you for your input!

What I did not understand is I've created format for "YES", "NO" and "UNABLE TO.." values.

like:

proc format;
             value $nyu "NO"= "0"
                                "YES"= "1"
                                 "UNABLE TO ADJUDICATE DUE TO INSUFFICIENT INFORMATION"= "2";

run;

 

In your example below the format is applied to "0", "1" and "2" values of new_relrec variable (not to 'yes', 'no' 'unable to..')..... It does give me desired METADATA (when checked in proc contents), however I did not understand the logic here.  

 

data want;
  set have;
  length new_relrec $2;
  if relrec='No' then new_relrec='0';
  else if relrec='Yes' then new_relrec='1';
  else if relrec=:'Unable to' then new_relrec='2';
  drop relrec;
  rename new_relrec = relrec;
  format new_relres $nyu.;
run;

 

Kurt_Bremser
Super User

In a VALUE Statement in the FORMAT Procedure, the source value has to be on the left of the equal sign, and the target value to the right.

Your format, as defined, could be used for the initial conversion, if the source text was in all uppercase.

ChrisNZ
Tourmaline | Level 20

You probably want to run something like this after your proc import:

proc format ;
 value $convert  "No"   ="0"
                 "Yes"  ="1"
                 "U"-"V"="2";

 value $nyu (default=57) "0"= "No"
                         "1"= "Yes"
                         "2"= "Unable to Adjudicate due to...." ;
run;

data WANT(drop=R);
  set HAVE(rename=(RELREC=R));
  attrib RELREC length=$2 format=$nyu.;
  RELREC=put(R,$convert.);
run;

 

 

A_Kh
Barite | Level 11

Hi @ChrisNZ , 

 

In this example '0', '1', '2' are treated as source values. And they get formatted into "Yes", "No" and "Unable to..". 

 

My problem is the opposite of this. "Yes", "No" and "Unable to.." are source values.. I need to format them into '0', '1' and '2' and fit the string in $2 length. 

 

@Tom 's example works, in one hand, to create a new variable with the length of $2 and conditionally convert RELREC values into "0", "1" and "2" of new variable, and rename it as RELREC. Then apply $NYU. format... But the formatted value is not the original value here. When format removed you cannot see "Yes", "No" and "Unable to.." values...

ChrisNZ
Tourmaline | Level 20

> In this example '0', '1', '2' are treated as source values. And they get formatted into "Yes", "No" and "Unable to..". 

Not at all. The full string is transformed into a new variable containing the digit string, and a format is associated with the new string.

 

data HAVE;
  input SUBJID $8. RELREC $80.;
cards;
A100101 Yes                                                                             
A100102 No                                                                                            
A100103 Unable to adjudicate due to insufficient information
run;

proc format ;
 value $convert  "No"   ="0"
                 "Yes"  ="1"
                 "U"-"V"="2";

 value $nyu (default=57) "0"= "No"
                         "1"= "Yes"
                         "2"= "Unable to Adjudicate due to...." ;
run;

data WANT(drop=R);
  set HAVE(rename=(RELREC=R));
  attrib RELREC length=$2 format=$nyu.;
  RELREC=put(R,$convert.);
run;

proc print; proc contents; run;

Obs SUBJID RELREC
1 A100101 Yes
2 A100102 No
3 A100103 Unable to Adjudicate due to....

 


The CONTENTS Procedure
Data Set Name WORK.WANT Observations 3
Member Type DATA Variables 2
Engine V9 Indexes 0
Created 05/08/2021 09:48:32 Observation Length 10
Last Modified 05/08/2021 09:48:32 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    

 

Engine/Host Dependent Information
Data Set Page Size 131072
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 12943
Obs in First Data Page 3
Number of Data Set Repairs 0
Filename K:\SASWORK\_TD17820_NZ8037SPSAS2003_\Prc2\want.sas7bdat
Release Created 9.0401M2
Host Created X64_SRV12

 

Alphabetic List of Variables and Attributes
# Variable Type Len Format
2 RELREC Char 2 $NYU.
1 SUBJID Char 8  

 

A_Kh
Barite | Level 11

I see this gives required attributes as format and length, not "the formatted value" which is "0", "1" and "2". In your print results we still see original values "Yes", "No" and "Unable..."..

 

I just modified formatting part of your code and see I'm getting both required metadata and value:

proc format;
                   value $nyu (default=57)   "No"= "0"
                                                            "Yes"= "1"
                                                            "Unable to ........."= "2";
run;

 

data WANT(drop=R);
       set HAVE(rename=(RELREC=R));
       attrib RELREC length=$2 format=$nyu. ;
       RELREC=put(R,$nyu.);
run;

 

proc print; proc contents; run;

 

Couldn't paste contents results here...Please check it and give your feedback.  Appreciate your support!

ChrisNZ
Tourmaline | Level 20

I see this gives required attributes as format and length, not "the formatted value" which is "0", "1" and "2". 

You want a length of $2. Therefore the unformatted value can only be 0, 1, 2 and the formatted value is the text.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 4082 views
  • 3 likes
  • 4 in conversation