I need to specify the length of these variables. I am doing it the long way.
DATA SUBMTWO.Casedata29_E;
LENGTH var85 CS_DEMO_AGE 3.;
LENGTH CS_PTR_INV_M 3.;
LENGTH CS_PTR_INV_F 3.;
LENGTH CS_PTR_INV_T 3.;
LENGTH CS_PTR_MAIN_INV 3.;
LENGTH CS_PTR_ANYM_INV 3.;
LENGTH CS_PTR_CAS_INV 3.;
LENGTH CS_GRP_SEX 3.;
LENGTH CS_GRP_SEX_SIZE 3.;
LENGTH CS_STDTEST_INV 3.;
LENGTH CS_UNNAMED 3.;
LENGTH CS_UN_MAIN 3.;
LENGTH CS_UN_ANONY 3.;
LENGTH CS_UN_CASU 3.;
LENGTH CS_UN_SOC 3.;
LENGTH CS_UN_RACE 3.;
LENGTH CS_UN_HISPANIC 3.;
LENGTH CS_UN_OLDER 3.;
LENGTH CS_UN_YOUNGER 3.;
LENGTH CS_UN_MULTISEX 3.;
LENGTH CS_UN_GCCT 3.;
LENGTH CS_UN_GCEXP 3.;
SET SUBMTWO.Casedata29_D;
RUN;
Please tell me the efficient way to do it quickly.
You can have one LENGTH statement with multiple NAME/LENGTH pairs. In fact they are actually nameS/length pairs so you only need to type the number 3 once. (Note that lengths do not need to have a decimal point on them. They can only be integers.)
length
var85 CS_DEMO_AGE CS_PTR_INV_M CS_PTR_INV_F CS_PTR_INV_T
CS_PTR_MAIN_INV CS_PTR_ANYM_INV CS_PTR_CAS_INV CS_GRP_SEX
CS_GRP_SEX_SIZE CS_STDTEST_INV CS_UNNAMED CS_UN_MAIN CS_UN_ANONY
CS_UN_CASU CS_UN_SOC CS_UN_RACE CS_UN_HISPANIC CS_UN_OLDER
CS_UN_YOUNGER CS_UN_MULTISEX CS_UN_GCCT CS_UN_GCEXP
3
;
Since these are numeric variables setting the length to 3 means that when SAS writes the dataset to disk it will drop the 5 least significant bytes of the floating point numbers. This means that you can only accurately represent integer numbers up to a maximum magnitude of 8,192.
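To see that limit in practice, here is a minimal sketch. The dataset name work.len3_demo and the variable x are made up for illustration, and the 8,192 threshold assumes an IEEE platform such as Windows or Unix.

```sas
/* Hypothetical demo: store three integers in a 3-byte numeric variable. */
data work.len3_demo;
  length x 3;
  /* Inside the step x is still held in 8 bytes;               */
  /* truncation happens only when each row is written to disk. */
  do x = 8191, 8192, 8193;
    output;
  end;
run;

/* Read the stored values back. 8191 and 8192 fit exactly in 3 bytes; */
/* 8193 needs one more bit of precision than 3 bytes can hold.        */
proc print data=work.len3_demo;
run;
```

If the last value does not come back as 8193, that is the truncation described above.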
Also since you are doing this BEFORE the SET statement it could change the order of the variables in the dataset. Since you are only changing the length of numeric variables you could actually move the LENGTH statement after the SET statement. You cannot use a LENGTH statement to change the length of a character variable after SAS has already added it to the dataset. The reason you can for numeric variables is that during the data step they are all length 8, it is only when it writes it to disk that it truncates the floating point values.
I don't know if you can avoid typing all the variable names or not.
If all of the variables whose name begins with CS should have length 3, then
length var85 cs: 3;
If there are some variables that begin with CS that should not be length three, then you cannot do the above, but you could do
length var85 cs_demo_age cs_ptr_inv_m ... 3;
where you type in all the names in one length statement. Note: there is no period after the number 3 in a length statement.
Presumably, you understand the dangers of using a length of 3 for a numeric variable. You may be truncating the values of non-integer variables, such as CS_STDTEST_INV.
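Before shrinking the variables, it may be worth checking whether any existing values would be damaged. The following is only a sketch: it assumes every CS_ variable in the incoming dataset is numeric, that the 8,192 limit for IEEE platforms applies, and that the names work.len3_check, varname, value and i are free to use (they are hypothetical).

```sas
/* List every CS_ value that is non-integer or too large for length 3. */
data work.len3_check;
  set submtwo.casedata29_d;
  array cs {*} cs_: ;          /* assumes all cs_ variables are numeric */
  length varname $32;
  do i = 1 to dim(cs);
    if not missing(cs{i})
       and (cs{i} ne int(cs{i}) or abs(cs{i}) > 8192) then do;
      varname = vname(cs{i});  /* name of the offending variable */
      value   = cs{i};         /* the value that would not survive */
      output;
    end;
  end;
  keep varname value;
run;
```

An empty work.len3_check would suggest that the current values fit in 3 bytes; it says nothing about values added later.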
Does your list represent all variable names that begin with CS_ ? Are all these variables part of the incoming data?
If so, move the LENGTH statement to after the SET (permissible for existing numeric variables only, not for character variables):
data SUBMTWO.Casedata29_E;
set SUBMTWO.Casedata29_D;
length var85 CS_: 3;
run;
The list CS_: includes all variable names that begin with CS_
Please don't shout code at us. Use a code window ({i} above the post area), and write code in lower case with indentation.
data submtwo.casedata29_e;
  length var85 cs_demo_age cs_: 3.;
  set submtwo.casedata29_d;
run;
Assuming you have no other variables with prefix cs_.
If so then perhaps an array.
Are all the variables sequential in the file, i.e. can you say all variables between first one and last one all set to 3, e.g:
data submtwo.casedata29_e;
  length var85 cs_demo_age cs_ptr_inv_m--cs_un_gcexp: 3.;
  set submtwo.casedata29_d;
run;
If not, then you might be better off creating a list from the metadata. I would ask why you want so many variables, however; demo age seems to imply clinical data. CDISC standards try to take a parameter/response approach to data, as it's far easier to work with and store, and is just a transpose away. For instance:
... PARAM VAL ...
... DEMO_AGE 1 ...
...
With the above, you could transpose the data to get a wide version like yours, and you could simply apply the format once to val. It's quite hard to give exact details without test data or the output you want, but I find working with normalised data makes life far simpler than transposed datasets (which do have their uses, but only really at report time).
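For the "list from the metadata" idea mentioned above, one possible sketch uses PROC SQL and DICTIONARY.COLUMNS. The macro variable name cs_vars is hypothetical, and the WHERE clause assumes that every numeric variable whose name starts with CS_ should get length 3.

```sas
/* Build a space-separated list of numeric CS_ variables from metadata. */
proc sql noprint;
  select name
    into :cs_vars separated by ' '
    from dictionary.columns
    where libname = 'SUBMTWO'          /* stored in upper case */
      and memname = 'CASEDATA29_D'     /* stored in upper case */
      and type = 'num'
      and upcase(name) like 'CS^_%' escape '^';
quit;

/* LENGTH after SET keeps the variable order; numeric variables only. */
data submtwo.casedata29_e;
  set submtwo.casedata29_d;
  length var85 &cs_vars 3;
run;
```

If the query returns no rows, cs_vars is never created and the data step will fail, so it is worth checking the list (for example with %put &=cs_vars;) before using it.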
@RW9 wrote:
Are all the variables sequential in the file, i.e. can you say all variables between first one and last one all set to 3, e.g:
data submtwo.casedata29_e;
  length var85 cs_demo_age cs_ptr_inv_m--cs_un_gcexp: 3.;
  set submtwo.casedata29_d;
run;
If not, then you might be better off creating a list from the metadata. I would ask why you want so many variables, however; demo age seems to imply clinical data. CDISC standards try to take a parameter/response approach to data, as it's far easier to work with and store, and is just a transpose away. For instance:
... PARAM VAL ...
... DEMO_AGE 1 ...
...
With the above, you could transpose the data to get a wide version like yours, and you could simply apply the format once to val. It's quite hard to give exact details without test data or the output you want, but I find working with normalised data makes life far simpler than transposed datasets (which do have their uses, but only really at report time).
You do understand that your suggestion cannot work? Double-dash and colon variable lists only work for variables that are already defined, and here the LENGTH statement comes before the SET statement that defines them.