Changing variable type...what is going wrong

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Changing variable type...what is going wrong

I used a linking software to link two datasets and in the text file I have those observations that were perfect matches. I want to remove those perfect matches in one of the original two datasets so I can look at the the ones that had no matches. I imported the text data but for somereason I could only get it to import using the following:

 

DATA matchedid;

INFILE 'S:\Restricted\HST Safe\McArdle\Linked data\Matched ID.txt' DELIMITER=',' n=32767 firstobs=2;

INPUT Match_Status $ Score $ Class $ Linkage_ID $ ID_File1 $ ID_File2 $;

RUN;

 

Then this is me importing the original data set #2:

 

proc import out=labdata2

file="S:\Restricted\HST Safe\McArdle\Breanna Lab Dataset Versions\exportsasfile"

dbms=xlsx;

run;

 

The first dataset has the ID in character, and the second it is numeric, so I used input to change it. But I keep getting an error (invalid argument) and none of them actual change, they are all just listed at missing. My code is below. I am sure it is something very simple? I've used input many times and have never gotten this!

 

data linkedid;

set matchedid1;

drop=1;

id=input(id_file2, best12.);

drop id_file2;

rename id=id_file2;

run;

 

(setting up merging: just adding this incase it is useful)

data labdata;

set labdata2;

rename id=id_file2;

run;

 

DATA droppinglab ;

MERGE linkedid labdata;

BY id_file2;

RUN;


Accepted Solutions
Solution
‎04-28-2016 04:53 PM
Respected Advisor
Posts: 3,156

Re: Changing variable type...what is going wrong

[ Edited ]

Quick comments:

1. If you try to input those IDs as number, then you won't need to convert it.

INPUT Match_Status $ Score $ Class $ Linkage_ID $ ID_File1  ID_File2 ;

2. During your effort of conversion, you have engaged one of the common difficulties for beginners, eg. the difference between 'Executable statements' and 'Declarative statements'. For your purpose, Rename can't be used like that. For details, please find the proper SAS docs, as far as the code goes, it should look like this:

data linkedid;
set matchedid1 (rename=id_file2=id);
drop=1;
id_file2=input(id, best12.);
drop id;
/*rename id=id_file2;*/
run;

 

 

View solution in original post


All Replies
Solution
‎04-28-2016 04:53 PM
Respected Advisor
Posts: 3,156

Re: Changing variable type...what is going wrong

[ Edited ]

Quick comments:

1. If you try to input those IDs as number, then you won't need to convert it.

INPUT Match_Status $ Score $ Class $ Linkage_ID $ ID_File1  ID_File2 ;

2. During your effort of conversion, you have engaged one of the common difficulties for beginners, eg. the difference between 'Executable statements' and 'Declarative statements'. For your purpose, Rename can't be used like that. For details, please find the proper SAS docs, as far as the code goes, it should look like this:

data linkedid;
set matchedid1 (rename=id_file2=id);
drop=1;
id_file2=input(id, best12.);
drop id;
/*rename id=id_file2;*/
run;

 

 

Trusted Advisor
Posts: 1,116

Re: Changing variable type...what is going wrong

[ Edited ]

Hi @epigrad123,

 

I agree with @Haikuo that reading the numeric IDs into numeric variables (if possible) avoids conversion issues.


However, there is nothing wrong with how you used the DROP and RENAME statements (in the data step creating dataset LINKEDID):

Yes, both are declarative. So, they could even be written in reverse order (which would look illogical) without changing their effect. The DROP statement is always applied before the RENAME statement (see section "Order of Application" in Dropping, Keeping, and Renaming Variables). Turning one of the two statements into an appropriate dataset option on an input or output dataset (as Haikuo suggested) helps to clarify the order of application, though.

 

So, as to your question "what is going wrong," the "invalid argument" issues leading to missing numeric ID_FILE2 values must have had other reasons. There must have been invalid characters contained in the character values ID_File2 read from the .txt file. These could have been examined (if necessary, by means of the $HEXw. format). The default length $8 of ID_File2 might have been too short. A closer look at the .txt file would reveal how it should be imported properly. The option n=32767 of the INFILE statement would most likely be dropped in an optimized version of that data step.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 281 views
  • 2 likes
  • 3 in conversation