Have:
dataset named 'colList':
ColNeed
Col_1
Col_2
Col_3
dataset named 'sample':
Drop 1 Col_2 Col_1 Nope_3
a 2 1 b
Want:
Col_1 Col_2 Col_3
1 2 .
Bascially, in 'sample', drop columns that are not listed in 'colList', add additional columns that listed in 'colList' but not already in 'sample'. Finally, maintain the same order as in 'colList' (from top to bottom).
Question:
Currently struggled to find the optimal codes, though it seems there might be multiple approaches.
Note:
To generate the data needed, please use:
data colList;
input ColNeed $;
cards;
Col_1
Col_2
Col_3
;
run;
data sample;
input 'Drop 1'n $ Col_2 Col_1 Nope_3 $;
cards;
a 2 1 b
;
run;
First, thank you for asking what appeared to be a simple problem, but required quite a bit of thought to come up with a working solution.
As mentioned in my previous post I had thought that you might be confronting a discrepancy between how SAS and WPS function, which there is (though minor in this case), but neither will output variables unless they are explicitly initialized. Interestingly, though, both will accept missing values as an acceptable initialization value.
As such, try the following code. It appears to work correctly on both SAS and WPS and, more importantly, solves your current problem:
options validvarname=any; data colList; input ColNeed $; cards; Col_1 Col_2 Col_3 ; run; data sample; input 'Drop 1'n $ Col_2 Col_1 $ Nope_3 $; cards; a 2 1 b ; run; proc sql noprint; select ColNeed, case type when 'char' then catx(' ','retain',ColNeed,"' ';") else catx(' ','retain',ColNeed,".;") end into :keeps separated by ' ', :cols separated by ' ' from colList a full join (select name,type from dictionary.columns where libname='WORK' and memname='SAMPLE') b on a.ColNeed=b.name where ColNeed is not NULL ; quit; data want (keep=&keeps.); &cols. set sample; run;
Art, CEO, AnalystFinder.com
This is a variation of your earlier problem. The solution is essentially the same.
You can either use a PROC SQL to order vars or look at using a Length or retain in a datastep to maintain order.
I would do it as follows:
proc sql noprint; select ColNeed into: cols separated by ' ' from colList ; quit; data want (keep=&cols.); retain &cols.; set sample; run;
Art, CEO, AnalystFinder.com
Hi Art,
Used exactly the same codes, but seems it's not picking up Col_3.
Codes:
data colList;
input ColNeed $;
cards;
Col_1
Col_2
Col_3
;
run;
data sample;
input 'Drop 1'n $ Col_2 Col_1 Nope_3 $;
cards;
a 2 1 b
;
run;
proc sql noprint;
select ColNeed into: cols separated by ' '
from colList
;
quit;
data want (keep=&cols.); /* this part has some problems */
retain &cols.;
set sample;
run;
Logs:
45 data want (keep=&cols.);
46 retain &cols.;
47 set sample;
48 run;
NOTE: Variable Col_3 is uninitialized.
WARNING: The variable Col_3 in the DROP, KEEP, or RENAME list has never been referenced.
NOTE: There were 1 observations read from the data set WORK.SAMPLE.
NOTE: The data set WORK.WANT has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Thanks!
Post the code and log please.
@ayin wrote:
Please note Col_1 and Col_2 are not necessarily all numeric.
I don't believe that has any bearing on the solutions. Post your code and log that you say does not work.
@ayin if you edit an earlier post as a response that doesn't mean we'll automatically notice this. It also makes it hard to make sense of the thread if things change.
You're correct that RETAIN won't work. Do you have any other way to indicate the variable type or a value that it can be initialized to if the variable is missing from a dataset.
The other standard way to accomplish this is to have a dummy dataset that has all the attributes and then you append the empty table with your 'have' dataset and it will have all the desired traits. But you require more information than you currently have.
options validvarname=any; data colList; input ColNeed $; cards; Col_1 Col_2 Col_3 ; run; data sample; input 'Drop 1'n $ Col_2 Col_1 Nope_3 $; cards; a 2 1 b ; run; %macro vname; proc sql noprint; select ColNeed into: cols separated by ' ' from colList; select count(*) into : n from colList where upcase(ColNeed) not in ( select upcase(name) as name from dictionary.columns where libname='WORK' and memname='SAMPLE'); %if &n ne 0 %then %do; select ColNeed into : name separated by ' ' from colList where upcase(ColNeed) not in ( select upcase(name) as name from dictionary.columns where libname='WORK' and memname='SAMPLE'); %end; quit; data want ; set sample; %if &n ne 0 %then %do; retain &name. .; %end; keep &cols.; run; %mend; %vname
NOTE: The following post isn't correct, but I'm leaving it here to show the train of thought in trying to solve the OP's problem.
I don't know if this is or isn't the problem you're confronting, but a similar issue was just discussed on a different forum yesterday.
The behavior your getting is a current difference between the behaviors of SAS and WPS. Regardless of whether you're using SAS or WPS try the following:
proc sql noprint; select ColNeed into: cols separated by ' ' from colList ; quit; data want (keep=&cols.); retain &cols. .; set sample; run;
While SAS doesn't require it, WPS doesn't add a variable to its PDV unless it has been initialized or set to some value. Interestingly, missing is one of those values.
Art, CEO, AnalystFinder.com
@art297 I had the same behaviour using SAS 9.4 M3
First, thank you for asking what appeared to be a simple problem, but required quite a bit of thought to come up with a working solution.
As mentioned in my previous post I had thought that you might be confronting a discrepancy between how SAS and WPS function, which there is (though minor in this case), but neither will output variables unless they are explicitly initialized. Interestingly, though, both will accept missing values as an acceptable initialization value.
As such, try the following code. It appears to work correctly on both SAS and WPS and, more importantly, solves your current problem:
options validvarname=any; data colList; input ColNeed $; cards; Col_1 Col_2 Col_3 ; run; data sample; input 'Drop 1'n $ Col_2 Col_1 $ Nope_3 $; cards; a 2 1 b ; run; proc sql noprint; select ColNeed, case type when 'char' then catx(' ','retain',ColNeed,"' ';") else catx(' ','retain',ColNeed,".;") end into :keeps separated by ' ', :cols separated by ' ' from colList a full join (select name,type from dictionary.columns where libname='WORK' and memname='SAMPLE') b on a.ColNeed=b.name where ColNeed is not NULL ; quit; data want (keep=&keeps.); &cols. set sample; run;
Art, CEO, AnalystFinder.com
Thank you @art297!
It worked for the sample data listed above. When I tried to implement it on real data, however, all character values are reduced to only have length of 1.
For example, if you test it on
options validvarname=any;
data colList;
input ColNeed $;
cards;
Col_1
Col_2
Col_3
;
run;
data sample;
infile datalines dlm=",";
input 'Drop 1'n $ Col_2 Col_1 $ Nope_3 $;
datalines;
GOOD,3,UNIVERSITY,cc
;
run;
The final output is
Col_1 Col_2 Col_3
U 3 .
All numeric/date values are dealt perfectly.
Please let me know how we can tweak your codes a bit and keep character values to its orginal length?
Just move the retain macro one after the set statement:
data want (keep=&keeps.); set sample; &cols. run;
Art, CEO, AnalystFinder.com
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.