BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
prad001
Obsidian | Level 7

From the below example: 

I need to create a new variable called city1 with all the observations after the value "##$$%%**--".

 

DATA NEW;
INPUT city $;
cards;
u3jZdcVN
0nnmdOz
7MxtrQjCXR
wCKbhZM
##$$%%**--
city_dec
CHICAGO
CHICAGO
ELGIN
CHICAGO
CHICAGO
;
RUN;

Thank you,

Prad

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You still haven't stated the task in a way that fully corresponds to your example.

 

In the example it appears that

  1. You have two series of records that should be merged.  One series (variable CITY1) begins on the second line of data, and the second series (CITY2) begins on the second line of data following the dataline containing 
    ##$$%%**--
  2. You now have also introduced two new lines to be ignored, city_enc and city_dec, presumably for encoded and non-encoded city values.  But you apparently don't want those lines, and you don't even use those lines to guide your variable naming.

In addition your sample DATA step does not correctly store the DATA NEW values, because the length of CITY defaults to $8.  You need to set it to $10 in order to test for the magic line with city="##$$%%**--".

 

Please provide working code.  Help us help you.

 

There are a couple ways to do this.  (1) from NEW, make a second data set (V2 below) containing only the values for CITY2 (i.e. valid city names afte the magic line).  Then merge it with the corresponding values prior to the magic line:

 

 

DATA NEW;
INPUT city $10.;
cards;
city_enc
u3jZdcVN
0nnmdOz
7MxtrQjCXR
wCKbhZM
##$$%%**--
city_dec
CHICAGO
CHICAGO
ELGIN
CHICAGO
RUN;

data v2 (drop=_:) /view=v2;
  set new ;
  retain _after_magic_line;
  if lag(city)='##$$%%**--' then _after_magic_line='YES';
  if _after_magic_line='YES';
run;

data want;
  set new (rename=(city=city1));
  set v2  (rename=(city=city2));
  if _n_>=2;
run;

 

This uses two DATA steps, the first to create the subset of city2 values, the second to "merge" with the city1 values.

 

I use two SET statements in the DATA WANT step instead of a MERGE statement, because using the two SETs will stop the data step at the end of the shorter sequence produced by a SET (when the CITY2 values are exhausted).   Using MERGE would have continued until the longer SET is exhausted.

 

You can also do this in a single DATA step by using two SET statements with one series offset from the other:

 

data want2 (drop=_:);
  set new (rename=(city=city2));
  retain _after_magic_line;
  if lag(city2)='##$$%%**--' then _after_magic_line='YES';
  if _after_magic_line='YES' then set new (rename=(city=city1)); **start at the top**;
  if lag(_after_magic_line)='YES';
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
ballardw
Super User

@prad001 wrote:

From the below example: 

I need to create a new variable called city1 with all the observations after the value "##$$%%**--".

 

DATA NEW;
INPUT city $;
cards;
u3jZdcVN
0nnmdOz
7MxtrQjCXR
wCKbhZM
##$$%%**--
city_dec
CHICAGO
CHICAGO
ELGIN
CHICAGO
CHICAGO
;
RUN;

Thank you,

Prad


Can you show us what you expect final result to be? Does "all the observations" mean all the cities in one variable for one record? All records? Records after that value? Only the same city?

prad001
Obsidian | Level 7

My expected results are:

 

CITY             CITY1
u3jZdcVN	CHICAGO
0nnmdOz	        CHICAGO
7MxtrQjC	ELGIN
wCKbhZM         CHICAGO
ballardw
Super User

Some details:

How is the code supposed to know that "city_dec" does not go into the output? Are there any other values that do not get into the output?

There are 5 observations in your source data after city_dec, why are there only 4 in the output? What is the rule that excludes that instance?

 

I have to assume that your actual data is bit more complex. That means clear rules for inclusion/exclusion are required.

prad001
Obsidian | Level 7

Hi Ballardw,

 

I am sorry.

Let me give you the better example. with what is the situation and what I want...

 

DATA NEW;
INPUT city $;
cards;
city_enc
u3jZdcVN
0nnmdOz
7MxtrQjCXR
wCKbhZM
##$$%%**--
city_dec
CHICAGO
CHICAGO
ELGIN
CHICAGO
;
RUN;

 

 

I want the results to be..

Create two new variables city1 and city2 based out of above city.

city1           city2
u3jZdcVN	CHICAGO
0nnmdOz	        CHICAGO
7MxtrQjC	ELGIN
wCKbhZM         CHICAGO
mkeintz
PROC Star

You still haven't stated the task in a way that fully corresponds to your example.

 

In the example it appears that

  1. You have two series of records that should be merged.  One series (variable CITY1) begins on the second line of data, and the second series (CITY2) begins on the second line of data following the dataline containing 
    ##$$%%**--
  2. You now have also introduced two new lines to be ignored, city_enc and city_dec, presumably for encoded and non-encoded city values.  But you apparently don't want those lines, and you don't even use those lines to guide your variable naming.

In addition your sample DATA step does not correctly store the DATA NEW values, because the length of CITY defaults to $8.  You need to set it to $10 in order to test for the magic line with city="##$$%%**--".

 

Please provide working code.  Help us help you.

 

There are a couple ways to do this.  (1) from NEW, make a second data set (V2 below) containing only the values for CITY2 (i.e. valid city names afte the magic line).  Then merge it with the corresponding values prior to the magic line:

 

 

DATA NEW;
INPUT city $10.;
cards;
city_enc
u3jZdcVN
0nnmdOz
7MxtrQjCXR
wCKbhZM
##$$%%**--
city_dec
CHICAGO
CHICAGO
ELGIN
CHICAGO
RUN;

data v2 (drop=_:) /view=v2;
  set new ;
  retain _after_magic_line;
  if lag(city)='##$$%%**--' then _after_magic_line='YES';
  if _after_magic_line='YES';
run;

data want;
  set new (rename=(city=city1));
  set v2  (rename=(city=city2));
  if _n_>=2;
run;

 

This uses two DATA steps, the first to create the subset of city2 values, the second to "merge" with the city1 values.

 

I use two SET statements in the DATA WANT step instead of a MERGE statement, because using the two SETs will stop the data step at the end of the shorter sequence produced by a SET (when the CITY2 values are exhausted).   Using MERGE would have continued until the longer SET is exhausted.

 

You can also do this in a single DATA step by using two SET statements with one series offset from the other:

 

data want2 (drop=_:);
  set new (rename=(city=city2));
  retain _after_magic_line;
  if lag(city2)='##$$%%**--' then _after_magic_line='YES';
  if _after_magic_line='YES' then set new (rename=(city=city1)); **start at the top**;
  if lag(_after_magic_line)='YES';
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
prad001
Obsidian | Level 7
Works perfect.. Thank you mkeintz.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 683 views
  • 2 likes
  • 3 in conversation