BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
OscarUvalle
Obsidian | Level 7

Hi! I've been dealing with this problem two weeks ago or so. It is driving me crazy. I'm importing a CSV with three variables

  • ID, YEAR and SEM where ID represent a location MX,  YEAR the year in 4 digits 2018 and SEM the quarter from 1 to 4. (e.g. MX 2018 1)
  • I am proposing to create a new variable YEARSEM concatenating Year and Sem with a colon between them. (e.g. 2000:1). The problem relies on trying to assign the YEARSEM as YYQC. to be read as time variable by SAS. Many things happen:
    1.  The YEARSEM var is created as string, there is no way to creating as numeric because of the colon,
    2.  Using the FORMAT YYQC. give me empty values.

All of this because I have multiple observations for the same id (e.g. CA, 2000, 1 and CA, 2000, 2) therefore I need to create a new time variable to be used as a TIME-ID for regression and others econometrics related stuff which requires one time period observations for each id. Let me know what would you do. 

 

 

DATA Cross;
	INFILE "Cross.csv" DLM = ',' MISSOVER FIRSTOBS = 2;
	FORMAT ID BEST. ;
	FORMAT YEAR YEAR. ;
	FORMAT SEM COMMA1.;
	FORMAT ROA BEST.;
	FORMAT YEARSEM YYQC.;
	INPUT ID  YEAR  Sem  ROA ;
	YEARSEM = CATX(":",YEAR,SEM);
RUN;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

 

Yes there is a way to create a date variable

     date=intnx('qtr', mdy(1,1,year),sem-1,'begin');

which established a complete date value always assigned to the first day of the specified quarter (i.e. jan1, apr1, jun1, oct1)

 

The just format the date variable as YYQC6.     (the "C" says to put a colon between the 4-digit year and the 1 digit qtr).

 

For filtering you can then match date literals, as in

    where date='01jul2018'd;

 

or quarter membership

    where qtr(date)=3;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
mkeintz
PROC Star

 

Yes there is a way to create a date variable

     date=intnx('qtr', mdy(1,1,year),sem-1,'begin');

which established a complete date value always assigned to the first day of the specified quarter (i.e. jan1, apr1, jun1, oct1)

 

The just format the date variable as YYQC6.     (the "C" says to put a colon between the 4-digit year and the 1 digit qtr).

 

For filtering you can then match date literals, as in

    where date='01jul2018'd;

 

or quarter membership

    where qtr(date)=3;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
OscarUvalle
Obsidian | Level 7
Works perfectly. Although, I don't really understand how it works. Would you mind to explain it? I want to know the logic behind each section. MDY(1,1, YEAR) is a function based on the info I got in my YEAR var but what is 1,1 for?. On the other hand, SEM-1, is my var SEM minus 1 but what does that mean?. Lastly, what is function 'BEGIN' for?
mkeintz
PROC Star

First, just parese the structure of the statement

     date=intnx('qtr', mdy(1,1,year),sem-1,'begin');

 

It's a function (MDY) inside another function (INTNX)

  For the MDY function google search "sas mdy function", and you will see that it is a way to go the date value of Jan 1 of your given year.

 

So then you have, say for year=2012

    date = intnx('qtr','01jan2012'd,sem-1,'begin')

 Now it's time to google  "sas intnx function", which will explain how to start at one date (01jan2012) and establish another date some number of QTRs (or MONTHs, WEEKs, YEARs, etc) after or before 01jan2012.  That will also explain the use of SEM-1, and the argument "BEGIN"

 

Being as the question is restricted to understanding just two functions, it's an excellent context for gaining some familiarity with sas documentation.  Seize the moment.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
OscarUvalle
Obsidian | Level 7
Thank you so much your explanation was clear and concise.
ChrisNZ
Tourmaline | Level 20

I would just add to @mkeintz's perfect suggestions that if you require the colon, you can create a custom format.

See here and here for example.

 

OscarUvalle
Obsidian | Level 7

Thanks for this. I did not know about this functions.

 

ballardw
Super User

@OscarUvalle wrote:

Hi! I've been dealing with this problem two weeks ago or so. It is driving me crazy. I'm importing a CSV with three variables

  • ID, YEAR and SEM where ID represent a location MX,  YEAR the year in 4 digits 2018 and SEM the quarter from 1 to 4. (e.g. MX 2018 1)
  • I am proposing to create a new variable YEARSEM concatenating Year and Sem with a colon between them. (e.g. 2000:1). The problem relies on trying to assign the YEARSEM as YYQC. to be read as time variable by SAS. Many things happen:
    1.  The YEARSEM var is created as string, there is no way to creating as numeric because of the colon,
    2.  Using the FORMAT YYQC. give me empty values.

All of this because I have multiple observations for the same id (e.g. CA, 2000, 1 and CA, 2000, 2) therefore I need to create a new time variable to be used as a TIME-ID for regression and others econometrics related stuff which requires one time period observations for each id. Let me know what would you do. 

 

 

DATA Cross;
	INFILE "Cross.csv" DLM = ',' MISSOVER FIRSTOBS = 2;
	FORMAT ID BEST. ;
	FORMAT YEAR YEAR. ;
	FORMAT SEM COMMA1.;
	FORMAT ROA BEST.;
	FORMAT YEARSEM YYQC.;
	INPUT ID  YEAR  Sem  ROA ;
	YEARSEM = CATX(":",YEAR,SEM);
RUN;

 

 


I am a bit concerned about this statement:

FORMAT YEAR YEAR. ;

 

You do not provide any example of the data but since that will by default show a value of 1965 if the actual value of Year is 2018, which is sort of implied by your narrative description, you may lead to a variety of confusions later when using this data.

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 879 views
  • 4 likes
  • 4 in conversation