Normalizing time-series data using a table function in SQL

Recently I had to normalize quarterly, monthly, and ad-hoc reported metrics into daily rows in order to do correlative analysis. Below is the function that I came up with. The idea is to generate values that are in line with the trend. This can be used with a union on a table that does not contain the gap data to display a smooth transition that moves the values from each actual increment. This is useful for trading strategies that integrate macro data with technical data. For example, I might want to have a strategy that factors in NYSE margin debt or market sentiment to determine that a hedge position might be useful, while also integrating in technical indicators that suggest an appropriate entry or exit point. Most investment strategy generation programs utilize bar data or statistical files that must match up on a daily basis so intermediate rows must be created with values that follow the trend to fill in the gaps.

create function [dbo].[tdf_GenerateDailyGapValues] (
    @StartDate date,
    @EndDate date,
    @StartValue money,
    @EndValue Money)
— Generate intervening values for missing days based on a calendar

returns @GapValues table
(
    GapDate date,
    GapValue money
)
as begin
    declare @DayCount money = datediff(DAY,@StartDate, @EndDate)
    declare @DayChange money = (@EndValue – @StartValue) / @DayCount
    declare @Counter int = 0
    declare @LastGapValue money = @StartValue
    while @Counter < @DayCount – 1
    begin
        set @LastGapValue = @LastGapValue + @DayChange
        set @Counter = @Counter + 1
        insert into @GapValues
        (GapDate, GapValue)
        values (dateadd(day, @Counter, @StartDate), @LastGapValue)
    end
    return
end

Below is sample output from using the function:

SELECT * FROM dbo.tdf_GenerateDailyGapValues
    (‘2014-01-01′,’2014-02-01’,100.00, 200.00)

GapDate    GapValue
2014-01-02    107.1428
2014-01-03    114.2856
2014-01-04    121.4284
2014-01-05    128.5712
2014-01-06    135.714
2014-01-07    142.8568
2014-01-08    149.9996
2014-01-09    157.1424
2014-01-10    164.2852
2014-01-11    171.428
2014-01-12    178.5708
2014-01-13    185.7136
2014-01-14    192.8564

Here is another example of using this.

One problem with this version of the function is that market data is only relevant to business days. Based on this, below is a more practical version that utilizes a calendar to only output values for business days.

create function [dbo].[tdf_GenerateDailyGapValues] (
    @StartDate date,
    @EndDate date,
    @StartValue money,
    @EndValue Money)
— Generate intervening values for missing days based on a calendar
returns @GapValues table
(
    GapDate date,
    GapValue money
)
as begin
    declare @DayCount money
    select @DayCount = COUNT(*) FROM Olap.TradingDayCalendar dc
        WHERE dc.TradingDate > @StartDate and dc.TradingDate < @EndDate
        AND dc.DayNumber = FLOOR(dc.DayNumber)
    IF @DayCount > 0
    begin
        declare @DayChange money = (@EndValue – @StartValue) / @DayCount
        declare @Counter int = 0
        declare @LastGapValue money = @StartValue
        while @Counter < @DayCount – 1
        begin
            set @Counter = @Counter + 1
            if exists(select 0 from Olap.TradingDayCalendar dc
                where dc.TradingDate = Dateadd(day, @Counter, @StartDate) AND DayNumber = FLOOR(DayNumber))
            begin
                set @LastGapValue = @LastGapValue + @DayChange
                insert into @GapValues (GapDate, GapValue)
                values (dateadd(day, @Counter, @StartDate), @LastGapValue)
            end
        end
    end
    return
end

select * from [dbo].[tdf_GenerateDailyGapValues] (‘2014-01-01′,’2014-01-15’,100.00,200.00)

GapDate    GapValue
2014-01-02    107.1428
2014-01-03    114.2856
2014-01-06    121.4284
2014-01-07    128.5712
2014-01-08    135.714
2014-01-09    142.8568
2014-01-10    149.9996
2014-01-13    157.1424
2014-01-14    164.2852

Here is an example application of this that prorates out PE Ratios for a table that contains P/E Ratio data which is only updated every few days:

CREATE VIEW Export.view_Fundamentals_WithGaps
AS
    SELECT TradingSymbol, MarketDate, PERatio, 0 AS GapRow FROM Load.Fundamentals f
    UNION ALL SELECT f.TradingSymbol, t.GapDate, t.GapVAlue, 1 as GapRow
    FROM load.Fundamentals f
    INNER JOIN Load.Fundamentals f2
        ON    f2.TradingSymbol = f.TradingSymbol
        AND f2.MarketDate = f.PriorMarketDate
    CROSS APPLY dbo.tdf_GenerateDailyGapValues(f.PriorMarketDate, f.MarketDate, f2.PERatio, f.PeRatio) t
GO

SELECT * FROM Export.view_Fundamentals_WithGaps WHERE TradingSymbol = ‘FB’ AND MarketDate > ‘2014-02-01’ ORDER BY  MarketDate

TradingSymbol    MarketDate    PERatio    GapRow
FB    2014-02-03    101.80    0
FB    2014-02-04    101.7825    1
FB    2014-02-05    101.765    1
FB    2014-02-06    101.7475    1
FB    2014-02-10    101.73    0

For the example to work, the prior date must be set for each of the rows in the table that contains the source values. A utility stored procedure can handle this as shown below:

CREATE  procedure [dbo].[util_setFundamental_PriorDate]
    @StartDate DATE = NULL,
    @EndDate DATE = NULL
AS BEGIN
— Does Period Upsert Fixes 1 day at a time
    IF @StartDate IS NULL
        SELECT @StartDate = MIN(MarketDate) FROM Load.Fundamentals
    IF @EndDate IS NULL
        SELECT @EndDAte = MAX(MarketDate) FROM Load.Fundamentals
    SET NOCOUNT ON
    DECLARE DayCursor CURSOR
        FOR SELECT TradingDate from olap.TradingDayCalendar   
            WHERE TradingDate > @StartDate
            ORDER BY TradingDate
    DECLARE @MarketDate DATE
    OPEN DayCursor   
    FETCH DayCursor into @MarketDate
    WHILE @@FETCH_STATUS = 0 AND @MarketDate < @EndDate
    BEGIN
        PRINT ‘Processing Date ‘ + CONVERT(VARCHAR(11), @MarketDate)
        UPDATE f
        SET PriorMarketDate = (SELECT MAX(MarketDate) FROM Load.Fundamentals f2
                                WHERE    f2.TradingSymbol = f.TradingSymbol
                                    AND    f2.MarketDate < f.MarketDate)
        FROM Load.Fundamentals f
        WHERE f.MarketDate = @MarketDate
        FETCH DayCursor into @MarketDate
    END
    CLOSE DayCursor
    DEALLOCATE DayCursor
END

Although this seems inefficient with the cursor, it actually performs decently since there are not that many dates to traverse through. This method also minimizes locking by keeping the number of rows updated to be manageable. I was able to populate the prior market dates using this procedure for 1.4 million rows spanning from July, 2009 through February, 2014 in just under 4 minutes as a one-time fix. Since then, I’ve added it to the stored procedure that does the daily load of the data to just do it for the date loaded which takes less than a second each day.

Advertisements
This entry was posted in SQL Server, SQL Tips and Techniques and tagged . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s