UNCLASSIFIED
Paper 1047-2021
SAS® Time Series Analysis & Forecasting
(TSAF) at the Canada Revenue Agency (CRA), with COVID impacts
Jason A. Oliver, Senior Compliance Analyst, Canada Revenue Agency (CRA)
ABSTRACT
It may be a recurring theme of this year’s SAS Global Forum that we are faced with more pressure to use flexible thinking, not just critical thinking, and when it comes to time series analysis and forecasting (TSAF) in SAS, it is all about “rethinking the curve”.
At the Canada Revenue Agency (CRA) Compliance Programs Branch (CPB), we have grappled with reliable forecasting for macro-level tax variables on a monthly basis, even before the COVID-19 pandemic hit. But now we face a particularly difficult challenge. As with many large organizations, it is not easy to predict what the fallout may be from such a cataclysm.
In setting up SAS to right the trajectory, we must be extra careful about some of the fallacies in applying TSAF in this context: the lagged effect for tax revenues realized based on audits of the previous tax year, the need to differentiate average tax recovery per case from sum of tax recovery (monthly), realizing that industry sectors are not “one size fits all”, and accounting for relatively temporary effects of staffing re-orientation in the conversion to a virtual workplace versus the more enduring effects of business disruptions. With SAS Enterprise Miner’s abilities to continually adjust forecasts, sub-categorize datapoints by tax office or industry sector, and apply lagged cross-correlation analysis, we are suitably equipped with the right tools, and this may provide abstract learnings for other large organizations.
INTRODUCTION
The Canada Revenue Agency (CRA) is Canada’s federal tax administration. As with all tax
jurisdictions, the CRA has been challenged to keep pace with COVID-19 shocks and
manifestations, which began in March 2020 (the last month of our fiscal year).
Fortunately, SAS® Enterprise Miner™ has been a valuable aid in gauging these impacts.
Enterprise Miner™ includes a highly versatile set of functional nodes for configuring and
processing time series data. It can decompose time series components such as seasonality
and trend, provide trend lines and expected forecasts within configurable prediction intervals,
and exhibit complex correlation analyses.
While this has been of great benefit to the CRA in gauging the trajectory of macro-variables
related to tax revenues and auditor performance, the findings of this research paper could
conceivably be applied in the abstract to large organizations with process-oriented
functions, and not just to other international tax jurisdictions.
Let us provide a glossary of terms to set the stage:
TSAF: Time Series Analysis & Forecasting.
TEBA: tax earned by audit, which is the amount of tax collectible that is agreed upon in the course of a taxpayer audit. It is expressed in NPV (Net Present Value).
TAR: the tax-at-risk, which is the amount that CRA risk assessors arrive at as the precursor to auditing activity.
C/AR ratio: the ratio of [audit] cases completed to action requests [submitted] for support. It is a tentative measure of auditor productivity.
Integras: the tool used by CRA auditors to process cases.
TIME SERIES FUNCTIONAL NODES & SETUP
In SAS® Enterprise Miner™, you have six TSAF nodes in the “Time Series” ribbon, but we are
only going to use four of them. Below is the Time Series ribbon with the functional nodes in
question:
Figure 1. Time Series Functional Nodes
TS Data Preparation: this node allows you to specify basic time series properties
including interval, cycle, start/end time, and accumulation (i.e. by total, min or max,
mean, etc.)
o Below, the interval is “automatic”, so we specify “Month” as the interval.
o We can leave the seasonal cycle and start/end time as “Default”, as SAS®
Enterprise Miner™ will auto-determine these factors from the data.
o In our case, the data was pre-accumulated in SAS® Enterprise Guide™ row-by-row
on a per-month basis, so we can leave Accumulation = “Total” (else,
we would have to set it to “Average”).
Figure 2. TS Data Preparation node: basic properties
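To make the “Total” versus “Average” accumulation choice concrete, here is a minimal Python sketch of rolling case-level records up to one value per month. It is only an illustration outside of Enterprise Miner: the function name `accumulate_monthly` and the sample amounts are hypothetical, and it does not reproduce how the TS Data Preparation node works internally.

```python
from collections import defaultdict
from datetime import date


def accumulate_monthly(records, how="total"):
    """Roll (date, value) records up to one value per calendar month."""
    buckets = defaultdict(list)
    for day, value in records:
        buckets[(day.year, day.month)].append(value)
    if how == "total":
        return {month: sum(vals) for month, vals in sorted(buckets.items())}
    if how == "average":
        return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}
    raise ValueError(f"unsupported accumulation: {how}")


# Hypothetical case-level amounts, not CRA data.
records = [
    (date(2020, 1, 10), 100.0),
    (date(2020, 1, 25), 300.0),
    (date(2020, 2, 5), 50.0),
]
monthly_total = accumulate_monthly(records, "total")
monthly_average = accumulate_monthly(records, "average")
```

If the data already holds one row per month, as in our case, either setting returns the row unchanged, which is why leaving the accumulation at “Total” is harmless here.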
TS Decomposition: this node allows you to specify similar basic settings to those of
the TS Data Prep node, but the Number of Periods can be configured, and moreover,
you can configure which Export Components you want to display.
o By default, it will only display the “Trend-Cycle” component (=Yes), which is
often considered the most salient one.
o However, in our case, we want to view ALL Components, so we would set that
value to “Yes”.
Figure 3. TS Decomposition node properties
TS Correlation: this node allows you to set up your TSA for autocorrelation analysis, or
alternatively for CCA (cross-correlation analysis). When you select one of those methods,
the other one’s properties will be greyed out.
Figure 4. TS Correlation node properties
Both the TS Correlation and TS Decomposition nodes must be preceded by a TS Data
Preparation node (which occurs right after the source data node).
TS Exponential Smoothing: this node allows you to conduct forecasting based on your
known data; as such, you would connect it to a TS Data Preparation node, not directly to
your source data node.
The interval is automatic (which will be month in the case of our pre-accumulated
data), and the accumulation defaults to “Total” (which is OK in our case, for the
same reason).
SAS will select what it deems to be the best forecasting method.
The default selection criterion is MSE, or Mean Squared Error.
We will see more on the Forecast Lead, Forecast Back, and Significance Level parameters
during the forecast demonstration in this paper.
Figure 5. TS Exponential Smoothing node properties
For our initial workspace setup, we can focus on the C/AR (Case to Action Request)
ratio, which as per our glossary is a tentative measure of tax auditor performance. The
initial diagram workspace is called “Aggreg_Integras_27mths”, which runs from January
2018 to March 2020. It is arranged this way for a reason: it ends on the month
of the COVID shutdown.
Our dataset name is “TSA_AGGREG_SINGLE_LINE_27MTHS”.
So, when I bring this in, I must set all variables to Role = “Rejected” except a) the C/AR ratio
and b) my MONTH (Time ID) variable.
Figure 6. Variable Role selection from data source
You would set your variables once you bring the data source to your diagram (workspace).
Figure 7. TS Data Source to Diagram flow
NOTE: I do not cover the mechanics behind bringing in a data source, as the principal focus
is on conducting TSAF in SAS® Enterprise Miner™. All we need to be concerned with is that
as Data Sources become available in the top-left menu, we can drag-and-drop them to our
diagram workspace (diagrams are also created by right-clicking ‘Diagrams’ in the left panel).
In inspecting the TS Data Preparation node, it is fairly simple: we see the known trajectory of the C/AR variable, just by right-clicking the node and choosing Run, then Results.
Figure 8. Time Series Plot, for C/AR ratio variable
We can see that the C/AR ratio has fallen off as of mid-2018, and continued on a very
gradual downward path. This means that case auditors are completing disproportionately
fewer cases relative to the action requests they submit for support, albeit with a seasonal factor and
some rebounding of the trend line in March 2020.
So, we can focus on the more specific components of the time series line by using a TS
Decomposition node.
DECOMPOSITION OF TIME SERIES
In running our TS Decomposition node, and viewing the results, the first one to examine
is the Seasonal Component Plot. When it comes to the C/AR ratio, the seasonal index ranges
from a high of about 1.3 down to about 0.75.
Figure 9. Seasonal Component Plot, for C/AR ratio variable
During the months of March and December, we see fairly high seasonality. This is normal
for the time, since the push to complete cases is higher at the end of the CRA fiscal year
(March), and ostensibly at the end of the calendar year, also. Auditors are completing
proportionally more cases vs. the number of action requests they submit to the service
desk. So it is likely that they are fulfilling cases that do not require as many interventions
during these months. Even in March 2020, C/AR still remained high; it was
resilient to the initial COVID effects, due to being a ratio variable and not an absolute
sum variable.
In the decomposed results, we can also examine combined components; for instance, the
Trend-Cycle Component Plot:
Figure 10. Trend-Cycle Component Plot, for C/AR ratio variable
This tells us what we had surmised from the initial data preparation, that the series has
been on a steadily downward trajectory. Now when it comes to tax-related time series
data, there is no real cycle per se; at best, it is an inherited cycle from world economy
fluctuations. The proper definition of cycle in a TSA context is not the entity’s operational
lifecycle; rather, it refers to the boom-and-bust business cycles, which are largely
unpredictable. Ergo, we are primarily concerned about trend here.
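For readers who want a feel for what a seasonal index is, the following Python sketch computes a deliberately simplified version: the mean of each seasonal position divided by the overall mean. This is an assumption-laden illustration, not the decomposition algorithm used by the TS Decomposition node; it ignores trend entirely, and the function name and sample series are hypothetical.

```python
def seasonal_indices(values, period=12):
    """Simplified seasonal index: mean of each seasonal position / overall mean.

    Assumes no strong trend; a full decomposition would remove trend first.
    """
    overall = sum(values) / len(values)
    indices = []
    for position in range(period):
        season_values = values[position::period]
        indices.append((sum(season_values) / len(season_values)) / overall)
    return indices


# Hypothetical series with a four-period season and a peak in the last position.
series = [8.0, 10.0, 9.0, 13.0, 8.0, 10.0, 9.0, 13.0]
idx = seasonal_indices(series, period=4)
```

An index above 1.0 marks a seasonally strong position and an index below 1.0 a weak one, which is how a range such as 1.3 down to 0.75 should be read.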
Now, if we substitute the Average TEBA (tax earned by audit) variable for C/AR [using the
Data Source node shown in Figure 6 earlier], we can see what emerges in our decomposed
time series results.
Figure 11. Paneled Component Plots, TS Decomp. for Avg. TEBA
This time, as per the panel graph at bottom-left, we see that our seasonality index is
broader than that of the C/AR ratio; it goes from a high of about 1.8 to a low of ~0.7. This is
largely attributable to the heightened pressures towards fiscal year-end to increase
realization of TEBA, which we see in Feb.-March. At the opposite end, we see rather low
seasonality for May, August, and November.
For the original series plot, bottom-right, the trend continues gradually upwards with
seasonality readily apparent. In the trend-cycle component plot, at top-left, we see that the
trend (with cycle, such as it is) is rising steadily upwards but then reaches a virtual plateau.
The key challenge, then, has been to resolve and reconcile the expected forecast as of March
2020 with the new COVID-19 realities.
FORECASTING MACRO TAX VARIABLES
AVERAGE TEBA
We can proceed to evaluate the expected trajectory of the AVG. TEBA variable, on a
monthly interval. Recall that this variable is pre-accumulated at the data source.
When we conduct our forecast, we use the TS Exponential Smoothing node.
Figure 12. TS Exponential Smoothing node in the TSAF diagram
We let SAS® select the best forecasting method, as well as the selection criterion (forecast
measure). In this case, the latter value is the MSE [Mean Squared Error], as you can see at
the bottom of the properties of the node.
Figure 13. Properties of the TS Exponential Smoothing node
For our Significance Level, we set this to 0.5; it governs the blue bracket around the
forecast line, a.k.a. the prediction interval. So it is a confidence band of sorts. This figure is
an “alpha” value rather than a confidence level, so it runs the opposite way from what some
of us might expect from frequentist confidence intervals: the lower the “alpha” value, the
wider the band (prediction interval). An “alpha” of 0.01 would produce a very wide band, and
an “alpha” value of 0.99 would be virtually limited to just the forecast line itself. So we aim
in the middle (which is actually closer to the outline of the trend line, as this figure is more
“log-like” in its manifestation).
Figure 14. TEBA_NPV_Mean: forecast line from trend
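The relationship between the significance level and the width of the band can be sketched in a few lines of Python. This is a hedged illustration that assumes normally distributed forecast errors and a known standard error; the function name, point forecast, and standard error below are hypothetical, and the exact interval formula used by Enterprise Miner is not confirmed here.

```python
from statistics import NormalDist


def prediction_interval(point_forecast, std_error, alpha):
    """Two-sided interval with nominal coverage 1 - alpha (normal errors assumed)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * std_error
    return point_forecast - half_width, point_forecast + half_width


# Hypothetical point forecast and standard error, for illustration only.
widths = {}
for alpha in (0.01, 0.5, 0.99):
    low, high = prediction_interval(60_000.0, 10_000.0, alpha)
    widths[alpha] = high - low
```

Under this assumption, an alpha of 0.01 gives the widest band and an alpha of 0.99 collapses it almost onto the forecast line. The width also shrinks non-linearly as alpha rises, so 0.5 sits much closer to the narrow end than to the wide end.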
SAS logically expects the trend will continue upwards (while maintaining seasonality, of
course) due to “series momentum”. Had we begun our time series at, say, January 2016
rather than Jan. 2018, that momentum might have been more pronounced. The clichés of
“future behavior is governed by past behavior” and “you can’t know where you’re going,
unless you know where you’ve been” have never been more true. However, enter COVID-19,
and that is a whole new wrench in the gears of the tax-auditing machinery.
As for the selection of the “Best” Forecasting Method: you could try experimenting with
different models (there are eight in all, as per general TSAF science), but I can tell
from the shape of the forecast line that it is based, appropriately, on the Additive Winters
method (see footnote 1). I ascertained this by running the node with this method selected, and the
resulting graph was identical to that of the “best” method. Unlike the Multiplicative Winters method,
this forecast line is based on fairly consistent seasonal “inverted V” shapes in the curve.
If these inverted V shapes became noticeably larger (or smaller), then Multiplicative Winters
would likely be the “best” method that SAS would auto-select.
Figure 15. Available Forecasting Methods, properties of TS Exp. Smoothing node
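As a rough picture of what an additive Winters forecast does, here is a minimal Python sketch that maintains a level, a trend, and an additive seasonal term. The smoothing weights, the simple initialization from the first two seasons, and the sample series are all assumptions made for illustration; Enterprise Miner optimizes its own parameters, and this is not its implementation.

```python
def additive_winters(y, period, alpha=0.3, beta=0.1, gamma=0.1, horizon=12):
    """Additive Winters sketch: forecast = level + h * trend + seasonal term."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first_mean = sum(y[:period]) / period
    second_mean = sum(y[period:2 * period]) / period
    level = first_mean
    trend = (second_mean - first_mean) / period
    season = [y[i] - first_mean for i in range(period)]
    for t in range(period, len(y)):
        s = season[t % period]
        previous_level = level
        # Update level, trend, and the seasonal term for this position.
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        season[t % period] = gamma * (y[t] - level) + (1 - gamma) * s
    n = len(y)
    return [level + h * trend + season[(n + h - 1) % period]
            for h in range(1, horizon + 1)]


# Hypothetical series: constant "inverted V" season, no trend.
history = [10.0, 20.0, 30.0, 20.0] * 3
forecast = additive_winters(history, period=4, horizon=4)
```

Because the seasonal term is added rather than multiplied, the size of the seasonal swing stays constant as the level moves. That is the behaviour that distinguishes the additive variant from the multiplicative one.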
We see that in the resulting forecast, it predicts ahead exactly 12 months. This is the
difference between the figures of “Forecast Lead” and “Forecast Back” in the properties. We
saw earlier that “Forecast Back” = 6; this acts as our validation partition,
using the last six months of known data (i.e. Oct. 2019 to March 2020). So this gets
subtracted from the “Forecast Lead” value of 18 to arrive at 12 periods out. Ideally, you
want your “back” [validation] period to be between 20-25% of your known data, which it is
at 6 out of 27 months (about 22%); even when we increase the known months to 30, it will still be 20% of
this.
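The lead/back arithmetic can be checked with a short Python sketch. The helper name `forecast_window` is hypothetical; the inputs (a lead of 18, a back of 6, and 27 or 30 known months) are the values discussed in this section.

```python
def forecast_window(known_periods, lead, back):
    """Return (periods forecast beyond the known data, holdout share of known data)."""
    periods_beyond_data = lead - back
    holdout_share = back / known_periods
    return periods_beyond_data, holdout_share


ahead_27, share_27 = forecast_window(known_periods=27, lead=18, back=6)
ahead_30, share_30 = forecast_window(known_periods=30, lead=18, back=6)
```

With 27 known months the holdout is roughly 22% of the data, and with 30 known months it is 20%, so both stay inside the 20-25% guideline.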
SUM OF TEBA
When we run a TSAF experiment on the SUM of TEBA, as opposed to its average, we
perceive a drastic difference in the scale. Because SUM of TEBA is a sum value, not a ratio (i.e.
C/AR, or [Average] TEBA/case), it is simply not as resilient to sudden shocks like COVID-19,
as we will later see when adjusting the forecast based on incremental months (April, May,
June) of known values.
1 The essence of the Winters method is to combine discernible trend with seasonality.
Figure 16. TEBA SUM Forecast (post-March 2020)
Note that the MSE selection criterion (default) graphs a trend line around the known values
(which are represented by the red dots here). The SUM TEBA for Feb. 2020 is nearly double
what it was for March 2020, as you can see by the relatively large separation of the red dots
from the blue dots (on the trendline) for these two months. Yet SAS® “thinks” that the trend
will continue positively, as it is “COVID-agnostic”.
What may also seem surprising to the reader is that the lower limit of the prediction interval
for April 2020 (at ~$674.5M) actually exceeds the actual value for April 2019, which was
slightly below $500 million. It is not until the fall that we see the midpoint of actual
2019 data approximate the LCL (lower confidence limit) of the forecasted band for Sept.
2020. This is ostensibly due to the “positive momentum” of the time series that I alluded to
earlier.
C/AR RATIO
Next, we swap out the SUM of TEBA for the C/AR ratio, once again. In forecasting a
relatively low continuous ratio variable such as C/AR, the prediction interval may be less
reliable. We have to examine the midpoint distribution. While the midpoint post-March
2020 tends to be at or above the 10.0 line, this is rare for 2019 datapoints.
Figure 17. C/AR ratio Forecast
I used the Mean Relative Abs. Error as the forecast metric (selection criterion), which I
found to be more appropriate. Regardless, what we see in the actuals for the spring of 2020
is a very low C/AR ratio, telling us that case throughput has suffered as a result of the
pandemic AND that action requests for support did not decline proportionally; there was still
an apparent high need for action requests.
FORECASTING AVG. HOURS PER CASE
For forecasting average hours per [audit] case, I determined that the more ideal Selection
Criterion was “Median Relative Abs. Error”. No matter what Selection Criterion I used (or
Significance Level), the prediction interval still dipped into the negative range. Sometimes,
this is unavoidable. But then the prediction interval becomes spurious; you can’t have
negative hours. So we tend to just focus on the midpoint values in this situation.
Figure 18. Average hours per case Forecast
We can see that the midpoint goes very subtly upwards for the first few forecasted points
(post-March 2020), then sharply up for summer. As it turns out, this is a fairly good
approximation of the reality, since the Avg. Hours per case during the middle of 2020 is
about 1.5-2.0 times that of the previous year. What is especially pronounced is that the
Average Hours of March 2019 were only 6.25, whereas for March 2020, it was 35.44. This
was predicated on an Agency policy-induced change; refer to the link and passage below:
https://www.mondaq.com/canada/audit/1030308/cra-moves-forward-with-international-audits-despite-continued-backlog-?email_access=on
In March 2020, the CRA announced that it was suspending the vast majority of audit activity for a
minimum of four weeks, except for audits involving the very largest taxpayers. This suspension meant
that the CRA ceased requests for information relating to existing audits, finalizing existing audits, and
issuing reassessments. Further, deadlines for information or document requests were suspended and no
action was required from taxpayers under audit during this time. This suspension remained in effect until
June 2020, though audits of small and medium businesses did not resume until late fall.
This is also arguably responsible for the “pulse” effect we see in actual Avg. TEBA for July
2020, as per the monthly incremental analysis that comes next.
INCREMENTAL ALIGNMENT
APRIL 2020, KNOWN VALUES
Now when we add the month of April 2020 to our data (making it 28 months total), we would
expect the AVG. TEBA actuals for subsequent months to come closer to, or within, the forecast
range. For instance, in the graph cross-section that follows, the forecast for September,
October, and December 2020 comes more within range of later-known actuals, once we
add April 2020 data. However, the July 2020 actual (~$122,000) is still above the forecast
band for this incremental dataset’s forecast. This was likely due to the resumption of
general large business audit as of June 2020 (see the article passage quoted earlier).
Figure 19. Revised AVG. TEBA forecast, incremental inclusion of APRIL 2020
Again, we typically use the measure of MSE [Mean Squared Error] in gauging efficacy or
proximity of a forecast to actual [values]. See the Appendix tables at the end of this paper
for a breakdown of this analysis, where I illustrate the monthly incremental effect on accuracy
for the last six months of the calendar year (i.e. from July to Dec. 2020).
MAY 2020, KNOWN VALUES
Clearly, the addition of April wasn’t enough to right the trajectory of the expanding “COVID
window”. So in continuing our analysis of monthly incremental effect, I added May 2020’s
known data and I changed the forecast significance level from 0.5 to 0.25. But it makes no
difference: the July actual is still out of forecast range. We must simply accept that July 2020
Avg. TEBA is an abnormal value (~$122K), since July 2018 had Avg. TEBA =~$45K, and July
2019’s Avg. TEBA was ~$57K. It is clear that this is a COVID-adjustment spike.
Figure 20. Revised AVG. TEBA forecast, incremental inclusion of MAY 2020
We can therefore define July 2020 as a pulse, or a one-time brief event, that caused a
spike in the accumulated time series value for that month. This emphasis on larger
business for audit while suspending SMB audits at the time is further substantiated by the
fact that in July 2020, there was an average of 50.75 hrs per case completed, which is
extremely high. For April, which had a very high Average TEBA of $185.5K, the figure was
52.16 average hours per case.
JUNE 2020, KNOWN VALUES
Predictably, the addition of June 2020 did not improve the forecast band enough to include the
actual Avg. TEBA for July. So this strengthens the theory that July’s value was a one-time
event, or pulse, in the time series. It also strengthens the theory that Avg. TEBA was more
resilient to initial COVID-19 transition measures (being a ratio value, in essence). To wit:
observe below that the April-May-June line for the original forecast (left) and actual data
points (right) is just above the $50K line, and follows the same trajectory.
Figure 21. Comparing Q1 of FY2020-21 forecast vs. actual data points
In taking MSE and RMSE (R is “root”) measurements for both the as-of-March and
as-of-June forecasts, we only note a slight improvement (reduction) in that value. This also
goes to show the resilience of this variable, and the “pulse” nature of July’s spike.
MEASURE / as of MONTH     MARCH 2020            JUNE 2020
AVG. TEBA (MSE)           $ 954,467,257.64      $ 888,454,004.34
RMSE                      $ 30,894.45           $ 29,806.95
Table 1. Point-in-time [R]MSE for AVG. TEBA forecast-to-actual: July to Dec. 2020
Refer to the Appendix at the end of this paper for a more detailed month-by-month
breakdown of these calculations.
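For reference, the two measures are related as follows: MSE is the mean of the squared forecast-minus-actual differences, and RMSE is its square root. The Python sketch below uses hypothetical actual and forecast values (not the Appendix figures) to show the calculation, and then checks the RMSE figures in Table 1 against their reported MSE values.

```python
from math import sqrt


def mse(actuals, forecasts):
    """Mean squared error between paired actual and forecast values."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)


def rmse(actuals, forecasts):
    """Root mean squared error, in the same units as the data."""
    return sqrt(mse(actuals, forecasts))


# Hypothetical values for illustration only.
actuals = [122_000.0, 60_000.0, 55_000.0]
forecasts = [70_000.0, 58_000.0, 56_000.0]
example_mse = mse(actuals, forecasts)

# Square roots of the MSE values reported in Table 1.
rmse_march = sqrt(954_467_257.64)
rmse_june = sqrt(888_454_004.34)
```

A single large miss, such as the hypothetical first month above, dominates the MSE because the errors are squared. This is one reason a one-time pulse can keep the measure high even when the remaining months are forecast well.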
FALLACY: COMPARING SUM OF TEBA SHIFT TO AVG. TEBA CHANGES
TSAF works best when you accumulate data records by average, not by sum total. If we
tried this exercise using SUM TEBA per month, it would not turn out very well, because sum
totals are immediately impacted by any severe transition, i.e. auditor work re-arrangements
and temporary audit case policy due to COVID-19 fallout as of March 2020.
Looking at the March 2019 vs. March 2020 comparison in the following table, the TEBA_SUM and
Case Count have dropped considerably in March 2020, yet the C/AR ratio has increased.
Table 2. Year-over-Year March comparison, key macro-variables in TSA
However, as the staffing situation has tried to stabilize in the intervening months
(April to June 2020), the C/AR ratio has dropped dramatically. (Not shown in the above table.)
The same is true for the TEBA/AR pattern.
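The difference in resilience between a sum variable and an average (ratio-like) variable can be illustrated with a toy Python example. The case counts and amounts below are hypothetical and are not taken from Table 2; they only show what happens when the number of completed cases falls while the typical case value holds steady.

```python
def total_and_average(case_amounts):
    """Return (sum, mean) of the per-case amounts for one month."""
    total = sum(case_amounts)
    return total, total / len(case_amounts)


# Hypothetical months: same per-case amount, far fewer completed cases.
march_2019 = [50_000.0] * 100
march_2020 = [50_000.0] * 40

sum_2019, avg_2019 = total_and_average(march_2019)
sum_2020, avg_2020 = total_and_average(march_2020)

sum_change = (sum_2020 - sum_2019) / sum_2019   # relative change in the sum
avg_change = (avg_2020 - avg_2019) / avg_2019   # relative change in the average
```

In this sketch the sum falls by 60% while the average does not move at all, which is the sense in which averages and ratios absorb a throughput shock that a sum cannot.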
SUM OF TEBA: DRASTIC CHANGE
We now compare the SUM TEBA forecast as of March 2020 (left image) and that of June
2020 known data points (right image).
Figure 22. Comparison of SUM of TEBA forecast as of March vs. as of June (2020)
For the first image, none of the actuals of the last six months of 2020 fall in the forecast
band. Whereas, for the second image, two of the actuals of the last six months (Oct., Nov.)
fall in the forecast band.
Also observe how some of the accumulated data points in the forecast are more “depressed”
in the latter graph; while there is a discernible peak, it does not quite have the same
buoyancy or upwards momentum as the former graph. (We must keep in mind, though,
that this is still using the MSE method, i.e. taking a line of best fit, where the red dots are
the actual values.)
So, there is little point in using the MSE to gauge efficacy of the monthly adjustment, simply
because the values would be so huge (versus those in the Avg. TEBA MSE).
ADVERSE IMPACTS AND DELAYED EFFECTS
LATENT EFFECTS OF SHOCKS
We would also expect that lower Avg. TEBA would not manifest until much later in the fiscal
year 2020-21, due to most of 2020 consisting of prior-year audits. The graph below covers
known Avg. TEBA trend data points right up to December 2020, the lowest point.
Figure 23. Calendar-year-end (2020) Avg. TEBA; lowest point
This extremely low Average TEBA of ~$32,000 per case could be a harbinger of further
average TEBA decline, but we would need to observe the last quarter of the fiscal year (January
to March 2021, once available) and validate that theory. (Then we would apply an
intervention to the time series line.)
Incidentally, when it comes to SUM of TEBA with actuals up to Dec. 2020, the forecast trend
line for 2021 is far more credible, showing all datapoints as being well under $1 billion, and
mostly under $500 million.
INTERVENTIONS
As alluded to before, a TSAF exercise may use interventions, if the extreme or abnormal
event is known in advance (or shortly thereafter). This is an adjustment to the “regular”
time series, using a “dummy” variable for the period of observation. In this case study,
we would recommend an intervention for the SUM of TEBA as of March 2020, and possibly for
AVG TEBA as of Dec. 2020. Plus, we would use a “pulse effect” for July 2020. However,
programming an intervention requires SAS® Studio™, which is out of scope for this paper.
Figure 24. Basic denotation of input variables (interventions) by type
Lowest actual in three years; Dec. 2020 Avg. TEBA of $32,404
A step would work best as an intervention (for March 2020 and Dec. 2020), since the trend line shift is sudden and sustained; it does not happen gradually and then return to baseline.
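Although the SAS programming is out of scope, the idea of a “dummy” variable is simple enough to sketch in Python. A pulse dummy is 1 only in the event period, while a step dummy is 1 from the event period onward. The function names and the (year, month) encoding are hypothetical choices for this illustration.

```python
def pulse_dummy(periods, event):
    """1 in the single event period, 0 everywhere else."""
    return [1 if period == event else 0 for period in periods]


def step_dummy(periods, start):
    """0 before the start period, 1 from the start period onward."""
    return [1 if period >= start else 0 for period in periods]


# Calendar 2020 as (year, month) tuples, which compare in chronological order.
periods = [(2020, month) for month in range(1, 13)]
july_pulse = pulse_dummy(periods, (2020, 7))
march_step = step_dummy(periods, (2020, 3))
```

Such columns would be supplied to a forecasting model as extra input variables, so that the one-time July spike and the sustained shift from March are attributed to the events rather than to trend or seasonality.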
TS CORRELATION NODE
AUTOCORRELATION
When we deal with a large seasonal and/or trend component, we usually find a higher
degree of autocorrelation (the autocorrelation function is abbreviated “ACF”). As the name
suggests, this is the tendency of a variable to self-influence. It can also be thought of as
momentum, or “muscle memory”.
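For reference, the sample autocorrelation at a given lag can be computed as in the Python sketch below. It uses the common textbook estimator (lagged cross-products around the overall mean, divided by the total sum of squares); whether the TS Correlation node uses exactly this estimator is an assumption, and the sample series is hypothetical.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` periods."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(values) - 1")
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    if denominator == 0:
        raise ValueError("series has zero variance")
    numerator = sum((values[t] - mean) * (values[t + lag] - mean)
                    for t in range(n - lag))
    return numerator / denominator


# Hypothetical steadily trending series: strong "momentum" at lag 1.
trending = [float(v) for v in range(1, 11)]
acf_lag0 = autocorrelation(trending, 0)
acf_lag1 = autocorrelation(trending, 1)
```

A trending series like this one shows high autocorrelation at short lags, which is the pattern described above for variables with a large trend or seasonal component.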
In a similar vein, when frontline auditing teams are performing well, some of that
momentum carries over from one period to the next, as they build “muscle memory” and
are better equipped to deal with more trying scenarios that have [abstract] issues in
common with recent cases worked on. This presents opportunities for “boilerplate” copying
and pasting of common findings from one case to another, adjusting for specifics, and
accelerating average time to complete as well as garnering more average TEBA per case.
Clearly, during the current COVID-19 climate at this writing, and the embargo of SMB case
audit during the spring 2020 period, we can expect some of that momentum to be adversely
impacted, since auditors were working on more complex large business cases overall. But
first, let us examine a baseline from the years 2018-2019, below:
Figure 25. ACF Plot, three key tax-related macro-variables (2018-2019)
Of the three variables plotted above, Est. TAR-AI (tax-at-risk, audit issue) has low ACF,
TEBA has moderately high ACF, and Total [Avg. Case] Hours has very high ACF. To wit: at
lag t=5, TEBA reaches the zero line, but Total Hours is still at ACF=0.45.
By stark contrast, in 2020 (below), the ACF for both Avg. TEBA and Case Hours is very
weak overall. In fact, both drop precipitously at the very outset of 2020, just prior to
COVID-19.
Figure 26. ACF Plot, same macro-variables, for 2020
CCA – CROSS-CORRELATION ANALYSIS
When we explore lagged effects between risk-related variables, in this case TAR
(tax-at-risk) and TEBA (tax earned by audit), we may use a CCA plot. We are also considering
Total Hours (on audit cases) here. The plots below are at t=3 months and t=12 months
out, with the influencing variables on the vertical axis, and the influenced variables on the
X-axis. The color shading is somewhat counterintuitive, whereby red means more positively
cross-correlated, and blue means less so. Again, we set a baseline of expectations using
tax data from 2016 to 2019 (48 months) here.
Figure 27. CCA Map, at time lags 3 and 12, key macro-variables
Note the pronounced difference in CCA factor: for time lag 3, the Estimated TAR has
virtually no effect on TEBA or Total Hours per case (because it is too close time-wise), but 12
months out (at right) it has a very pronounced effect on total case hours, and a moderate
effect on TEBA (~22%). Also, in the first graph for time lag 3, TEBA highly influences Total
Hours and, to a noticeable degree, vice-versa too. But when we get to 12 months out, Total
Hours has virtually no lagged effect on TEBA, and vice-versa.
If we repeat the experiment from 2018 data up to 2020 (COVID window) data, evaluating
lagged effects of TAR on TEBA for 2020, we find a very different pattern at t=3 and t=12.
For time lag=3, the most we get is ~3% influence; for t=12, it is absolutely nothing.
Figure 28. CCA Map, at time lags 3 and 12, inclusive of COVID-19 period
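A lagged cross-correlation of this kind can be sketched in Python as follows. The estimator below correlates the influencing series at time t with the influenced series at time t + lag, using full-series means and sums of squares; this is a common textbook form and is assumed here rather than taken from the TS Correlation node. The two sample series are hypothetical, with the second built as a copy of the first delayed by two periods.

```python
from math import sqrt


def cross_correlation(x, y, lag):
    """Sample correlation between x[t] and y[t + lag] for a non-negative lag."""
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have the same length")
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(x) - 1")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag))
    denominator = sqrt(sum((v - mean_x) ** 2 for v in x) *
                       sum((v - mean_y) ** 2 for v in y))
    return numerator / denominator


# Hypothetical "influencing" series and a copy delayed by two periods.
influencing = [0.0, 1.0, 0.0, 0.0, 3.0, 0.0, 0.0, 2.0, 0.0, 0.0]
influenced = [0.0, 0.0] + influencing[:-2]

ccf_lag0 = cross_correlation(influencing, influenced, 0)
ccf_lag2 = cross_correlation(influencing, influenced, 2)
```

The correlation is weak or negative at lag 0 and strong at the lag where the effect actually lands. This mirrors how TAR shows little relationship with TEBA at three months and a clearer one at twelve.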
SUBSETTED ANALYSIS
INDUSTRY PROFILING ANALYSIS
Using the same data for CCA, we can subdivide our dataset by industry sector, or NAICS
code. I can set this input to “Cross ID” in the data source’s variables list, then re-run the
flow. From the TS Data Prep node’s Results, right-click in the Time Series Plot and select
Data Options. We’ll select a NAICS code at random. And you can see that it fell at the outset
of COVID, and struggled to regain its footing, yet exceeded it by calendar year-end.
Figure 29. Industry Profile (NAICS) subsetting of Avg. TEBA in TS Plot (in 2020)
Note that when you have over 100 categorical values, as in the case of NAICS industry
codes here, it will only allow you to select from the first 100. In my opinion and
experience, I prefer SAS VIYA when it comes to subsetting TSA by key categories.
BY TSO (TAX SERVICES OFFICE)
So allow us to study a subsetting TSA for an under-100 categorical set. I take advantage of the TSO, or Tax
Service Workplace parameter, so once more I set the Case_TSO_ID enter to “Cross ID” on the knowledge
supply node. Then I re-run the circulation and entry the Outcomes.
Determine 30. Tax Companies Workplace (TSO) subsetting of Avg. TEBA in TS Plot (in 2020)
By default, this displays all TSO IDs in the Input TS Plot, so I have to right-click the plot
area and select “Data Options” to specify filters (WHERE TSO = 5, 18, or 40). Note that
while all of these TSOs converge at various points, in the month of April we find a very
strange anomaly: TSO 18 has AVG. TEBA =~ $600K, but the other two TSOs have TEBA
just under $10,000. Yet all three of them re-converge later in 2020.
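The Cross ID subsetting above amounts to grouping cases by category and month before averaging. The following Python sketch illustrates that idea only; the records, the helper name, and all dollar values are invented for illustration and are not CRA data.

```python
from collections import defaultdict

# Hypothetical audit-case records: (month, tso_id, teba). All values are invented.
cases = [
    ("2020-03", 5, 40_000.0), ("2020-03", 18, 52_000.0), ("2020-03", 40, 47_000.0),
    ("2020-04", 5, 9_000.0), ("2020-04", 18, 650_000.0), ("2020-04", 18, 550_000.0),
    ("2020-04", 40, 9_500.0), ("2020-04", 77, 30_000.0),
]

def average_by_cross_id(records, keep_ids):
    """Average TEBA per (cross ID, month), restricted to the selected IDs."""
    buckets = defaultdict(list)
    for month, tso_id, teba in records:
        if tso_id in keep_ids:  # plays the role of the WHERE filter
            buckets[(tso_id, month)].append(teba)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

series = average_by_cross_id(cases, keep_ids={5, 18, 40})
for (tso_id, month), avg in sorted(series.items()):
    print(tso_id, month, round(avg, 2))
```

Because each category keeps its own series, an outlier in one office (as with TSO 18 in April) stays visible instead of being averaged away.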
CONCLUSION
We have seen the power and versatility of SAS® Enterprise Miner™ for conducting TSAF
exercises. It is clear that not all macro-variables in the Canada Revenue Agency exhibit the
same behaviors or resilience at various points in the turbulent COVID-19 period, but a good
deal of this can be attributed to whether they were pure sum variables or derived ratio-like
variables. Some disruptions, which prompted the insertion of intervention effects, were
ostensibly due to policies in place to “take the sting off” more vulnerable businesses.
Many individuals could take away abstract learnings from this paper, even if they
are not employed in the tax sector, because in the end it is all about maintaining a certain
buoyancy of the macro-variables that matter most, to the extent possible. These are not
easy times to navigate, and we wish those adversely impacted the most clement journey to
a regained prosperity.
REFERENCES
Sarma, Kattamuri S., PhD. Copyright © 2017. Predictive Modeling with SAS® Enterprise
Miner™: Practical Solutions for Business Applications, Third Edition. Cary, NC, USA: SAS
Institute, Inc.
ACKNOWLEDGMENTS
I am grateful to my family for their encouragement in this endeavor. I am also grateful to
the numerous employees of the CRA who were the audience for my internal presentation of this
TSAF material. I also acknowledge and admit defeat to the spell checker in insisting
on the spelling of “endeavor” as it is, not as it should be and as it is on the space shuttle.
Which, unlike CRA time series, must be expected to follow a known trajectory.
RECOMMENDED READING
Milhøj, Anders. Practical Time Series Analysis Using SAS®. Copyright © 2013, SAS
Institute Inc., Cary, NC, USA.
Shumway, Robert H. and Stoffer, David S. Time Series Analysis and Its Applications. 4th
ed. © Springer International Publishing AG, 2017, Univ. of California at Davis. Davis,
CA, USA.
Brocklebank, John C., Dickey, David A., and Choi, Bong S. SAS® for Forecasting Time
Series. 3rd ed. Copyright © 2018, SAS Institute Inc., Cary, NC, USA.
Svolba, Gerhard. Applying Data Science: Business Case Studies Using SAS®. Copyright
© 2017, SAS Institute Inc., Cary, NC, USA.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Jason A. Oliver, Senior Compliance Analyst & Data Scientist
Canada Revenue Agency
Jason.oliver@cra-arc.gc.ca
APPENDIX: TABLES OF ACTUAL-TO-FORECAST ANALYSIS
This appendix contains detailed breakdowns of the incremental monthly additions of accumulated
data to the COVID-19 observation window.
AVERAGE TEBA
This begins with Average TEBA, which is assessed with both MSE and RMSE (Mean Squared
Error and Root Mean Squared Error).
At this juncture, between the April and May 2020 known data, the MSE / RMSE actually
regresses slightly, telling us that we might as well have gone straight to June 2020’s data.
In the end, this substantiates our earlier findings: because Average TEBA is in essence
a ratio variable and more resilient to the initial COVID window (especially since it is predicated
on audits of the prior year’s tax filings), there was no real near-future benefit to forecast
alignment based on incremental monthly additions for spring.
C/AR RATIO
This, once again, is the Cases [Completed] to Action Requests [Submitted] ratio. Here I
break down the monthly forecast measure, using MSE (no RMSE), for the last six months of
calendar year 2020, incrementing the known months from March up to June. For March to
May, I include the spring months not yet arrived at in each incremental forecast.
After adding April’s known data, the forecast actually worsens; this is arguably due to the
series having been accustomed to high C/AR values for so long. It is not until we add May that it becomes
more realistic.
Given this extremely low MSE value, brought on by the actual 2.57 C/AR value of May, we
have reached the optimal point, as evidenced by adding June to the known values:
CASE HOURS
Finally, for the Hours per [audit] case forecast, I provide a condensed analysis using
a simplified MAE [Mean Absolute Error] criterion.
As of March 2020; forecast of April to Dec. 2020: MAE = 78.52
As of April 2020; forecast of May to Dec. 2020: MAE = 95.83
As of May 2020; forecast of June to Dec. 2020: MAE = 107.99
As of June 2020; forecast of July to Dec. 2020: MAE = 71.51
So, all in all, this proved a very difficult variable to forecast effectively.
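For readers unfamiliar with the MAE criterion, a minimal Python sketch follows. The hours used here are hypothetical and do not reproduce the figures listed above; the function simply averages the absolute forecast misses.

```python
def mean_absolute_error(actuals, forecasts):
    """Average of |actual - forecast| over paired observations."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

# Hypothetical hours per case (actual vs. forecast) for four months.
actual_hours = [50.0, 45.0, 40.0, 30.0]
forecast_hours = [20.0, 25.0, 35.0, 40.0]

mae = mean_absolute_error(actual_hours, forecast_hours)
print(mae)
```

Unlike MSE, MAE stays in the variable's own units (hours), which makes the four incremental results above directly comparable.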
Paper 1047-2021
SAS® Time Series Analysis & Forecasting
(TSAF) at the Canada Revenue Agency (CRA), with COVID impacts
Jason A. Oliver, Senior Compliance Analyst, Canada Revenue Agency (CRA)
ABSTRACT
It may be a recurring theme of this year’s SAS Global Forum that we are faced with more
pressure to use flexible thinking, not just critical thinking, and when it comes to
time series analysis and forecasting (TSAF) in SAS, it is all about “rethinking the curve”.
At the Canada Revenue Agency (CRA) Compliance Programs Branch (CPB), we have grappled with
reliable forecasting for macro-level tax variables on a monthly basis, even before the
COVID-19 pandemic hit. But now we face an especially difficult
challenge. As with many large organizations, it is not easy to predict what the fallout may be from such a cataclysm.
In setting up SAS to right the trajectory, we must be extra careful about some of the
fallacies in applying TSAF in this context: the lagged effect for tax revenues realized
based on audits of the previous tax year, the need to differentiate average tax recovery
per case from sum of tax recovery (monthly), realizing that industry sectors are not
“one size fits all”, and accounting for relatively temporary effects of staffing
re-orientation in the conversion to a virtual workplace versus the more enduring effects of
business disruptions. With SAS Enterprise Miner’s abilities to continuously adjust
forecasts, sub-categorize datapoints by tax office or industry sector, and apply lagged
cross-correlation analysis, we are suitably equipped with the right tools, and this may provide abstract learnings for other large organizations.
INTRODUCTION
The Canada Revenue Agency (CRA) is Canada’s federal tax administration. As with all tax
jurisdictions, the CRA has been challenged to keep pace with COVID-19 shocks and
manifestations, which began in March 2020 (the last month of our fiscal year).
Fortunately, SAS® Enterprise Miner™ has been a valuable aid in gauging these impacts.
Enterprise Miner™ includes a highly versatile set of functional nodes for configuring and
processing time series data. It can decompose time series components such as seasonality
and trend, provide trend lines and expected forecasts within configurable prediction intervals,
and exhibit complex correlation analyses.
While this has been of great benefit to the CRA in gauging the trajectory of macro-variables
related to tax revenues and auditor performance, the findings of this research paper could
conceivably be applied in the abstract to large organizations with process-oriented
functions, and not just to other global tax jurisdictions.
Let us provide a Glossary of terms to set the stage:
TSAF: Time Series Analysis & Forecasting.
TEBA: tax earned by audit, which is the amount of tax collectible that is agreed upon in the course of a taxpayer audit. It is in NPV (Net Present Value).
TAR: the tax-at-risk, which is the amount that CRA risk assessors arrive at as the precursor to auditing activity.
C/AR ratio: the ratio of [audit] cases completed to action requests [submitted]
for assistance. It is a tentative measure of auditor productivity.
Integras: the tool used by CRA auditors to process cases.
TIME SERIES FUNCTIONAL NODES & SETUP
In SAS® Enterprise Miner™, you have six TSAF nodes in the “Time Series” ribbon, but we are
only going to use four of them. Below is the Time Series ribbon with the functional nodes in
question:
Figure 1. Time Series Functional Nodes
TS Data Preparation: this node allows you to specify basic time series properties
including interval, cycle, start/end time, and accumulation (i.e. by total, min or max,
mean, etc.)
o Below, the interval is shown as “Automatic”; we specify “Month” as the interval.
o We can leave the seasonal cycle and start/end time as “Default”, as SAS®
Enterprise Miner™ will auto-determine these factors from the data.
o In our case, the data was pre-accumulated in SAS® Enterprise Guide™ row-by-row
on a per-month basis, so we can leave Accumulation = “Total” (else,
we would have to set it to “Average”).
Figure 2. TS Data Preparation node – basic properties
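To illustrate what the Accumulation setting does, here is a minimal Python sketch (not the Enterprise Miner implementation). It assumes raw, dated observations that still need to be rolled up to one value per month; the function name and the observations are hypothetical.

```python
from collections import defaultdict
from datetime import date

def accumulate_monthly(observations, how="total"):
    """Roll (date, value) pairs up to one value per (year, month)."""
    buckets = defaultdict(list)
    for day, value in observations:
        buckets[(day.year, day.month)].append(value)
    if how == "total":
        combine = sum
    elif how == "average":
        combine = lambda values: sum(values) / len(values)
    else:
        raise ValueError("how must be 'total' or 'average'")
    return {month: combine(values) for month, values in sorted(buckets.items())}

# Hypothetical case-level amounts.
observations = [
    (date(2020, 1, 6), 100.0),
    (date(2020, 1, 20), 300.0),
    (date(2020, 2, 3), 50.0),
]
totals = accumulate_monthly(observations, "total")
averages = accumulate_monthly(observations, "average")
print(totals)
print(averages)
```

When the data already has one row per month, as in our case, each bucket holds a single value, so “Total” and “Average” return the same series.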
TS Decomposition: this node allows you to specify basic settings similar to those of
the TS Data Prep node, but the Number of Periods can be configured, and moreover,
you can configure which Export Components you want to display.
o By default, it will only display the “Trend-Cycle” component (=Yes), which is
generally considered the most salient one.
o However, in our case, we want to view ALL Components, so we would set that
value to “Yes”.
Figure 3. TS Decomposition node – properties
TS Correlation: this node allows you to set up your TSA for autocorrelation analysis, or
alternatively for CCA (Cross-correlation Analysis). When you select one of those methods,
the other one’s properties will be greyed out.
Figure 4. TS Correlation node – properties
Both the TS Correlation and TS Decomposition nodes must be preceded by a TS Data
Preparation node (which occurs right after the source data node).
TS Exponential Smoothing: this node allows you to conduct forecasting based on your
known data; as such, you would connect it to a TS Data Preparation node, not directly to
your source data node.
The interval is automatic (which will be month in the case of our pre-accumulated
data), and the accumulation defaults to “Total” (which is OK in our case, for the
same reason).
SAS will select what it deems to be the best forecasting method.
The default selection criterion is MSE, or Mean Squared Error.
We will see more on the Forecast Lead, Forecast Back, and Significance Level parameters
during the forecast demonstration in this paper.
Figure 5. TS Exponential Smoothing node – properties
For our initial workspace setup, we can focus on the C/AR (Case to Action Request)
ratio, which as per our glossary is a tentative measure of tax auditor performance. The
initial diagram workspace is called “Aggreg_Integras_27mths”, which runs from January
2018 to March 2020. It is arranged this way for a reason: it ends on the month
of the COVID shutdown.
Our dataset name is “TSA_AGGREG_SINGLE_LINE_27MTHS”.
So, when I bring this in, I need to set all variables to Role = “Rejected” except a) C/AR ratio
and b) my MONTH (Time ID) variable.
Figure 6. Variable Role selection from data source
You would set your variables once you bring the data source to your diagram (workspace).
Figure 7. TS Data Source to Diagram flow
NOTE: I do not cover the mechanics behind bringing in a data source, since the principal focus
is on conducting TSAF in SAS® Enterprise Miner™. All we need to be concerned with is that
as Data Sources become available in the top-left menu, we can drag-and-drop them to our
diagram workspace (diagrams are created by right-clicking ‘Diagrams’ in the left panel).
Examining the TS Data Preparation node is fairly straightforward: we see the known trajectory
of the C/AR variable simply by right-clicking the node, selecting Run, and then opening Results.
Figure 8. Time Series Plot, for C/AR ratio variable
We can see that the C/AR ratio has fallen off as of mid-2018, and continued on a very
gradual downward path. This means that case auditors are completing disproportionately
fewer cases relative to the action requests they submit for assistance, albeit with a seasonal factor and
some rebounding of the trend-line in March 2020.
So, we can focus on the more specific components of the time series line by using a TS
Decomposition node.
DECOMPOSITION OF TIME SERIES
In running our TS Decomposition node and viewing the results, the first one to examine
is the Seasonal Component Plot. When it comes to the C/AR ratio, the seasonal index ranges
from a high of about 1.3 down to about 0.75.
Figure 9. Seasonal Component Plot, for C/AR ratio variable
During the months of March and December, we see fairly high seasonality. This is normal
for the time, since the push to complete cases is higher at the end of the CRA fiscal year
(March), and ostensibly at the end of the calendar year, also. Auditors are completing
proportionally more cases vs. the number of action requests they submit to the service
desk. So it is likely that they are fulfilling cases that do not require as many interventions
during these months. Even in March 2020, C/AR still remained high; it was
resilient to the initial COVID effects, due to being a ratio variable and not an absolute
sum variable.
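As a rough illustration of how a seasonal index such as 1.3 or 0.75 can be read, the Python sketch below divides each season's mean by the overall mean. This is a deliberately simplified stand-in, not the decomposition that Enterprise Miner performs (it ignores trend), and the quarterly data is hypothetical.

```python
def seasonal_indices(values, period=12):
    """Ratio of each season's mean to the overall mean (simplified index)."""
    if len(values) < period:
        raise ValueError("need at least one full seasonal cycle")
    overall = sum(values) / len(values)
    indices = []
    for season in range(period):
        season_values = values[season::period]  # every observation of this season
        indices.append((sum(season_values) / len(season_values)) / overall)
    return indices

# Hypothetical quarterly series with a peak in the third quarter.
quarterly = [8, 10, 12, 10, 8, 10, 12, 10]
print(seasonal_indices(quarterly, period=4))
```

An index above 1.0 marks a season that runs above the series average, and an index below 1.0 marks one that runs below it.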
In the decomposed results, we can also examine combined components; for instance, the
Trend-Cycle Component Plot:
Figure 10. Trend-Cycle Component Plot, for C/AR ratio variable
This tells us what we had surmised from the initial data preparation, that the series has
been on a steadily downward trajectory. Now when it comes to tax-related time series
data, there is no real cycle per se; at best, it is an inherited cycle from global economic
fluctuations. The proper definition of cycle in a TSA context is not the entity’s operational
lifecycle; rather, it refers to the boom-and-bust business cycles, which are largely
unpredictable. Ergo, we are primarily concerned with trend here.
Now, if we substitute the Average TEBA (tax earned by audit) variable for C/AR [using the
Data Source node shown in Figure 6 earlier], we can see what emerges in our decomposed
time series results.
Figure 11. Paneled Component Plots, TS Decomp. for Avg. TEBA
This time, as per the panel graph at bottom-left, we see that our seasonality index is
broader than that of the C/AR ratio; it goes from a high of about 1.8 to a low of ~0.7. This is
largely attributable to the heightened pressures towards fiscal year-end to increase
realization of TEBA, which we see in Feb.-March. At the opposite end, we see rather low
seasonality for May, August, and November.
For the original series plot, bottom-right, the trend continues gradually upwards with
seasonality readily apparent. In the trend-cycle component plot, at top-left, we see that the
trend (with cycle, such as it is) rises steadily but then reaches a virtual plateau.
The key challenge, then, has been to resolve and reconcile the expected forecast as of March
2020 with the new COVID-19 realities.
FORECASTING MACRO TAX VARIABLES
AVERAGE TEBA
We can proceed to evaluate the expected trajectory of the AVG. TEBA variable, on a
monthly interval. Recall that this variable is pre-accumulated at the data source.
When we conduct our forecast, we use the TS Exponential Smoothing node.
Figure 12. TS Exponential Smoothing node in the TSAF diagram
We let SAS® select the best forecasting method, as well as the selection criterion (forecast
measure). In this case, the latter value is the MSE [Mean Squared Error], as you can see at
the bottom of the properties of the node.
Figure 13. Properties of the TS Exponential Smoothing node
For our Significance Level, we set this to 0.5; it governs the blue bracket around the
forecast line, a.k.a. the prediction interval. So it is a confidence band of sorts. The way this
figure works can seem backwards to those of us who think in terms of confidence
level, because the “alpha” value is the complement of that level: the lower the “alpha”
value, the wider the band (prediction interval). So an
“alpha” of 0.01 would produce a very wide band, and an “alpha” value = 0.99 would be
virtually limited to just the forecast line itself. So we aim in the middle (which actually is
closer to the outline of the trend line, as this figure is more “log-like” in its manifestation).
Figure 14. TEBA_NPV_Mean: forecast line from trend
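The relationship between the significance level and the width of the band can be sketched in Python as follows. This assumes normally distributed forecast errors and a hypothetical standard error of 1.0; it illustrates the principle only and is not how Enterprise Miner computes its limits.

```python
from statistics import NormalDist

def band_half_width(alpha, forecast_std_error):
    """Half-width of a two-sided (1 - alpha) prediction band under normal errors."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * forecast_std_error

# Hypothetical standard error of 1.0, so the output is the z multiplier itself.
for alpha in (0.01, 0.25, 0.5, 0.99):
    print(alpha, round(band_half_width(alpha, 1.0), 3))
```

The width shrinks non-linearly as alpha grows, which is why a mid-range alpha of 0.5 already gives a band much closer to the forecast line than to the width seen at 0.01.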
SAS logically expects the trend will continue upwards (while maintaining seasonality, of
course) due to “series momentum”. Had we begun our time series at, say, January 2016
rather than Jan. 2018, that momentum might have been more pronounced. The clichés of
“future behavior is governed by past behavior” and “you can’t know where you’re going,
unless you know where you’ve been” have never been truer. However, enter COVID-19,
and that is a whole new wrench in the gears of the tax-auditing machinery.
As for the selection of the “Best” Forecasting Method: you could try to experiment with
different models (there are eight in all, as per general TSAF science), but I can tell
from the shape of the forecast line that it is based, appropriately, on the Additive Winters
method (see footnote 1). I ascertained this by running the node with this method selected, and the
resulting graph was identical to that of the “best” method. Unlike the Multiplicative Winters method,
this forecast line is based on fairly consistent seasonal “inverted V” shapes in the curve.
If these inverted V shapes became noticeably larger (or smaller) over time, then Multiplicative Winters
would likely be the “best” method that SAS would auto-select.
Figure 15. Available Forecasting Methods, properties of TS Exp. Smoothing node
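For readers who want to see the mechanics of the additive Winters method, here is a minimal Python sketch. It uses the textbook level, trend, and seasonal update equations with fixed, hypothetical smoothing weights and a simple initialization from the first two cycles; SAS estimates its weights by optimization, so this should not be expected to reproduce Enterprise Miner’s forecast.

```python
def additive_winters(y, period, alpha=0.3, beta=0.1, gamma=0.2, horizon=12):
    """Additive Holt-Winters smoothing; returns `horizon` forecasts past the end of y."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    first = sum(y[:period]) / period
    second = sum(y[period:2 * period]) / period
    level = first                              # starting level: mean of cycle one
    trend = (second - first) / period          # starting slope per period
    season = [y[i] - first for i in range(period)]  # additive seasonal offsets
    for t, observed in enumerate(y):
        s = season[t % period]
        previous_level = level
        level = alpha * (observed - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        season[t % period] = gamma * (observed - level) + (1 - gamma) * s
    n = len(y)
    return [level + (h + 1) * trend + season[(n + h) % period] for h in range(horizon)]

# Hypothetical series: flat level of 20 with a constant "inverted V" each cycle.
history = [10, 20, 30, 20] * 6
forecast = additive_winters(history, period=4, horizon=4)
print([round(value, 2) for value in forecast])
```

Because the seasonal term is added rather than multiplied, the “inverted V” keeps the same height regardless of the level, which matches the behavior described above.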
We see that the resulting forecast predicts ahead exactly 12 months. This is the
difference between the “Forecast Lead” and “Forecast Back” values in the properties. We
saw earlier that “Forecast Back” = 6; this acts as our validation partition,
using the last six months of known data (i.e. Oct. 2019 to March 2020). So this gets
subtracted from the “Forecast Lead” value of 18 to arrive at 12 periods out. Ideally, you
want your “back” [validation] period to be between 20-25% of your known data, which it is
out of 27 months (about 22%); even when we increase the known months to 30, it will still be 20% of
this.
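The arithmetic behind these settings can be checked with a few lines of Python. The variable and function names are illustrative only and are not Enterprise Miner property names.

```python
forecast_lead = 18   # periods forecast from the start of the holdout window
forecast_back = 6    # known periods held back for validation

periods_ahead = forecast_lead - forecast_back  # periods beyond the known data

def holdout_share(back, known_months):
    """Fraction of the known history used as the validation window."""
    return back / known_months

print(periods_ahead)
print(round(holdout_share(forecast_back, 27), 3))
print(round(holdout_share(forecast_back, 30), 3))
```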
SUM OF TEBA
When we run a TSAF experiment on the SUM of TEBA, versus its average, we
see a drastic difference in scale. Because this TEBA measure is a sum value, not a ratio
(such as C/AR or [Average] TEBA per case), it is simply not as resilient to sudden shocks like COVID-19,
as we will later see when adjusting the forecast based on incremental months (April, May,
June) of known values.
1 The essence of the Winters method is to combine discernible trend with seasonality.
Figure 16. TEBA SUM Forecast (post-March 2020)
Note that the MSE selection criterion (default) graphs a trend line around the known values
(which are represented by the pink dots here). The SUM TEBA for Feb. 2020 is nearly double
what it was for March 2020, as you can see by the relatively large separation of the pink dots
from the blue dots (on the trendline) for those two months. Yet SAS® “thinks” that the trend
will continue positively, as it is “COVID-agnostic”.
What may also seem surprising to the reader is that the lower limit of the prediction interval
for April 2020 (at ~$674.5M) actually exceeds the actual value for April 2019, which was
slightly below $500 million. It is not until the fall that we see the midpoint of actual
2019 data approximate the LCL (lower confidence limit) of the forecasted band for Sept.
2020. This is ostensibly due to the “positive momentum” of the time series that I alluded to
earlier.
C/AR RATIO
Next, we swap out the SUM of TEBA for the C/AR ratio, once again. In forecasting a
relatively low continuous ratio variable such as C/AR, the prediction interval can be less
reliable. We have to examine the midpoint distribution. While the midpoint post-March
2020 tends to be at or above the 10.0 line, this is rare for 2019 datapoints.
Figure 17. C/AR ratio Forecast
I used the Mean Relative Abs. Error as the forecast metric (selection criterion), which I
found to be more appropriate. Regardless, what we see in the actuals for the spring of 2020
is a very low C/AR ratio, telling us that case throughput has suffered due to the
pandemic AND that Action Requests for assistance did not decline proportionally; there was still
an apparently high need for action requests.
FORECASTING AVG. HOURS PER CASE
For forecasting average hours per [audit] case, I determined that the more suitable Selection
Criterion was “Median Relative Abs. Error”. No matter what Selection Criterion I used (or
Significance Level), the prediction interval still dipped into the negative range. Sometimes,
this is unavoidable. But then the prediction interval becomes spurious; you can’t have
negative hours. So we tend to just focus on the midpoint values in this situation.
Figure 18. Average hours per case Forecast
We can see that the midpoint goes very subtly upwards for the first few forecasted points
(post-March 2020), then sharply up for summer. As it turns out, this is a fairly good
approximation of reality, since the Avg. Hours per case during the middle of 2020 is
about 1.5-2.0 times that of the previous year. What is especially pronounced is that the
Average Hours of March 2019 were only 6.25, whereas for March 2020, it was 35.44. This
was predicated on an Agency policy-induced change; refer to the link and passage below:
https://www.mondaq.com/canada/audit/1030308/cra-moves-forward-with-international-audits-despite-continued-backlog-?email_access=on
In March 2020, the CRA announced that it was suspending the vast majority of audit activity for a
minimum of four weeks, except for audits involving the very largest taxpayers. This suspension meant
that the CRA ceased requests for information relating to existing audits, finalizing existing audits, and
issuing reassessments. Further, deadlines for information or document requests were suspended and no
action was required from taxpayers under audit during this time. This suspension remained in effect until
June 2020, though audits of small and medium businesses did not resume until late fall.
This is also arguably responsible for the “pulse” effect we see in actual Avg. TEBA for July
2020, as per the monthly incremental analysis that comes next.
INCREMENTAL ALIGNMENT
APRIL 2020, KNOWN VALUES
Now when we add the month of April 2020 to our data (making it 28 months total), we would
expect the AVG. TEBA actuals for subsequent months to come closer to, or fall within, the forecast
range. For instance, in the graph cross-section that follows, the forecast for September,
October, and December 2020 comes closer to the later-known actuals once we
add April 2020 data. However, the July 2020 actual (~$122,000) is still above the forecast
band for this incremental dataset’s forecast. This was likely due to the resumption of
general large business audits as of June 2020 (see the article passage quoted earlier).
Figure 19. Revised AVG. TEBA forecast, incremental inclusion of APRIL 2020
Again, we typically use the MSE [Mean Squared Error] measure in gauging the efficacy or
proximity of a forecast to actual [values]. See the Appendix tables at the end of this paper
for a breakdown of this analysis, where I illustrate the monthly incremental effect on accuracy
for the last six months of the calendar year (i.e. from July to Dec. 2020).
MAY 2020, KNOWN VALUES
Clearly, the addition of April wasn’t enough to right the trajectory of the expanding “COVID
window”. So in continuing our analysis of the monthly incremental effect, I added May 2020’s
known data and changed the forecast significance level from 0.5 to 0.25. But it makes no
difference: the July actual is still out of forecast range. We must simply accept that July 2020
Avg. TEBA is an abnormal value (~$122K), since July 2018 had Avg. TEBA =~$45K, and July
2019’s Avg. TEBA was ~$57K. It is clear that this is a COVID-adjustment spike.
Figure 20. Revised AVG. TEBA forecast, incremental inclusion of MAY 2020
We can therefore define July 2020 as a pulse, or a one-time brief event, that caused a
spike in the accumulated time series value for that month. This emphasis on larger
businesses for audit while suspending SMB audits at the time is further substantiated by the
fact that in July 2020, there was an average of 50.75 hrs per case completed, which is
extremely high. For April, which had a very high Average TEBA of $185.5K, the figure was
52.16 average hours per case.
JUNE 2020, KNOWN VALUES
Predictably, the addition of June 2020 did not widen the forecast band enough to include the
actual Avg. TEBA for July. So this strengthens the theory that July’s value was a one-time
event, or pulse, in the time series. It also strengthens the theory that Avg. TEBA was more
resilient to initial COVID-19 transition measures (being a ratio value, in essence). To wit:
observe below that the April-May-June line for the original forecast (left) and actual data
points (right) is just above the $50K line, and follows the same trajectory.
Figure 21. Comparing Q1 of FY2020-21 forecast vs. actual data points
In taking MSE and RMSE (R is “root”) measurements for both the as-of-March and as-of-June
forecasts, we only note a slight improvement (reduction) in that value. This also
goes to show the resilience of this variable, and the “pulse” nature of July’s spike.
MEASURE / as of MONTH     MARCH 2020            JUNE 2020
AVG. TEBA (MSE)           $ 954,467,257.64      $ 888,454,004.34
RMSE                      $ 30,894.45           $ 29,806.95
Table 1. Point-in-time [R]MSE for AVG. TEBA forecast-to-actual: July to Dec. 2020
Refer to the Appendix at the end of this paper for a more detailed month-by-month
breakdown of these calculations.
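The two rows of Table 1 are linked by a square root, which the following Python sketch verifies using the MSE figures from the table. The `mse` and `rmse` helper functions are generic definitions included for completeness; they are not the author's code, and the monthly actual and forecast values themselves are not reproduced here.

```python
import math

def mse(actuals, forecasts):
    """Mean of squared forecast errors."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty series of equal length")
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)

def rmse(actuals, forecasts):
    """Square root of the MSE, expressed in the variable's own units."""
    return math.sqrt(mse(actuals, forecasts))

# MSE values as reported in Table 1.
table1_mse = {"March 2020": 954_467_257.64, "June 2020": 888_454_004.34}
table1_rmse = {label: round(math.sqrt(value), 2) for label, value in table1_mse.items()}
print(table1_rmse)
```

The RMSE is the easier figure to interpret, since it is in dollars per case rather than squared dollars.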
FALLACY: COMPARING SUM OF TEBA SHIFT TO AVG. TEBA CHANGES
TSAF works best when you accumulate data records by average, not by sum total. If we
tried this exercise using SUM TEBA per month, it would not turn out very well, because sum
totals are immediately impacted by any severe transition, such as auditor work re-arrangements
and temporary audit case policy due to COVID-19 fallout as of March 2020.
In the year-over-year March comparison (2019 vs. 2020) in the following table, the TEBA_SUM and
Case Count have dropped significantly in March 2020, yet the C/AR ratio has increased.
Table 2. Year-over-Year March comparison, key macro-variables in TSA
However, as the staffing situation has tried to stabilize in the intervening months
(April to June 2020), the C/AR ratio has dropped dramatically. (Not shown in the above table.)
The same is true for the TEBA/AR pattern.
SUM OF TEBA: DRASTIC CHANGE
We now compare the SUM TEBA forecast as of March 2020 (left image) with that as of June
2020 known data points (right image).
Figure 22. Comparison of SUM of TEBA forecast as of March vs. as of June (2020)
For the first image, none of the actuals of the last six months of 2020 fall in the forecast
band. For the second image, by contrast, two of the actuals of the last six months (Oct., Nov.)
fall in the forecast band.
Also observe how some of the accumulated data points in the forecast are more “depressed”
in the latter graph; while there is a discernible peak, it does not quite have the same
buoyancy or upward momentum as the former graph. (We must keep in mind, though,
that this is still using the MSE method, i.e. taking a line of best fit, where the pink dots are
the actual values.)
So, there is little point in using the MSE to gauge efficacy of the monthly adjustment, simply
because the values would be so large (versus those in the Avg. TEBA MSE).
ADVERSE IMPACTS AND DELAYED EFFECTS
LATENT EFFECTS OF SHOCKS
We would also expect that lower Avg. TEBA would not manifest until much later in fiscal
year 2020-21, due to most of 2020 consisting of prior-year audits. The graph below covers
known Avg. TEBA trend data points right up to December 2020, the lowest point.
Figure 23. Calendar-year-end (2020) Avg. TEBA; lowest point
This extremely low Average TEBA of ~$32,000 per case could be a harbinger of further
average TEBA decline, but we would need to observe the last quarter of the fiscal year (January
to March 2021, once available) and validate that theory. (Then we would apply an
intervention to the time series line.)
Incidentally, when it comes to SUM of TEBA with actuals up to Dec. 2020, the forecast trend
line for 2021 is far more credible, showing all datapoints as being well under $1 billion, and
mostly under $500 million.
INTERVENTIONS
As alluded to before, a TSAF exercise may use interventions, if the extreme or abnormal
event is known in advance (or shortly thereafter). This is an adjustment to the “regular”
time series, using a “dummy” variable for the period of observation. In this case study,
we’d recommend an intervention for the SUM of TEBA as of March 2020, and possibly for
AVG TEBA as of Dec. 2020. Plus, we’d use a “pulse effect” for July 2020. However,
programming an intervention requires SAS® Studio™, which is out of scope for this paper.
Figure 24. Basic denotation of input variables (interventions) by type
Lowest actual in three years; Dec. 2020 Avg. TEBA of $32,404
A step would work best as an intervention (for March 2020 and Dec. 2020), since the trend line shift is sudden and sustained; it doesn’t happen gradually then return to baseline.
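To make the distinction between a step and a pulse concrete, here is a minimal Python sketch of how the two dummy variables could be encoded for calendar year 2020. It illustrates the encoding only; it is not the SAS® Studio™ implementation, and the helper names (month_range, step_dummy, pulse_dummy) are assumptions introduced for this example.

```python
from datetime import date


def month_range(start, end):
    """Return first-of-month dates from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def step_dummy(months, onset):
    """1 from the onset month onward, 0 before: a sudden, sustained shift."""
    return [1 if m >= onset else 0 for m in months]


def pulse_dummy(months, event):
    """1 only in the event month, 0 elsewhere: a one-off shock."""
    return [1 if m == event else 0 for m in months]


# Calendar year 2020, with the intervention dates suggested in the text.
months = month_range(date(2020, 1, 1), date(2020, 12, 1))
step_mar = step_dummy(months, date(2020, 3, 1))    # SUM of TEBA, March 2020
step_dec = step_dummy(months, date(2020, 12, 1))   # AVG TEBA, Dec. 2020
pulse_jul = pulse_dummy(months, date(2020, 7, 1))  # pulse effect, July 2020

for m, s, p in zip(months, step_mar, pulse_jul):
    print(m.strftime("%Y-%m"), "step:", s, "pulse:", p)
```

The step column stays at 1 once switched on, while the pulse column is 1 for a single month; either column would then be supplied to the forecasting model as an input variable.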
TS CORRELATION NODE
AUTOCORRELATION
When we deal with a large seasonal and/or trend component, we usually find a higher
degree of autocorrelation factor (abbreviated “ACF”). As the name suggests, this is the
tendency of a variable to self-influence. It can also be thought of as momentum, or
“muscle memory”.
In a similar vein, when frontline auditing teams are performing well, some of that
momentum carries over from one period to the next, as they build “muscle memory” and
are better equipped to deal with more trying scenarios that have [abstract] issues in
common with recent cases worked on. This presents opportunities for “boilerplate” copying
and pasting of common findings from one case to another, adjusting for specifics, and
accelerating average time to complete as well as garnering more average TEBA per case.
Clearly, during the current COVID-19 climate at this writing, and the embargo of SMB case
audit during the spring 2020 period, we can expect some of that momentum to be adversely
impacted, since auditors were working on more complex large business cases overall. But
first, let us examine a baseline from the years 2018-2019, below:
Figure 25. ACF Plot, three key tax-related macro-variables (2018-2019)
From the three variables plotted above, Est. TAR-AI (tax-at-risk, audit issue) has low ACF,
TEBA has moderately high ACF, and Total [Avg. Case] Hours has very high ACF. To wit: at
lag t=5, TEBA reaches the zero line, but Total Hours is still at ACF=0.45.
By stark contrast, in 2020 (below), the ACF for both Avg. TEBA and Case Hours is very
weak overall. In fact, both drop precipitously at the very outset of 2020, just prior to
COVID-19.
Figure 26. ACF Plot, same macro-variables, for 2020
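For readers who want to see what the ACF plots measure, the sample autocorrelation at lag k can be sketched in a few lines of Python. This is a minimal illustration of the standard sample estimate, not the TS Correlation node's internal computation; the function name acf and both series are synthetic assumptions, not CRA data.

```python
def acf(series, max_lag):
    """Sample autocorrelation of a series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; ACF is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Synthetic 24-month series (two years of monthly points).
trending = list(range(24))      # steady trend: strong "momentum"
alternating = [1, -1] * 12      # flips every month: no momentum at all

acf_trend = acf(trending, 5)
acf_alt = acf(alternating, 5)

print([round(v, 3) for v in acf_trend])
print([round(v, 3) for v in acf_alt])
```

On the trending series the lag-1 value is 0.875 and decays slowly, which is the kind of persistence described for Total Hours in 2018-2019; on the alternating series the lag-1 value is about -0.958, so one month tells us nothing encouraging about the next.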
CCA – CROSS-CORRELATION ANALYSIS
When we explore lagged effects between risk-related variables, in this case TAR (tax-at-risk)
and TEBA (tax earned by audit), we may use a CCA plot. We are also considering
Total Hours (on audit cases) here. The plots below are at t=3 months and t=12 months
out, with the influencing variables on the vertical axis, and the influenced variables on the
X-axis. The colour shading is somewhat counterintuitive, whereby red means more positively
cross-correlated, and blue means less so. Again, we set a baseline of expectations using
tax data from 2016 to 2019 (48 months) here.
Figure 27. CCA Map, at time lags 3 and 12, key macro-variables
Note the pronounced difference in CCA factor: for time lag 3, the Estimated TAR has
virtually no effect on TEBA or Total Hours per case (because it is too close time-wise), but 12
months out (at right) it has a very pronounced effect on total case hours, and a moderate
effect on TEBA (~22%). Also, in the first graph for time lag 3, TEBA highly influences Total
Hours and, to a noticeable degree, vice versa. But when we get to 12 months out, Total
Hours has virtually no lagged effect on TEBA, and vice versa.
If we repeat the experiment from 2018 data up to 2020 (COVID window) data, evaluating
lagged effects of TAR on TEBA for 2020, we find a very different pattern at t=3 and t=12.
For time lag=3, the highest we get is ~3% influence; for t=12, it is absolutely nothing.
Figure 28. CCA Map, at time lags 3 and 12, inclusive of COVID-19 period
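The idea behind each cell of a CCA map, how strongly one variable at month t relates to another at month t + lag, can be sketched in Python as follows. This is a minimal illustration of the usual sample cross-correlation, not the node's own algorithm. The function name cross_correlation and the toy "driver" and "response" series are assumptions made for this example; the response is built to echo the driver exactly three months later.

```python
from math import sqrt


def cross_correlation(x, y, lag):
    """Sample correlation between x[t] and y[t + lag], for lag >= 0."""
    n = len(x)
    if len(y) != n or lag < 0 or lag >= n:
        raise ValueError("series must have equal length and 0 <= lag < length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag))
    scale_x = sqrt(sum((v - mean_x) ** 2 for v in x))
    scale_y = sqrt(sum((v - mean_y) ** 2 for v in y))
    return cov / (scale_x * scale_y)


# Synthetic 48-month toy series: the driver spikes in five irregular months,
# and the response repeats each spike exactly three months later.
n_months = 48
driver = [0.0] * n_months
for spike_month in (2, 9, 20, 30, 41):
    driver[spike_month] = 1.0
response = [0.0] * 3 + driver[:-3]

for lag in (0, 3, 12):
    print(lag, round(cross_correlation(driver, response, lag), 3))
```

Only the lag-3 value comes out close to 1; at lags 0 and 12 the values sit near zero, which is the kind of lag-dependent contrast the two panels of a CCA map are meant to reveal.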
SUBSETTED ANALYSIS
INDUSTRY PROFILING ANALYSIS
Using the same data for CCA, we can subdivide our dataset by industry sector, or NAICS
code. I can set this input to “Cross ID” in the data source’s variables list, then re-run the
flow. From the TS Data Prep node’s Results, right-click in the Time Series Plot and select
Data Options. We will select a NAICS code at random. You can see that its Avg. TEBA fell at the
outset of COVID and struggled to regain its footing, yet exceeded its earlier level by calendar year-end.
Figure 29. Industry Profile (NAICS) subsetting of Avg. TEBA in TS Plot (in 2020)
Note that when you have over 100 categorical values, as in the case of NAICS industry
codes here, it will only allow you to select from the first 100. In my opinion and
experience, I prefer SAS Viya when it comes to subsetting TSA by key categories.
BY TSO (TAX SERVICES OFFICE)
So let us examine a subsetting TSA for an under-100 categorical set. I use the TSO, or Tax
Services Office, parameter, so again I set the Case_TSO_ID input to “Cross ID” on the data
source node. Then I re-run the flow and access the Results.
Figure 30. Tax Services Office (TSO) subsetting of Avg. TEBA in TS Plot (in 2020)
By default, it will display all TSO IDs in the Input TS Plot, so I have to right-click the plot
area and select “Data Options” to specify filters (WHERE TSO = 5, 18, or 40). Note that
while all of these TSOs converge at various points, in the month of April we find a very
strange anomaly: TSO 18 has Avg. TEBA of approximately $600K, but the other two TSOs have TEBA
just under $10,000. Yet all three of them re-converge later in 2020.
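Outside Enterprise Miner, the same two steps (group by a cross-sectional ID, then filter to a handful of IDs) could be sketched in Python as below. This is only an illustration of the logic behind the “Cross ID” role and the WHERE filter. The function name avg_by_group, the record layout, and every dollar amount are made-up placeholders, not CRA data.

```python
from collections import defaultdict


def avg_by_group(records, keep_ids):
    """Average TEBA per (tso_id, month), restricted to the selected TSO IDs."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, case count]
    for tso_id, month, teba in records:
        if tso_id in keep_ids:
            bucket = totals[(tso_id, month)]
            bucket[0] += teba
            bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}


# Placeholder case-level records: (TSO ID, month, TEBA in dollars).
records = [
    (5, "2020-04", 8000.0),
    (5, "2020-04", 10000.0),
    (18, "2020-04", 600000.0),
    (40, "2020-04", 9000.0),
    (7, "2020-04", 50000.0),   # excluded by the filter below
]

# Equivalent in spirit to: WHERE TSO = 5, 18, or 40
selected = avg_by_group(records, keep_ids={5, 18, 40})
for (tso_id, month), avg_teba in selected.items():
    print(tso_id, month, avg_teba)
```

Because the average is taken per office and per month, a single very large case in one office can dominate that office's monthly figure, which is one possible reading of an anomaly like the April value for TSO 18.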
CONCLUSION
We have seen the power and versatility of SAS® Enterprise Miner™ for conducting TSAF
exercises. It is clear that not all macro-variables in the Canada Revenue Agency exhibit the
same behaviors or resilience at various points in the turbulent COVID-19 period, but a good
deal of this can be attributed to whether they were pure sum variables, or derived ratio-like
variables. Some disruptions, prompting the insertion of intervention effects, were
ostensibly due to policies in place to “take the edge off” more vulnerable businesses.
Many individuals could take away abstract learnings from this paper, even if such individuals
are not employed in the tax sector, because in the end, it is all about maintaining a certain
buoyancy of the macro-variables that matter most, to the extent possible. These are not
easy times to navigate, and we wish those adversely impacted the most clement journey to
a regained prosperity.
REFERENCES
Sarma, Kattamuri S., PhD. Copyright © 2017. Predictive Modeling with SAS® Enterprise
Miner™: Practical Solutions for Business Applications, Third Edition. Cary, NC, USA: SAS
Institute, Inc.
ACKNOWLEDGMENTS
I’m grateful to my family for their encouragement in this endeavor. I’m also grateful to
the numerous staff of the CRA who were the audience in my internal presentation of this
TSAF subject matter. I also acknowledge and admit defeat to the spell checker in insisting
on the spelling of “endeavor” as it is, not like it should be as it is on the space shuttle.
Which, unlike CRA time series, must be expected to follow a known trajectory.
RECOMMENDED READING
Milhøj, Anders. Practical Time Series Analysis Using SAS®. Copyright © 2013, SAS
Institute Inc., Cary, NC, USA.
Shumway, Robert H. and Stoffer, David S. Time Series Analysis and Its Applications. 4th
ed. © Springer International Publishing AG, 2017, Univ. of California at Davis. Davis,
CA, USA.
Brocklebank, John C., Dickey, David A., and Choi, Bong S. SAS® for Forecasting Time
Series. 3rd ed. Copyright © 2018, SAS Institute Inc., Cary, NC, USA.
Svolba, Gerhard. Applying Data Science: Business Case Studies Using SAS®. Copyright
© 2017, SAS Institute Inc., Cary, NC, USA.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Jason A. Oliver, Senior Compliance Analyst & Data Scientist
Canada Revenue Agency
Jason.oliver@cra-arc.gc.ca
APPENDIX: TABLES OF ACTUAL-TO-FORECAST ANALYSIS
This contains detailed breakdowns of the incremental monthly additions of accumulated
data to the COVID-19 observation window.
AVERAGE TEBA
This begins with Average TEBA, being subject to both MSE and RMSE (Mean Squared
Error, and Root Mean Squared Error).
At this juncture, between April and May 2020 known data, the MSE / RMSE actually
regresses slightly, telling us that we might as well have gone straight to June 2020’s data.
In the end, this substantiates our earlier findings: because Average TEBA is in essence
a ratio variable and more resilient to the initial COVID window (especially since it is predicated
on audits of the prior year’s tax filings), there was no real near-future benefit to forecast
alignment based on incremental monthly additions for spring.
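For reference, the two error measures applied to Average TEBA can be sketched in Python as follows. This is a minimal illustration with made-up dollar figures, not values from the tables in this appendix; the function names mse and rmse are assumptions for this example.

```python
from math import sqrt


def mse(actuals, forecasts):
    """Mean Squared Error: average of squared actual-minus-forecast gaps."""
    if not actuals or len(actuals) != len(forecasts):
        raise ValueError("need two non-empty series of equal length")
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)


def rmse(actuals, forecasts):
    """Root Mean Squared Error: square root of the MSE, in the original units."""
    return sqrt(mse(actuals, forecasts))


# Made-up Avg. TEBA values (dollars per case) for three months.
actuals = [40000.0, 35000.0, 32000.0]
forecasts = [42000.0, 36000.0, 30000.0]

print(mse(actuals, forecasts))             # 3000000.0 (squared dollars)
print(round(rmse(actuals, forecasts), 2))  # 1732.05 (dollars)
```

Forecast gaps of only $1,000 to $2,000 already produce an MSE in the millions, because the errors are squared; the RMSE brings the measure back to dollars and is easier to compare against the values themselves.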
C/AR RATIO
This, once again, is the Cases [Completed] to Action Requests [Submitted] ratio. Here I
break down the monthly forecast measure, using MSE (no RMSE), of the last six months of
calendar year 2020, incrementing known months from March up to June. For March to
May, I include the spring months not yet arrived at in each incremental forecast.
From adding April known data, the forecast actually worsens; this is arguably due to having
been accustomed to high C/AR values for so long. It is not until we add May that it becomes
more realistic.
Given this extremely low MSE value, brought on by the actual 2.57 C/AR value of May, we
have reached the optimal point, as evidenced by adding June to known values:
CASE HOURS
Lastly, in speaking to the Hours per [audit] case forecast, I provide a condensed analysis using
a simplified MAE [Mean Absolute Error] criterion.
As of March 2020; forecast of April to Dec. 2020: MAE = 78.52
As of April 2020; forecast of May to Dec. 2020: MAE = 95.83
As of May 2020; forecast of June to Dec. 2020: MAE = 107.99
As of June 2020; forecast of July to Dec. 2020: MAE = 71.51
So, all in all, this proved a very difficult variable to effectively forecast.
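The “as of month X, forecast the remaining months” pattern above can be sketched in Python as a rolling-origin evaluation scored by MAE. This is a minimal illustration only: the naive last-known-value forecast is a hypothetical stand-in for the Enterprise Miner model, the function names are assumptions, and the hours are made-up numbers unrelated to the MAE values listed above.

```python
def mae(actuals, forecasts):
    """Mean Absolute Error: average size of the actual-minus-forecast gaps."""
    if not actuals or len(actuals) != len(forecasts):
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def rolling_origin_mae(series, first_origin):
    """Score a naive forecast from each origin index to the end of the series.

    For each origin, every later point is forecast as the last known value
    (a stand-in model, assumed here purely for illustration).
    """
    results = {}
    for origin in range(first_origin, len(series) - 1):
        last_known = series[origin]
        future = series[origin + 1:]
        results[origin] = mae(future, [last_known] * len(future))
    return results


# Made-up average hours per case for six consecutive months.
hours = [100.0, 120.0, 90.0, 110.0, 105.0, 95.0]

scores = rolling_origin_mae(hours, first_origin=2)
for origin, score in scores.items():
    print("known through index", origin, "MAE =", round(score, 2))
```

Comparing the score at each origin shows whether adding one more known month actually improved the forecast of the remainder, which is the question the four MAE figures above answer for Case Hours.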