## Friday, February 17, 2017

### Monetary base update

There's still no information on the monetary base drift rate, but it is basically on the same track:

I think we're seeing the "natural" drift rate per this post.

### Generative adversarial networks and information equilibrium

This is kind of a placeholder for some thoughts when I get a bit more time. There was a tweet today about Generative Adversarial Networks (GANs) leading to this Medium post. I immediately retweeted it:
This is very interesting for a couple of reasons. It demonstrates how a simple "price" (the judgments of the discriminator D) can cause the supply distribution (G, the generated distribution) to match up with the demand distribution (R, the real distribution). In this setup, however, the discriminator doesn't aggregate information (as it does in the traditional Hayekian view of the price mechanism), but merely notes that there are differences between G and R (the information equilibrium view). See more here.

As such, this sets up (an abstract) information equilibrium model:

$$ D \equiv \frac{dR}{dG} = k \; \frac{R}{G} $$

with D as the detector, except we have a constant information source (R, the real data). I hope to put this in a more formal statement soon. Real markets would also have demand (R) adjust to the supply (G).

The example shown at the Medium post is exactly the generator attempting to match to a real distribution, which is one way to see information equilibrium operating. Here are the results for the mean and standard deviation of the distribution R:

The other thing I noticed is that there is a long tail towards zero mean in the final result:
Not bad. The left tail is a bit longer than the right, but the skew and kurtosis are, shall we say, evocative of the original Gaussian.

Is this some non-ideal information transfer?

$$ D \equiv \frac{dR}{dG} \leq k \; \frac{R}{G} $$

Does this have anything to do with this?

Or this (distribution of stock returns)?

Overall, looks very interesting!

## Thursday, February 16, 2017

### Invariance and deep properties

I have been looking for a good explanation of the physical meaning of the general form of the invariance of the information equilibrium condition (shown by commenter M). The equation:

$$\frac{dA}{dB} = k \; \frac{A}{B}$$

is invariant under transformations of the form:

\begin{align} A \rightarrow & \alpha A^{\gamma}\\ B \rightarrow & \beta B^{\gamma} \end{align}

The physical (well, economic) meaning of this is that the information equilibrium condition is invariant under transformations that leave the ratio of the local (instantaneous) log-linear growth rates of $A$ and $B$ constant. This is because

$$\frac{d}{dt} \log \alpha A^{\gamma} = \gamma \frac{d}{dt} \log A$$

and likewise for $B$. Among other things, this preserves the value of the information transfer index, which means that the information transfer index is the defining aspect of the information equilibrium relationship.
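As a quick check of the invariance (a sketch, writing $A' = \alpha A^{\gamma}$ and $B' = \beta B^{\gamma}$, where the primes are just notation introduced here, and assuming $\gamma \neq 0$), the chain rule gives:

\begin{align} \frac{dA'}{dB'} = & \frac{\alpha \gamma A^{\gamma - 1} \; dA}{\beta \gamma B^{\gamma - 1} \; dB}\\ = & \frac{\alpha A^{\gamma}}{\beta B^{\gamma}} \; \frac{B}{A} \; \frac{dA}{dB}\\ = & \frac{A'}{B'} \; \frac{B}{A} \; k \; \frac{A}{B}\\ = & k \; \frac{A'}{B'} \end{align}

so the transformed variables satisfy the same condition with the same $k$.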

This is interesting because the IT index determines a "power law" relationship between $A$ and $B$. Power law exponents will often have some deep connection to the underlying degrees of freedom. Some vastly different physical systems will behave the same way when they have the same critical exponents (something called universality). Additionally, $k$ is related to Lyapunov exponents which also represent an important invariant property of some systems.
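The power law can be checked numerically. Here is a minimal sketch (the function names and the example numbers are made up for illustration) that integrates the information equilibrium condition with a classical Runge-Kutta step and compares the result with the closed-form solution $A = A_{0} (B/B_{0})^{k}$, assuming $B > 0$ throughout:

```python
def integrate_ie(k, a0, b0, b1, steps=10000):
    """Integrate dA/dB = k * A / B from B = b0 to B = b1 (classical RK4)."""
    h = (b1 - b0) / steps
    a, b = a0, b0

    def slope(a_val, b_val):
        # Right-hand side of the information equilibrium condition.
        return k * a_val / b_val

    for _ in range(steps):
        s1 = slope(a, b)
        s2 = slope(a + 0.5 * h * s1, b + 0.5 * h)
        s3 = slope(a + 0.5 * h * s2, b + 0.5 * h)
        s4 = slope(a + h * s3, b + h)
        a += h * (s1 + 2 * s2 + 2 * s3 + s4) / 6
        b += h
    return a


def power_law(k, a0, b0, b1):
    """Closed-form solution A = a0 * (b1 / b0)**k of the same condition."""
    return a0 * (b1 / b0) ** k


# Illustrative values only: IT index 1.5, starting at (B, A) = (1, 2).
numeric = integrate_ie(1.5, 2.0, 1.0, 4.0)
exact = power_law(1.5, 2.0, 1.0, 4.0)
print(numeric, exact)
```

The two numbers should agree to within the integration error, which is one way to see that $k$ alone fixes the shape of the relationship.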

This is to say that the information equilibrium condition is invariant under a transformation that preserves a key (even defining) property of a variety of dynamical systems.

(This is why physicists pay close attention to symmetries. They often lead to deep insights.)

 From here.

### Qualitative analysis done right, part 2b

John Handley asks via Twitter, "[W]here do models like [Eggertsson and Mehrotra] fit into your view of quantitative, qualitative, and toy models?"

I think my answer would have to be a qualitative model, but an unsatisfying one. The major problem is that it is much too complex. However, a lot of the complexity comes from the "microfoundations" aspects, the result of which is exactly as put by Mean Squared Errors:

Consider the macroeconomist. She constructs a rigorously micro-founded model, grounded purely in representative agents solving intertemporal dynamic optimization problems in a context of strict rational expectations. Then, in a dazzling display of mathematical sophistication, theoretical acuity, and showmanship (some things never change), she derives results and policy implications that are exactly what the IS-LM model has been telling us all along. Crowd -- such as it is -- goes wild.
Except in this case it's the AD-AS model. The IS-LM model is already a decent qualitative model of a macroeconomy when it is in a protracted slump, and what this paper does is essentially reproduce an AD-AS model version of Krugman's zero-bound/liquidity trap modification of the IS-LM model [pdf]. These simple crossing curves (e.g. shown above) are far simpler and tell basically the same story as the "microfounded" model.

The model does meet the requirement of being qualitatively consistent with the data. For example, it is consistent with a flattening Phillips curve:
This illustrates a positive relationship between inflation and output - a classic Phillips curve relationship. The intuition is straightforward: as inflation increases, real wages decrease (as wages are rigid) and hence the firms hire more labor. Note that the degree of rigidity is indexed by the parameter γ. As γ gets closer to 1, the Phillips curve gets flatter ...
This is observed. The model also consists of stochastic processes:
An equilibrium is now defined as set of stochastic processes ...
This is also qualitatively consistent with the data (in fact, pure stochastic processes do rather well at forecasting).

## Wednesday, February 15, 2017

### Behavioral Euler equations and non-ideal information transfer

I read through Noah Smith's question and answer session on Reddit. I liked this quote:
But I doubt the Post-Keynesians will ever create their own thriving academic research program, and will keep on influencing the world mainly through pop writing and polemics. I think they like it that way.
But on a more serious note, Noah linked (again) to a paper [pdf] by Xavier Gabaix about a "behavioral" New Keynesian model. One of the things Gabaix introduces is an "attention parameter" $M$ in the Euler equation as a way of making it conform to reality more.

Let's quote Noah's response to a question about his post on the Euler equation:
An Euler equation is a mathematical relationship between observables - interest rates, consumption, and so on.
There are actually infinite Euler equations, because the equation depends on your assumption about the utility function.
The problem is that standard utility functions don't seem to work. Now, you can always make up a new utility function that makes the Euler equation work when you plug in the observed consumption and interest rate data. Some people call this "utility mining" or "preference mining". One example of this is Epstein-Zin preferences, which have become popular in recent years.
The problem with doing this is that those same preferences might not work in other models. And letting preferences change from model to model is overfitting. So another alternative is to change the constraints - add liquidity constraints, for example. So far, there's lots of empirical evidence that liquidity constraints matter, but very few macro models include them yet.
Another, even more radical alternative is to change the assumptions about how agents maximize. This is what Gabaix does, for example, in a recent paper [linked above] ...
Gabaix comes up with a modified Euler equation (log-linearized):

$$x_{t} = M E_{t} [x_{t+1}] - \sigma \left(i_{t} - E_{t}[\pi_{t+1}] - r_{t}^{n} \right)$$

Now I derived this piece of the New Keynesian model from information equilibrium (and maximum entropy) a few months ago. However, let me do it again (partially because this version has a different definition of $\sigma$ and is written in terms of the output gap).

I start with the maximum entropy condition for intertemporal consumption with a real interest rate $R$ such that:

$$C_{t+1} = C_{t} (1+R_{t})$$

If we have the information equilibrium relationship $Y \rightleftarrows C$ (output and consumption) with IT index $\sigma$, we can say $Y \sim C^{\sigma}$ and therefore, after log-linearizing (and substituting the nominal interest rate and inflation):

\begin{align} \frac{1}{\sigma} y_{t} = & \frac{1}{\sigma} y_{t+1} - r_{t}\\ y_{t} = & y_{t+1} - \sigma r_{t}\\ y_{t} = & y_{t+1} - \sigma \left(i_{t} - E_{t}[\pi_{t+1}] \right)\\ x_{t} = & x_{t+1} - \sigma \left(i_{t} - E_{t}[\pi_{t+1}] - r_{t}^{n}\right) \end{align}

where in the last step we rewrote output in terms of the output gap and the real interest rate in terms of its deviation from the natural rate $r^{n}$.
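To spell out the first line of that derivation (a sketch, with lower case denoting logs, $c_{t} \equiv \log C_{t}$ and $y_{t} \equiv \log Y_{t}$, and writing $r_{t}$ for $\log (1 + R_{t}) \approx R_{t}$ at small rates):

\begin{align} c_{t+1} = & c_{t} + r_{t}\\ y_{t} = & \sigma c_{t} + \text{const}\\ \frac{1}{\sigma} y_{t+1} = & \frac{1}{\sigma} y_{t} + r_{t} \end{align}

The constant cancels between the two periods, and rearranging the last line gives $\frac{1}{\sigma} y_{t} = \frac{1}{\sigma} y_{t+1} - r_{t}$.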

You may have noticed that the $E$'s are missing. That's because the derivation was done assuming information equilibrium. As I show here, this means that we should include an "information equilibrium" operator $E_{I}$ (think of it as an expectation of information equilibrium):

$$x_{t} = E_{I} x_{t+1} - \sigma \left(i_{t} - E_{I} \pi_{t+1} - r_{t}^{n} \right)$$

Under conditions of non-ideal information transfer, we'd actually expect that

$$x_{t+1} \leq E_{I} x_{t+1}$$

(you cannot receive more information than is sent). Therefore, in terms of rational expectations (model-consistent expectations), we'd actually have:

$$x_{t+1} = M E_{I} x_{t+1} = M E_{t} E_{I} x_{t+1} = M E_{t} x_{t+1}$$

with $M \leq 1$. Back to Gabaix:
In the Euler equation consumers do not appear to be fully forward looking: M < 1. The literature on the forward guidance puzzle concludes, plausibly I think, that M < 1.
We recover a "behavioral" model that can be understood in terms of non-ideal information transfer.
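To get a feel for what $M < 1$ does, here is a minimal sketch that solves the modified Euler equation backward from a terminal output gap of zero. It assumes perfect foresight (so $E_{t}[x_{t+1}] = x_{t+1}$), and it lumps $i_{t} - E_{t}[\pi_{t+1}] - r_{t}^{n}$ into a single "rate gap" series; the function name, the value $M = 0.85$ and the size of the rate cut are illustrative choices, not taken from the paper:

```python
def output_gap_path(rate_gaps, M=1.0, sigma=1.0):
    """Solve x_t = M * x_{t+1} - sigma * rate_gaps[t] backward in time.

    rate_gaps[t] stands for i_t - E_t[pi_{t+1}] - r_t^n (an assumption of
    this sketch); the output gap after the last period is taken to be zero.
    """
    x_next = 0.0
    path = []
    for gap in reversed(rate_gaps):
        x_next = M * x_next - sigma * gap
        path.append(x_next)
    path.reverse()
    return path


# A one-period rate cut announced 20 periods ahead (illustrative numbers).
horizon = 20
rate_gaps = [0.0] * horizon + [-0.01]

x0_standard = output_gap_path(rate_gaps, M=1.0)[0]
x0_behavioral = output_gap_path(rate_gaps, M=0.85)[0]
print(x0_standard, x0_behavioral)
```

With $M = 1$ the announced cut moves today's output gap by the full amount no matter how far ahead it is, while with $M < 1$ the effect is scaled by $M^{T}$ for a cut $T$ periods ahead. That damping is the sense in which the forward guidance puzzle points to $M < 1$.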

### Dynamic equilibrium: Australia's unemployment rate

I applied the dynamic equilibrium model to the Australian unemployment rate, and it works out fairly well. However, one of the interesting things is that FRED only has data up until February 2015 (as of 15 Feb 2017), so the fit and the forecast to 2018 were based on data up until then. This showed a strange feature of steadily rising unemployment starting in 2012 which doesn't necessarily fit with the model. The parameter fit said that it was the middle piece of a broad shock that was ending, so the forecast projected a decline in the unemployment rate through the rest of 2015 and 2016. I then went back and scraped the data from the ABS website up until December 2016 and the forecast does remarkably well [1] (shown in black).

The shock locations are 1982.6, 1991.2, 2009.0 (the global financial crisis), and 2013.5. Although there is no "recession" between 1991 and 2009, there are some fluctuations in the unemployment rate (possibly Scott Sumner's missing mini-recessions?) – I could probably change the bin size on the entropy minimization and the code would recognize those as recessions as well. However as a broad brush view of the Australian economy the four shocks seem sufficient.

I do wonder about the source and explanation of the shock centered at 2013.5 ‒ it appears broader than the typical recession. Possibly a delayed response to the ECB-caused Eurozone recession of 2011-2012?

Footnotes

[1] Instead of being a true "blind" out-of-sample forecast, the data was really just "inconvenient" out-of-sample. [Ha!]

### A bit more on the IT index (technical)

There are a couple of loose ends that need tying up regarding the IT index. One is the derivation of the information equilibrium condition (see also the paper) with non-uniform probability distributions. This turns out to be relatively trivial and only involves a change in the IT index formula. The information equilibrium condition is still

$$\frac{dA}{dB} = k \; \frac{A}{B}$$

but instead of

$$k = \frac{\log \sigma_{A}}{\log \sigma_{B}}$$

with $\sigma_A$ and $\sigma_B$ being the number of symbols in the "alphabet" chosen uniformly, we have

$$k = \frac{\sum_{i} p_{i}^{(A)} \log p_{i}^{(A)}}{\sum_{j} p_{j}^{(B)} \log p_{j}^{(B)}}$$

where $p_{i}^{(A)}$ and $p_{j}^{(B)}$ represent the probabilities of the different outcomes. The generalization to continuous distributions is also trivial and is left as an exercise for the reader.
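As a small illustration of the discrete formula, here is a sketch with hypothetical helper names (`entropy_sum`, `it_index`) and made-up probabilities. It also checks that uniform distributions reproduce the $\log \sigma_{A} / \log \sigma_{B}$ form:

```python
import math


def entropy_sum(probs):
    """Return sum_i p_i * log(p_i), skipping zero-probability outcomes."""
    return sum(p * math.log(p) for p in probs if p > 0)


def it_index(probs_a, probs_b):
    """IT index k as the ratio of the two sums in the formula above."""
    return entropy_sum(probs_a) / entropy_sum(probs_b)


# Uniform case: 8 symbols for A and 4 symbols for B.
k_uniform = it_index([1 / 8] * 8, [1 / 4] * 4)
print(k_uniform, math.log(8) / math.log(4))

# Non-uniform case (made-up probabilities).
k_nonuniform = it_index([0.5, 0.25, 0.25], [0.5, 0.5])
print(k_nonuniform)
```

The minus signs of the two entropies cancel in the ratio, which is why the formula can be written without them. A distribution for $B$ with a single certain outcome has zero entropy, so the ratio is undefined in that case and the sketch makes no attempt to handle it.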

However, while it hasn't come up in any of the models yet, it should be noted that the above definitions imply that $k$ is positive. But it turns out that we can handle negative $k$ by simply using the transformation $B \rightarrow 1/C$ so that:

\begin{align} \frac{dA}{dB} = & - |k| \; \frac{A}{B}\\ -C^{2} \frac{dA}{dC} = & - |k| \; \frac{AC}{1}\\ \frac{dA}{dC} = & |k| \; \frac{A}{C} \end{align}

That is to say an information equilibrium relationship $A \rightleftarrows B$ with a negative IT index is equivalent to the relationship $A \rightleftarrows C$, where $C = 1/B$, with a positive index.

## Tuesday, February 14, 2017

### Qualitative economics done right, part 2a

When did insisting on comparing theory to data become anything other than incontrovertible? On my post Qualitative economics done right, part 2, I received some pushback against this idea in comments. These comments are similar to comments I've seen elsewhere, and represent a major problem with macroeconomics embodied by the refrain that the data rejects "too many good models":
But after about five years of doing likelihood ratio tests on rational expectations models, I recall Bob Lucas and Ed Prescott both telling me that those tests were rejecting too many good models.
The "I" in that case was Tom Sargent. Now my series (here's Part 1) goes into detail about why comparison is necessary even for qualitative models. But let me address a list of arguments I've seen that are used against this fundamental tenet of science.

"It's just curve fitting."

I talked about a different aspect of this here. But the "curve fitting" critique seems to go much deeper than a critique of setting up a linear combination of a family of functions and fitting the coefficients (which does have some usefulness per the link).

Somehow any comparison of a theoretical model to data is dismissed as "curve fitting" under this broader critique. However this fundamentally misunderstands two distinct processes and I think represents a lack of understanding of function spaces. Let's say our data is some function of time d(t). Now because some functions fₖ(t) form complete bases, any function d(t) (with reasonable caveats) can be represented as a vector in that function space:

d(t) = Σₖ cₖ fₖ(t)

An example is a Fourier series, but given some level of acceptable error any finite set of terms {1, t, t², t³, t⁴ ...} can suffice (like a Taylor series, or linear, quadratic, etc regression). In this sense, and only in this sense, is this a valid critique. If you can reproduce any set of data, then you really haven't learned anything. However, as I talk about here, you can constrain the model complexity in an information-theoretic sense.

However, this is not the same situation as saying the data is given by a general function f(t) with parameters a, b, c, ...:

d(t) = f(t|a, b, c, ... )

where the theoretical function f is not a complete basis and where the parameters are fit to the data. This is the case of e.g. Planck's blackbody radiation model or Newton's universal gravity law, and in this case we do learn something. We learn that the theory that results in the function f is right, wrong or approximate.

In the example with Keen's model in Part 2 above, we learn that the model (as constructed) is wrong. This doesn't mean debt does not contribute to macroeconomic fluctuations, but it does mean that Keen's model is not the explanation if it does.

A simple way to put this is that there's a difference between parameter estimation (science) and fitting to a set of functions that comprise a function space (not science).
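Here is a minimal sketch of that difference using made-up data. A polynomial through every point (Lagrange interpolation, standing in for a complete basis) reproduces any data set, while a one-parameter model $d(t) = e^{a t}$ only fits data that actually has that form. The function names and both data sets are hypothetical, and the one-parameter fit is done by least squares on $\log d$ for simplicity:

```python
import math


def lagrange_fit(ts, ds):
    """Return the polynomial passing through every (t, d) pair."""
    def poly(t):
        total = 0.0
        for i, (ti, di) in enumerate(zip(ts, ds)):
            term = di
            for j, tj in enumerate(ts):
                if j != i:
                    term *= (t - tj) / (ti - tj)
            total += term
        return total
    return poly


def fit_growth_rate(ts, ds):
    """Least-squares estimate of a in log d(t) = a * t (requires d > 0)."""
    return sum(t * math.log(d) for t, d in zip(ts, ds)) / sum(t * t for t in ts)


def max_residual(model, ts, ds):
    """Largest absolute miss of the model on the data."""
    return max(abs(model(t) - d) for t, d in zip(ts, ds))


ts = [0.0, 1.0, 2.0, 3.0, 4.0]
growth = [math.exp(0.3 * t) for t in ts]   # data that follows the theory
arbitrary = [1.0, 3.0, 0.5, 2.5, 0.8]      # data that does not

for name, ds in [("growth", growth), ("arbitrary", arbitrary)]:
    a_hat = fit_growth_rate(ts, ds)
    print(name,
          max_residual(lagrange_fit(ts, ds), ts, ds),
          max_residual(lambda t: math.exp(a_hat * t), ts, ds))
```

The interpolating polynomial misses nothing in either case, so its success tells us nothing. The one-parameter model recovers the growth rate on the first data set and fails visibly on the second, which is the kind of result we can learn from.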

"It shows how to include X."

In the case of Keen, X = debt. There are two things you need to do in order to show that a theoretical model shows how to include X: it needs to fail to describe the data when X isn't included, and it needs to describe the data better when X is included.

A really easy way to do this with a single model is to take X → α X and fit α to the data (estimate the parameter per the above section on "curve fitting"). If α ≠ 0 (and the result looks qualitatively like the data overall), then you've shown a way to include X.

Keen does not do this. His ansatz for including debt D is Y + dD/dt. It should be Y + α dD/dt.
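Estimating $\alpha$ is a one-line least-squares problem. Here is a minimal sketch, assuming some observed series that the combination $Y + \alpha \; dD/dt$ is supposed to describe; the function name, the variable names and all of the numbers are made up for illustration:

```python
def estimate_alpha(observed, income, debt_change):
    """Least-squares alpha in observed = income + alpha * debt_change."""
    num = sum((d - y) * z for d, y, z in zip(observed, income, debt_change))
    den = sum(z * z for z in debt_change)
    return num / den


# Synthetic series (not real data): income Y and the change in debt dD/dt.
income = [100.0, 102.0, 104.0, 106.0]
debt_change = [5.0, -2.0, 3.0, 1.0]

# Case 1: the observed series is built with alpha = 0.4.
observed_with_debt = [y + 0.4 * z for y, z in zip(income, debt_change)]
print(estimate_alpha(observed_with_debt, income, debt_change))

# Case 2: debt plays no role, so the estimate should come out at zero.
print(estimate_alpha(income, income, debt_change))
```

Fixing $\alpha = 1$ by assumption skips this step entirely, so there is no way for the data to say that debt matters less (or not at all).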

"It's just a toy model."

Sure, that's fine. But toy models nearly always a) perform qualitatively well themselves when compared to data, or b) are easy versions of much more complex models where the more complex model has been compared to data and performed reasonably well. It's fine if Keen's debt model is a toy model that doesn't perform well against the data, but then where is the model that performs really well that it's based on?

"It just demonstrates a principle."

This is similar to the defense that "it's just a toy model", but somewhat more specific. It is only useful for a model to demonstrate a principle if that principle has been shown to be important in explaining empirical data. Therefore the principle should have performed well when compared to data. I used the example of demonstrating renormalization using a scalar field theory (how it's done in some physics textbooks). This is only useful because a) renormalization was shown to be important in understanding empirical data with quantum electrodynamics (QED), and b) the basic story isn't ruined by going to a scalar field from a theory with spinors and a gauge field.

The key point to understand here is that the empirically inaccurate qualitative model is being used to teach something that has already demonstrated itself empirically. Let's put it this way:

After the churn of theory and data comes up with something that explains empirical reality, you can then produce models that capture the essence of the theory that captures reality. Or more simply: you can only use teaching tools after you have learned something.

In the above example, QED was the empirical success that led to using scalar field theory to teach renormalization. You can't use Keen's models to teach principles because we haven't learned anything yet. As such, Keen's models are actually evidence against the principle (per the argument in curve fitting above). If you try to construct a theory using some principle and that theory looks nothing like the data, then that is an indication that either a) the principle is wrong or b) the way you constructed the model with the principle is wrong.

## Sunday, February 12, 2017

### Added some models to the repository ...

 Nominal growth rate from the Solow model. Result is in the code repository linked below.

I added the simple labor model (Okun's law), Solow model, and the "Quantity Theory of Labor and Capital" (QTLK) to the GitHub information equilibrium repository:

https://github.com/infotranecon/informationequilibrium

Let me know if you can see these files. They're Mathematica notebooks (made in v10.3).

## Friday, February 10, 2017

### Classical Econophysics

 Figure from Classical Econophysics

That's the title of a book co-authored by Ian Wright (see here) that looks at "statistical equilibrium". Here's Ian (from the link):
The reason we can be a bit more optimistic [about understanding the economy] is that some very simple and elegant models of capitalist macrodynamics exist that do a surprisingly effective job of replicating empirical data. ... I co-authored a book, in 2009, that combined the classical approach to political economy (e.g., Smith, Ricardo, Marx) with the concept of statistical equilibrium more usually found in thermodynamics. A statistical equilibrium, in contrast to a deterministic equilibrium that is normally employed in economic models, is ceaselessly turbulent and changing, yet the distribution of properties over the parts of the system is constant. It’s a much better conceptual approach to modelling a system with a huge number of degrees-of-freedom, like an economy.

I think that this kind of approach is the statistical mechanics to information equilibrium's (generalized) thermodynamics/equations of state. Much like how you can compute the pressure and volume relationship of an ideal gas from expectation values and partition functions [e.g. here, pdf], information equilibrium gives general forms those equations of state can take.

I curated a "mini-seminar" of blog posts connecting these ideas, in particular this post. I try to make the point that an economic system "... is ceaselessly turbulent and changing, yet the distribution of properties over the parts of the system is constant." (to quote Ian again). That is key to a point that I also try to make: maybe "economics" as we know it only exists when that distribution is constant. When it undergoes changes (e.g. recessions), we might be lost as physicists (usually) are in dealing with non-equilibrium thermodynamics (which for economics might be analogous to sociology).

PS I also tried to look at some information equilibrium relationships in one of Ian's agent-based models (here, here).