The contemporary context of investment in the United States, part 2: definitions

This post is the second in a series on the contemporary state of investment in the US. The first is here. The purpose of this post is to provide some definitions and context. There are three questions: first, what processes does “investment” refer to; second, who invests; and third, what patterns should interest us? To define and identify what investment activity matters (for economic growth and development in the US for the next decade or so), I rely mainly on some of the writings of critical/heterodox economists Hyman Minsky and John Kenneth Galbraith.

What is investment?

There are two common usages of the term “investment”. The first is the buying and selling of shares of companies that are publicly listed on stock exchanges. This set of activities is not the sort of investment I’m talking about in this post. No doubt, the stock market can be a source of capital for companies, which can then be deployed for investment. More often than not, however, the stock market functions as a market for corporate discipline and control (mergers, acquisitions, takeovers). The investment I refer to here is the accumulation of fixed assets (such as capital goods [like machinery], inventory, property, physical structures) for the purposes of generating income.

Let me offer a short digression here. Many would say that the two definitions above reflect a financial definition and then an economic definition of investment. In casual, daily conversation, that might be acceptable. However, if you have read Hyman Minsky then you would be aware that both processes are actually interlinked and co-constitutive. That is, the accumulation of fixed assets happens very much according to what is happening in capital markets (of which stock markets are a sub-set). More specifically, Minsky argues that the financing (particularly when financed through debt) and pricing of capital goods occurs in capital markets. The point is that the capital goods and fixed assets that are used for investment are also financial assets. And because these financial assets are accompanied by ownership claims and are financed, this implies that there are cash flow obligations: the debts that were used to finance capital goods accumulation must be validated. Those obligations are met by splitting off part of the income produced by those assets, either as dividends, interest payments, or perhaps (in dire situations) from the sale of capital goods. To make a long story short, Minsky demonstrates that these arrangements can precipitate a depression in the event that financial asset prices collapse, which he argues is possible if capital goods accumulation is financed in too speculative a manner. Otherwise stated, the capitalist system itself sows the seeds of its own crises. I suggest you read this or this.

Where does this leave our definition of investment? There are at least six elements: it refers to a (1) process of accumulating (2) capital goods, which are simultaneously (3) financial assets that are (4) owned by capitalists/entrepreneurs and (5) might be financed through debt, for the purposes of (6) generating income.

In the next section, I’m briefly going to discuss why (5) and (6) do not always apply but rather depend on who is doing the investing. Nonetheless, I think that the definition here is workable and sufficiently distinguishes this set of activities from the “investing” (asset trading) that commences every day with the ringing of the bell at the New York Stock Exchange.

There are a couple of other points I’d like to add here concerning how investment might influence the larger political-economy. Recently a book was released that contained a number of previously (I believe) unpublished essays by Hyman Minsky (they may have been published in a few academic journals) on the topic of jobs, employment, and welfare. I recommend this one, too, for a general introduction to Minsky’s thought and its application to employment. In the introduction and throughout that book, the reader might notice that the US economy is characterized as observing a “private, high-investment strategy”. I think there are two very important points in that phrase. First, investment is private: it is subject to ownership by individuals, not the state. This point should interest anyone who conducts cross-national comparative research: you cannot compare the investment process in a country like the US, where investment happens by entrepreneurs and households, with a country like China, where capital expenditures are determined largely by committees in state-owned businesses appointed by the (Communist authoritarian) state. It may be a minor point, but I think it is worth bringing up.

The second point is much more important: “high-investment strategy.” What does that imply exactly? The idea is that the US economy in particular generates growth by stimulating investment. That is, more private ownership of capital goods. Such a strategy is executed, in the US at least, by various policies: a tax code that favors capital goods accumulation (incentives for depreciation, tax credits); a favorable business environment (low regulation of business, which decreases the cost of doing business generally); and government contracts that partly underwrite profits in select industries, typically those that require high capital goods consumption (armaments, construction, airlines). Thus “growth” in the US economy is primarily pursued by creating favorable conditions for income generation by businesses.

There are a number of problems associated with such a strategy, and I’ll quote the four that are mentioned in that volume on employment by Minsky. These can be found in a summary around Kindle location 358-382.

First, a tax code that is geared towards more investment will increase inequality between the owners of capital and labor. For what it is worth, inequality on its own does not necessarily spell economic disaster. For instance, if you have read Thomas Piketty’s book, you’ll know that countries can endure long periods (centuries) of inequality without encountering, say, collapse and ruin. Certainly, there are issues of justice and quality of life that arise from high inequality, but the evidence that inequality might lead to economic and financial crisis is wanting. Politically, I don’t worry about inequality because the President and most American political leaders are not Rockefellers, Morgans, Vanderbilts, Carnegies, Fords, or Hearsts. In other words, the money eventually runs out.

[An aside: one of my favorite economists (Deirdre McCloskey) has recently written a review of Piketty that I suggest everyone read. It’s there on the home page in pdf form.]

Second, the income that accrues to owners of capital can lead to opulent consumption by them and emulative consumption by the masses, leading to inflation. There is a natural experiment here in the period 1964 to 1974 in the US, as employment tightened, defense spending escalated with the War in Vietnam, and the economy enjoyed an investment boom (initiated, by the way, with tax cuts passed during the Kennedy and Johnson administrations).

Third, government spending on defense (contracts to specialized and sophisticated high-technology industries) creates demand for high-skilled, high-wage labor. In turn, this widens the inequality among workers. Again, we can observe the effects of growth in high-technology sectors on wages and skills across occupations by observing the period from 1994 onwards, when the revolution in computer chips and the internet began. This was not totally the result of government spending (in fact, US defense spending dropped off after 1990, leading in part to the recession of 1991), although much of the US advanced technology industry grew out of companies that received defense contracts. Nonetheless, this is a serious problem for worker quality of life and, I would argue, for growth prospects between regions and metropolitan areas.

The final problem Minsky describes of a high private investment strategy is that if the tax code and business environment privilege capital spending, then rising business confidence, and hence banker optimism, will erode lending standards while also increasing the riskiness and speculative nature of investment. The result can be a financial crisis and recession. It is worth noting that the 2008 crisis was not the result of excessive optimism by corporate businesses. Rather, the 2008 crisis was the result of a housing boom, a highly leveraged household sector, and a highly leveraged, speculative, and corrupt financial sector.

At this point, I’ve elaborated enough on what I mean by investment. Now I’m briefly going to outline some key differences between the sources of investment.

Who invests?

I mentioned earlier in my definition of investment that it might be financed through debt and that the purpose of investment is to generate income. I’d like to add some caveats to that with the help of John Kenneth Galbraith.

In a previous post containing links, I had one from the St. Louis Federal Reserve documenting that credit to non-corporate non-financial sectors has remained low following the crisis. That to me was a key indicator of who is able and willing to invest in the current economy. We have at least three major groups here: the corporate sector (AT&T, Wal-Mart, ExxonMobil, DuPont, Microsoft); the household sector (you and me); and the financial sector (banks and lenders big and small). Each of these sectors has very different sources of capital and ways of generating income. The corporate sector, which as you can tell is populated by very large companies that are often structured in what economists would call an oligopolistic market (where they can influence prices of inputs and outputs), gets most of its capital (for the purposes of future investment) from retained earnings. In other words, expansion programs, research and development, and development of new products are financed from last year’s profits. These businesses do not typically take out loans from banks to finance their activities, although they certainly float bonds and stocks as part of their complex capital structures. The main point is that these companies seek to make the supply of capital (as they do for all other strategic costs) a “wholly internal decision” (Galbraith, 2007, 34). Chapters Three and Four in The New Industrial State are the relevant background for this interpretation.

As such, investment is not necessarily financed through debt. In addition, investment may not happen with the aim of producing income. Galbraith describes that in these large corporate enterprises, the managerial hierarchy is responsible for long-term planning; this includes when and whether to make capital expenditures. The ultimate goal of the corporation, then, is not income or profit generation, but rather the elimination of uncertainty. The corporation seeks stability. The process of investment, we can deduce, will fall in line with how corporate managers attempt to achieve that stability. An oligopolistic market structure allows companies to pursue goals other than the relentless pursuit of profit, contra the neoclassical economics dicta.

In contrast, when American entrepreneurs attempt to launch a business, the financing usually comes from residential mortgages and the purpose is the generation of income (they do not exercise oligopolistic power). A home is an individual’s greatest potential source of capital, save inheritances. Consider how easy it is (excuse me, how easy it used to be) for most people to get a mortgage or to refinance their home; this, too, surely was a reflection of the high private investment strategy (or at least should be treated in policy as such).

Obviously, between the individual entrepreneur and the oligopolistic corporation, there are a lot of businesses that do not have pricing power and that have access to various forms of financing that are not restricted to home mortgages or inheritances. These businesses, which could be termed ‘mid-sized’ and have several hundred employees (let’s say, fewer than 500), and which are typically in manufacturing, transportation, utilities, and related industrial sectors, do enter into debt contracts with banks and other lenders. However, I think that our focus should really fall to entrepreneurs and large corporations. For the former, the reason is that small businesses create a tremendous amount of activity, in terms of jobs and sales. Most small businesses also fail, so this is a tremendously inefficient set of activities. In the short term, however, small businesses drive quite a bit of local economic activity while providing a lot of people (not just the small business owners themselves but also the people they hire) with an outlet for social and economic advancement.

In the case of large corporations, the interests of these entities set much of the industrial and social policy in the US, and they are responsible for the bulk of exports and income. Furthermore, municipal and state governments compete quite vigorously over corporations. Local governments offer tax incentives, provide physical infrastructure, train the workforce, etc., partly with the aim of attracting business activity, and therefore tax revenue.

What I’m getting at here is that there is a geography of investment, and two very important features of that geography will be households and large corporations. The financial system is also important in this geography, but these days it is less about the location and activities of banks and more about the location, organization, and prerogatives of special investment funds, like pension or hedge funds. With that, I’ll move on to the final question.

Measuring investment and identifying patterns

I won’t dwell on this point much because instead of telling you what I’m going to do, I may as well just do it. In the next posts in this series, I’m going to present the data I stumbled upon from the Bureau of Labor Statistics for output, savings, and investment. The data is organized by sector, that is, industrial sector following the NAICS codes and also by the divisions suggested above (household, financial, corporate, and others).

The most basic pattern to identify is change over time: which areas demonstrate growth and which demonstrate contraction? There are roughly seven years of annual data, which is not a large sample by any means. There will be some noise.

A subsidiary pattern is relative change. The 2008 crisis marks a point where we can evaluate how credit distress during and after the crisis was distributed between sectors and, consequently, what have been and will be the prospects for investment and therefore economic growth. In other words, we can determine to an extent what sectors are “holding back” growth. Recall that this endeavor was largely touched off by the New York Times article that asked that very question (see previous post). My goal is to further investigate that question.


Meso-level analysis using cluster methods: local economic structure among Texan counties, part II

This post continues from the previous one, presenting the empirical findings from the cluster analytical technique.

The four variables I settled on as a way to investigate the local economic structure of Texan counties were: urban size, unemployment (rate), poverty (rate), and the share of oil and gas employment in total employment. This selection captures generalized urban economies that arise from large size as well as diseconomies (unemployment and poverty), and a degree of specialization (in an industry of some importance to the state). Unemployment and poverty, by the way, are very different phenomena. Unemployment does not necessarily imply poverty, and vice versa. Unemployment refers to adult civilians who are in the labor force and looking for work but do not have a job (perhaps because they are not qualified for available work, for instance). In contrast, individuals in poverty may be gainfully employed, except that they have fewer opportunities for social advancement, are socially ostracized, and have incomes so low and possess so few material possessions that they have difficulty feeding and clothing themselves without assistance.

Local economic structure, thus, reveals in broad terms the allocation of economies and diseconomies from urban size and specialization. Once emplaced on a map, we may be able to determine the spatial structure of the economy as well. The rest of the post presents the findings from the k-means cluster analysis.

Cartographic introduction

Before diving into the cluster results, however, it is important to provide a more basic introduction to the geography of Texas. The first map below records the population distribution by county (absolute numbers). Consider that with the exception of the far west, counties in Texas are all roughly the same size. This curious feature makes it easier for us to infer population density and hence urbanization from the following map. There are a number of important points to draw out here.

  1. Houston is the largest urban area in Texas, with more than four million people living in the core county of Harris; activity spills out from Houston into at least four adjacent counties.
  2. Dallas-Fort Worth is the next largest, with two heavily populated counties surrounded by less populated areas, ostensibly suburbs.
  3. Central Texas features San Antonio and Austin as part of a rather long chain of urban areas pointing towards Dallas, with San Antonio at the bottom of the chain, and Austin closer to the middle.
  4. Outside of the more densely populated eastern portion is the south, which includes Corpus Christi and Brownsville.
  5. In the far west is El Paso, which is almost as much a part of New Mexico as it is Texas.
  6. Several counties are scattered throughout the panhandle, where there is oil and gas extraction.


I have also included a map of the distribution of the Hispanic population in Texas. I do this in order to head off any attempt to associate the location of Hispanic people with unemployment or poverty. In almost half of the counties in Texas, over fifty percent of the population is of Hispanic descent.


The feature that stands out most in comparing the total population and Hispanic population distributions is that the latter does not cluster in major urban areas. Instead, the main factor is proximity to the border with Mexico. This pattern stands in contrast, for example, to the distribution of black Americans in urban areas in the North during the last mid-century. Urban areas do not appear to be sites of particular concentrations of a given demographic group, in other words.

Finally, I present the results of the cluster technique described in the previous post.


There is quite a bit of information that needs to be parsed here. The first point is to note the frequency of each cluster group and their basic identities. The table below shows the frequency and the mean for each cluster group of the four clustering variables (population, unemployment rate, poverty rate, and share of oil and gas in total employment).

| Cluster group | Freq. | Population | Unemp. rate | Poverty rate | Oil and gas employment share |
|---|---|---|---|---|---|
| 1 | 1 | 4,205,743 | 5.40% | 18.63% | 2.85% |
| 2 | 34 | 31,413 | 5.29% | 26.49% | 1.14% |
| 3 | 8 | 171,535 | 11.16% | 32.08% | 0.04% |
| 4 | 25 | 208,060 | 5.19% | 10.81% | 0.94% |
| 5 | 4 | 1,773,114 | 5.10% | 17.97% | 0.37% |
| 6 | 58 | 16,835 | 3.46% | 13.03% | 4.63% |
| 7 | 35 | 56,846 | 7.14% | 20.75% | 0.15% |
| 8 | 89 | 39,826 | 4.83% | 18.59% | 1.02% |
| Total | 254 | 100,199 | 5.14% | 18.33% | 1.70% |
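A summary table of this kind (frequency plus per-variable means by cluster group) could be produced along the following lines. This is only a minimal sketch in Python rather than the Stata workflow used for the post; the county records and the function name `summarize_by_cluster` are hypothetical and made up for illustration, not the actual Texas data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical county records (NOT the real data):
# (cluster group, population, unemployment rate %, poverty rate %, oil and gas share %)
counties = [
    (6, 15000, 3.2, 12.5, 5.0),
    (6, 18000, 3.8, 13.5, 4.0),
    (2, 30000, 5.0, 27.0, 1.0),
    (2, 33000, 5.6, 26.0, 1.2),
    (2, 31500, 5.3, 26.5, 1.1),
]

def summarize_by_cluster(rows):
    """Return {cluster: (frequency, [mean of each variable])}."""
    groups = defaultdict(list)
    for cluster, *values in rows:
        groups[cluster].append(values)
    summary = {}
    for cluster, members in groups.items():
        # zip(*members) turns the member rows into per-variable columns
        means = [mean(column) for column in zip(*members)]
        summary[cluster] = (len(members), means)
    return summary

summary = summarize_by_cluster(counties)
for cluster in sorted(summary):
    freq, means = summary[cluster]
    print(cluster, freq, [round(m, 2) for m in means])
```

Note that the means here are simple unweighted averages across counties, so a small county counts as much as a large one.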

Cluster group 1 is clearly central Houston and cluster group 5 is the central areas of Dallas-Fort Worth, San Antonio, and Austin. The next smallest group is cluster group 3, with 8 non-urbanized counties located along the border with Mexico. Cluster group 4 is the fourth smallest group, and it seems, based on the map, that these are suburban areas. It seems rather interesting that suburban areas of at least six metropolitan areas can be identified on the basis of the four clustering variables. This indicates there is uniformity in economic structure that relates to an area’s position within the urban hierarchy. All other cluster groups have at least 34 counties, the largest being the final cluster group (8) with 89 counties.

A glance at the map emphasizes that spatial clustering is a common phenomenon for almost all groups. This is especially the case for: group 8 in the central part of the state; group 7 in the eastern and northern parts of the state (a long chain of counties registering as 7 stretches from the border with Louisiana up to Dallas); group 6 in the panhandle and northwest of Dallas; and groups 2 and 3 along the border with Mexico.

For discussion, I think the most pressing question would be what are the differences between the most populated groups (particularly 2, 6, 7, and 8). Referring back to the above table, we can categorize those clusters as such:

  • Group 2 are small, poor areas with some oil and gas activity.
  • Group 6 are even smaller areas with low unemployment, relatively low levels of poverty and a substantial amount of oil and gas activity. Ostensibly, these counties observe an economic specialization in oil and gas extraction, and, being underpopulated, probably draw in skilled migrants for temporary work.
  • Group 7 counties are on average larger than group 2 and 6 counties and have the highest unemployment of the three, with a poverty rate above group 6 but below group 2; the higher unemployment might be attributed to the very low presence of oil and gas.
  • Group 8 counties are slightly larger than group 2 counties, with slightly lower levels of unemployment and poverty, but also less oil and gas presence.

Obviously, we quickly encounter the explanatory limits of the four clustering variables. Thankfully, I collected more data than ended up going into the determination of clusters, so it is possible to fill out the analysis a bit more. Specifically, we can present the number of bank offices, deposits per capita, and median income. These indicators can reveal the financial characteristics of the clusters, such as access to financial services and accumulated personal savings. We should not necessarily expect that estimated median income will mirror deposits per capita precisely, for the reason that spending and saving patterns will vary between areas due to cost of living differences in addition to varying access to financial intermediaries. For instance, wealthier areas may hold their savings in 401(k) or brokerage accounts as well as bank deposits, while poorer areas may hold savings in cash. Other areas probably send savings back to Mexico in the form of remittances. So the financial indicators can raise some interesting hypotheses (again, cluster analysis does not manufacture evidence that allows us to make statements of causality; this is purely exploratory).

| Cluster group | Est. median income | Bank offices | Deposits per capita |
|---|---|---|---|
| 1 | 51,298 | 1024 | 50,241 |
| 2 | 35,197 | 10 | 20,600 |
| 3 | 30,274 | 34 | 10,220 |
| 4 | 62,475 | 54 | 14,240 |
| 5 | 52,579 | 431 | 36,130 |
| 6 | 49,825 | 7 | 30,421 |
| 7 | 38,919 | 13 | 14,364 |
| 8 | 40,613 | 14 | 20,078 |
| Total | 43,815 | 27 | 21,209 |

I will take each cluster group in turn.

  • Group 1 is central Houston, which has a median income roughly 17 percent greater than the Texan average as well as the greatest number of bank offices and the largest stock of deposits per person, almost equal to estimated median income. I should note anecdotally, by the way, that the major urban areas do not necessarily have greater deposits per capita than rural or other areas. This really is a matter of financial literacy and access, which in turn is structured very differently between regions and demographic groups.
  • Group 2, a relatively poor grouping, has the second-lowest average median income and the second-fewest bank offices on average, but its average deposits per capita is roughly in line with the Texas average.
  • Group 3, along the Mexico border, which also had the highest rate of poverty, has the lowest average median income and the lowest average deposits per capita. However, it has a high average number of bank offices. One might be tempted to say that the number of bank offices reflects remittance activity, except that most banks in the US do not offer this kind of service, but rather face a number of non-regulated, pseudo-bank competitors (wire services). So, I’m not sure how to interpret that high number. It may be the case that these banks are unit (stand-alone) banks, unlike the nationally recognized banking conglomerates, which would be unlikely to locate in such areas anyway, leaving a more competitive banking market.
  • Group 4, the suburbs, have the highest average median income yet one of the lowest levels of deposits per capita. Perhaps this reflects my statement earlier about wealthier households having access to a larger and more sophisticated range of financial products, which would mean they put fewer of their assets in bank accounts. Another alternative is that their wealth is accumulated in material things, such as residential property, fixed assets if they are business owners, and other ‘stuff’. Perhaps they finance their expenditures with debts and mortgages as well, creating financial obligations that draw their money away from savings and into regular interest payments.
  • Group 5, the urban cores, are also high income areas, with an exceptional number of bank offices on average.
  • Group 6, mainly in the panhandle, based on the low readings of poverty and unemployment, appears to consist of relatively successful rural communities. The median income in the average group 6 county is over ten percent greater than the Texas average, with a high level of savings (deposits per capita). Furthermore, this group contains the lowest number of bank offices, which suggests an uncompetitive but perhaps stable banking system.
  • Groups 7 and 8 are quite similar, featuring lower incomes, deposits per capita, and number of banks than the Texas average.

A final indicator I would like to include is total job creation. Here, I simply took the average quarterly number of jobs created (job gains less job destruction) between 2009Q3 and 2013Q3 for each county. As a point of reference, I also calculated job creation as a share of total employment, to account for urban size.

| Cluster group | Avg. quarterly job growth | Job growth relative to urban size |
|---|---|---|
| 1 | 17,484 | 0.90 |
| 2 | 78 | 1.15 |
| 3 | 273 | 0.70 |
| 4 | 556 | 1.15 |
| 5 | 6,750 | 0.85 |
| 6 | 87 | 1.65 |
| 7 | 122 | 0.74 |
| 8 | 97 | 0.87 |
| Total | 320 | 1.09 |
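For readers who want to see the arithmetic behind the two job growth columns, here is a minimal sketch in Python. It assumes the relative measure is average quarterly net job creation expressed as a percentage of total employment, which is my reading of the description above rather than a documented formula. The function names and the quarterly figures are hypothetical.

```python
from statistics import mean

def avg_quarterly_net_jobs(job_gains, job_losses):
    """Average across quarters of (job gains less job destruction)."""
    return mean(g - l for g, l in zip(job_gains, job_losses))

def relative_job_growth(avg_net_jobs, total_employment):
    """Average quarterly net job creation as a percent of total employment.

    Assumption: the table's relative column is a percentage.
    """
    return 100.0 * avg_net_jobs / total_employment

# Hypothetical county with four quarters of data (not real figures)
gains = [1200, 1350, 1100, 1400]
losses = [1000, 1150, 1050, 1200]
employment = 20000

net = avg_quarterly_net_jobs(gains, losses)
share = relative_job_growth(net, employment)
print(net, share)
```

Dividing by total employment is what lets a county with a few thousand workers be compared with central Houston on the same scale.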

The main point from the job growth figures is that the greatest relative amount of job growth was happening in clusters 2, 4, and 6. Recall that these are, in fact, quite a disparate set: group 2 is one of the poorest in terms of work opportunities and has high rates of poverty, while group 6 counties are major oil and gas centers. Group 4, meanwhile, the suburbs, has comparatively low oil and gas presence but is generally wealthier. It is likely the case that further disaggregation of job creation statistics would be necessary to determine the quality of these jobs.

That concludes my case study using cluster analysis. After completing posts like this, I often sit and wonder to myself, “Now, just exactly why did I do all of this?” The point, for me anyway, is that context is important. Given the major economic and political themes going on in the US right now (fracking, the uneven recovery, the ongoing foreclosure crisis, immigration, too-big-to-fail, the competitiveness of US industry), it is important to know the location and context of such events in order to further our understanding of why these topics are important, to whom they are important (in the sense that some groups will benefit from certain kinds of economic processes while others will bear the burden of them), and what their long-term ramifications are. Eventually, I am going to revisit the posts I have written on Texas and the oil and gas industry and try to bring together some of my insights into how that industry is shaping the US economic landscape.

Meso-level analysis using cluster methods: local economic structure among Texan counties, part I

Quantitative analysis is never as neat and tidy as one would like it to be. At least, I’ve found that it is often hard to be graceful when you’re dealing with large amounts of data, with the need to clean it up, transform it, present it, and, finally, interpret it. And, then, of course, there is the issue of documenting your process and method, which can also be tedious. Data management and analysis are hard-to-master art forms.

Especially with geography, where you typically are looking at multiple aspects of a large sample of places, presenting the greatest amount of relevant information in the most convincing way to an audience makes for a messy task. Not only is it messy in the planning and development stages, but this carries on all the way to presenting it. (Life would be so much easier with a typesetting system like LaTeX, which gives much more control over presentation to the writer, except that so many people still use and edit documents collaboratively, which LaTeX doesn’t make easy.)

My purpose here today is to share my workflow process, including its shortcomings and how I would like to improve it, as well as to continue the analysis of Texas, which I seem to be fixated upon at the moment. In this post, I outline my workflow process. In a subsequent post, I will present the data.

The method I describe here is cluster analysis. Cluster analysis is a technique that in theory condenses a tremendous amount of information for easier access, yet in practice is quite messy. Essentially, cluster analysis groups objects based on their similarity. It is mainly an exploratory technique, useful for generating hypotheses. There are many potential algorithms for sorting the observations. In a chapter of my dissertation, I used a hierarchical method, which began with all observations in their own cluster and then aggregated two clusters together at each stage. This process avoids the issue of selecting in advance the number of clusters, and employs ANOVA to determine the optimal distance between clusters with the aim of minimizing the sum of squares of a given pair of clusters. The method strives for maximum internal homogeneity in each cluster based on the dependent variables. Other post-estimation techniques are available to confirm the optimal number of clusters.

Here, I use k-means clustering, which has three steps. First, the number of clusters is specified. Second, the location of each cluster center is initialized. Third, each object is assigned to the nearest cluster center, after which the centers are recomputed from their members and the assignment is repeated. Convergence is achieved when the algorithm can no longer change the assignment of any observation to improve the fit. It should be noted that, because the starting locations are typically chosen at random, k-means clustering can group the same set of observations differently on each independent attempt. That is, I could run the same process today and discover an entirely new membership for each observation. Another main difference between k-means and hierarchical methods is that in the former the number of clusters is designated in advance. There are also differences in post-estimation tests.
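To make those steps concrete, here is a minimal sketch of k-means in plain Python. It is an illustration of the general technique, not the Stata routine I actually ran. The function names, the `seed` parameter, and the toy points are all hypothetical, and initializing the centers at randomly chosen observations is just one common choice.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=None, max_iter=100):
    """Basic k-means; returns (assignments, centers)."""
    rng = random.Random(seed)
    # Step 1 is the caller's choice of k.
    # Step 2: initialize the centers at k randomly chosen observations.
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iter):
        # Step 3: assign each observation to its nearest center.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no observation changed cluster
        assignments = new_assignments
        # Recompute each center as the mean of its current members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centers

# Toy data (hypothetical): two obvious groups of three points each
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0),
          (10.0, 10.0), (11.0, 11.0), (12.0, 12.0)]
labels, centers = kmeans(points, k=2, seed=1)
print(labels)
```

With data this cleanly separated the two groups are always recovered, although which group is labeled 0 and which 1 depends on the seed. On messier data, different seeds can produce genuinely different memberships, which is the instability described above.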

Let me now move on to the more mundane aspects.

My workflow process

Here is the process by which I assembled data and performed a cluster analysis.

  • Data collection
    1. I gathered data from four datasets: employment (Quarterly Workforce Indicators); unemployment (Bureau of Labor Statistics); poverty and income (Small Area Income & Poverty Estimates, US Census); and number of banks and deposits (FDIC Summary of Deposits).
    2. The geography was Texan counties, of which there are 254 (big sample).
    3. I downloaded the relevant data for the most recent available date, which differs somewhat between each dataset. Sometimes the data were not available for all observations, so I retrieved the most recent dataset that had complete data.
      1. Employment data: end of 2012
      2. Population (total size, Hispanic): end of 2012
      3. Unemployment: June 2014
      4. Banking: June 2014
    4. I then merged the datasets using Stata and saved the result as the master file.
  • Running the cluster analysis
    1. The first step is to standardize the variables, in order to place all variables on the same scale and to prevent distortions.
    2. There was a long trial-and-error process here, where I would select various combinations of variables and then perform post-estimation tests and create maps to determine what worked best. Bear in mind, I am trying to sort counties according to local economic structure, which explains why I initially included different sectors. To make a long story short, I ultimately settled on four variables: total population; the unemployment rate; the poverty rate (all ages); and the share of oil and gas employment in total employment (all variables were standardized). Often, the maps produced from each iteration would just not seem right, so I would return to the drawing board and repeat the process. Scrutinizing a map may not sound particularly scientific, but the bigger problem is that there is no hard rule about what constitutes an appropriate threshold for the post-estimation tests, so my subjective knowledge is as valid as a seemingly objective mathematical test.
    3. For each set of variables, I would always run the analysis with at least two and at most 13 cluster groups. I would then invoke a cluster stop rule, which produces an F-index. When comparing the F-index values across the different numbers of clusters, the point is to locate the highest values. You would then gather the cluster solutions with the greatest F-index values and look at the frequency with which observations appear in each group (for example, important questions here are: does one group have almost all the observations, or are there many groups with only one observation?). Then you could summarize the means of the dependent variables for each cluster and see how different they are.
    4. Then, if you really wanted to be rigorous, you could run ANOVA tests with the cluster group identifier as the independent variable and each of the clustering variables (population, unemployment, poverty, oil/gas) in turn as the dependent variable. A statistically significant F-test in ANOVA indicates that the clusters differ from each other on that variable. However, again, there are no tools that have been recognized by scientific consensus as being the most accurate or reliable in determining the number of clusters. Some users of cluster analysis would pursue further statistical techniques, whereas my choice is to generate maps and utilize my knowledge of geography to gauge reliability.
    5. I will justify the selection of the four variables in the next post.
  • Creating tables and maps to present the information
    1. The final cluster analysis I performed identified eight cluster groups for the four dependent variables. I created several tables:
      1. One to show the mean values of the dependent variables, as well as number of counties, for each cluster group.
      2. Another to show the mean values on a set of variables not used in constructing the cluster schematic, partly as a further test of robustness but also as a means of analysis.
      3. And finally, a table showing the average quarterly change in number of jobs (not employment; this is the flow measure for ‘job creation’) from 2009Q3 to 2013Q3. Again, including this data allows for a comparative analysis for hypothesis-testing.
    2. I exported the county identifiers as well as their cluster group identities into a .csv file and then converted that file into a .dbf file. This step is necessary for correctly adding the county cluster identities to a shapefile for Texas counties. I combined the .dbf file with the shapefile using QGIS and then moved the new shapefile over to TileMill, where I tinkered with the color scheme and added a legend (TileMill is less clunky than QGIS).
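The standardization step and the stopping rule in the list above can be sketched in a few lines. This is an illustration in Python rather than the Stata commands I ran. It assumes that standardizing means converting each variable to z-scores using the sample standard deviation, and that the F-index is the Calinski-Harabasz pseudo-F statistic (between-cluster variance over within-cluster variance). The function names standardize and pseudo_f are hypothetical.

```python
from statistics import mean, stdev

def standardize(column):
    """Rescale one variable to mean 0 and (sample) standard deviation 1."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo-F for one clustering of the observations.

    points: list of equal-length numeric tuples (already standardized)
    labels: cluster identifier for each point, in the same order
    Requires at least two clusters and some within-cluster variation.
    """
    n = len(points)
    groups = sorted(set(labels))
    k = len(groups)
    grand = [mean(col) for col in zip(*points)]
    between = 0.0
    within = 0.0
    for g in groups:
        members = [p for p, lab in zip(points, labels) if lab == g]
        center = [mean(col) for col in zip(*members)]
        # Between-cluster sum of squares, weighted by cluster size.
        between += len(members) * sum((c - m) ** 2 for c, m in zip(center, grand))
        # Within-cluster sum of squares around the cluster's own center.
        within += sum((a - c) ** 2 for p in members for a, c in zip(p, center))
    return (between / (k - 1)) / (within / (n - k))
```

In use, you would compute pseudo_f for each solution from two to 13 clusters and look for the largest values, then inspect the group sizes and maps as described above. A high value is a prompt for closer inspection, not a verdict.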

This is the process by which data was collected, tuned up, and applied for the purpose of cluster analysis. In the next section, I present the results and then advance some hypotheses and discuss some thoughts of mine on the workflow process and also the empirical findings.

Employment update: losses and gains since 2007 recession

In yesterday’s “2:00 pm Water Cooler” links at the nakedcapitalism blog, Lambert Strether of the Corrente blog included a Bloomberg map (from September of this year) of the recovery of employment by state since 2007. Specifically, Bloomberg writers purported to depict the ‘uneven recovery in states post-recession’ by showing the ‘percentage difference between a state’s maximum employment in 2014 and its recession high (reached between December 2007 and June 2009).’ They sought to highlight only those states where employment remained below peak levels during the recession, coded using a couple of shades of red.

As Strether pointed out, it is a rather confusing map, and I don’t think it conveyed the information in the best way. Why construct a choropleth map that only color-codes poorly performing areas? Why focus on individual peaks in employment during the recession? It would make more sense to depict cartographically the employment changes for all the states and to select a uniform starting date.

Being convalescent following my recent surgery means I have plenty of time to create some terrible cartography of my own. My topic here is total employment changes at the state level from 2007 to 2013. Descriptively, the question is: which states bore the brunt of the recession in terms of employment losses and which have experienced a recovery in employment? Before going right for the States, I start by presenting the ratio of employment in December 2013 to employment in December 2007 for the Census divisions (of which there are nine). [All data were drawn from the Bureau of Labor Statistics.]


Recovery can be interpreted in a couple ways. First, it may refer to replacing lost activity, to the point that employment levels in 2013 equal those in 2007. Alternatively, recovery may refer to a kind of resilience. This term can indicate whether an area has returned to its pre-crisis trend, such that not only has that area recovered employment losses but it has added enough jobs that its employment levels are what would be expected had there not been any output losses. To calculate whether an area indeed was resilient and returned to a kind of equilibrium growth (or whether such a return is even possible!) is beyond my remit for the moment. However, the answer to whether an area is ‘resilient’ in this sense or not has much to do with whether there has been structural change in the economy (labor-saving technology, increases in productivity, further transition out of industry towards services). I highly recommend JK Galbraith’s new book The End of Normal for anyone interested in exploring this question.
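The two readings of recovery can be written as two ratios. The sketch below is only illustrative: the first function is the measure I map in this post, while the second needs a pre-crisis trend growth rate, which I have not estimated. The parameter annual_trend_growth and both function names are hypothetical.

```python
def replacement_ratio(emp_2007, emp_2013):
    """End-of-period over start-of-period employment.

    A value of 1 or more means the losses have been replaced.
    """
    return emp_2013 / emp_2007

def trend_ratio(emp_2007, emp_2013, annual_trend_growth, years=6):
    """Actual employment over the level implied by a constant pre-crisis trend.

    annual_trend_growth is an assumed rate (e.g. 0.01 for 1 pct a year);
    a value of 1 or more would indicate resilience in the stronger sense.
    """
    expected = emp_2007 * (1 + annual_trend_growth) ** years
    return emp_2013 / expected
```

An area can pass the first test and fail the second: with employment indexed at 100 in 2007 and 103 in 2013, the losses have been replaced, yet the area is still short of where an assumed 1 pct annual trend would have put it.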

As a point of reference, the ratio of 2013 employment to 2007 employment for the nation as a whole was exactly 1. The map (for the lower 48 states; Alaska and Hawaii are part of the Pacific division) shows that the West South Central states (Texas, Oklahoma, Arkansas, and Louisiana—major oil and gas states) performed best, with employment bases roughly six pct larger than they were in 2007. These can be deemed resilient. The West North Central (Great Plains states, where there is also quite a bit of fracking activity) and the Middle Atlantic (Pennsylvania, New Jersey, and New York) were second-best performing. The worst were the Mountain states and East South Central (Kentucky, Tennessee, Mississippi and Alabama), whose employment bases remained three pct below their levels at the end of 2007. These areas are clearly not resilient.

The next map depicts the above ratio of employment levels for all 50 states (maps not to scale). Clearly the Census divisions obscure some important differences within divisions, that is, between states. Census divisions do not always capture coherent economic units, such as metropolitan areas or industrial districts, particularly in areas along and east of the Mississippi. A more apt unit of analysis for that would be the metropolitan statistical area; however, these in turn do not necessarily have a single, coherent governing entity. As such, the US state, with different taxation regimes, varying receipts of federal moneys, bank regulation, local investment and labor force policies, etc., remains an important political-economic unit. The major takeaway, as I see it, is that resilient areas are either oil/gas producing (Alaska, North Dakota and Great Plains more generally, Gulf Coast) or are financial centers (New York and Massachusetts). Meanwhile, the diversified industrial and commercial economies of California, Washington, Virginia, Florida, Georgia, Pennsylvania, and New Jersey remain below their 2007 levels. That is just a hunch. From my academic research, other important factors include exposure to subprime mortgages and the foreclosure epidemic.


The next set of maps shows the change in employment for two-year increments beginning in 2007. I tried to apportion the states into quantiles, but there were quite a few shared values, and I also wanted to identify some of the outliers. The two-year increments correspond, more or less, with the most recent recession, then a nominal recovery period, and, perhaps finally, a stagnation period. These, incidentally, correspond to the stylized Minskyan stages of the economic cycle (crisis/crash, recession, recovery, stagnation, economic boom, rinse, repeat). I won’t go into much depth here; I’ll leave readers to gander at these maps to their hearts’ content.




The maps, of course, rely on relative changes. I have also included below a table showing the ten states with greatest absolute employment losses from 2007 to 2009 and then the ten biggest gainers from 2009 to 2013.

Biggest Losers (from 2007 to 2009)

State             Chg in employment (000s)
California        -1,199
Florida             -782
Illinois            -409
Texas               -403
Michigan            -402
Ohio                -401
North Carolina      -329
Georgia             -327
New York            -288
Arizona             -286

Biggest Gainers (from 2009 to 2013)

State             Chg in employment (000s)
California         1,235
Texas              1,128
Florida              583
New York             546
Michigan             331
Ohio                 291
Illinois             269
North Carolina       255
Georgia              250
Pennsylvania         230

Though the order in which these states appear varies somewhat, most of the states that lost the most employment also gained much of it back. The exception is Arizona, which lost over a quarter of a million employed workers but did not appear on the gains list. That state’s employment base grew by less than 170,000 from 2009 to 2013. In contrast, Pennsylvania lost close to a quarter of a million jobs but grew by 230,000, placing it tenth on the gainer list.

In a previous post, I discussed the differences between employment and job growth. I emphasize here that I have looked at the stock of employed labor, not job growth. The quality of jobs is as important as the quantity, and job growth statistics provide great insights into employee turnover, job stability, and duration of employment. Additionally, I stress the importance of considering the sectoral component, which reveals comparative specialization and thus may indicate how an area is clued into larger financial networks and global supply chains. The utility of these maps is the clarity with which they can generate insights into the material distribution of burdens and benefits following the recession.

Employment update: the job creation meme II, the case of Texas

Note: this post was originally much longer, but WordPress failed me, and I do not wish to spend any more time on this.


In my last post, I discussed some preliminary reactions to one of the job creation memes that circulates in the popular press. In a nutshell, the dispute is over which groups or processes are responsible for generating jobs: businesses or consumers. Without developing a rigorous argument, I made three points. First, job creation statistics come from businesses reporting the number of employees they have on their payrolls. This point is important because not all workers are on a payroll (undocumented workers; sex workers; black market) and not all businesses accurately report payroll and employee data. So, we’re talking about a specific albeit large part of the economy: the formal, regulated parts, not necessarily cash-only businesses or grey/black market activities.

Second, businesses do not want to create jobs, for at least three reasons: hiring workers (tendering applications, interviewing applicants, training new personnel, etc.) is expensive (firing workers can also be expensive); hiring is not guaranteed to relieve a firm’s workload stresses; and it may actually be based on faulty information about business prospects.

Third, ‘job creation’ is a sociological phenomenon; we cannot observe ‘job creation’, but really only analyze statistics as they are reported to government agencies. These statistics are only part of the picture. There are qualitative aspects that reveal how job creation happens, such as which kinds of workers are hired (which may be structured by nepotism, cronyism, other forms of corruption, as well as educational and skill levels; unionization may also be an important factor), or how businesses recruit and interview applicants (how much businesses spend on recruitment, how much workers spend to make themselves more attractive, through supplementary training, clothing, resume assistance, etc). The process of job creation itself creates costs and allocates those costs unevenly between workers and businesses depending on the characteristics of the company and industry. You could say that ‘job creation’ itself creates jobs, to the extent that human resource departments within companies and specialized recruitment agencies exist to facilitate the process.

Finer points

If you are going to report on job creation statistics, the least you can do is push further into the statistics and describe how the process is happening at lower levels of observation. On this note, there are several questions that I would suggest are very important indicators of the nature and quality of job creation.

  • What kind of companies are creating jobs? Are they large/small; young/old; what is their ownership structure (publicly-listed, privately-held, S-corporation, partnership)?
  • What sectors of the economy are creating jobs (agriculture, business services, mining, manufacturing, retail, public sector, etc)? Under what occupational divisions can these jobs be classified (managerial, financial, engineering, legal, sales, administrative, healthcare, etc)?
  • How long do the jobs last? Are they seasonal? What is the rate of the turnover in the industry? If there is high turnover, is it the case that previously-fired employees are returning to old positions (like in manufacturing firms) or is there a regular churn of new employees (like in seasonal retail activities)? What are the job creation rates in high productivity, high value-added industries (such as business and financial services) as opposed to low productivity, low value-added industries? What role does job creation play in productivity and the cost structure of that industry?
  • How do job creation rates differ between similar scales of analysis, such as between major metropolitan areas or states? Within metropolitan areas or states, is there a difference in job creation between component areas (Manhattan versus Brooklyn; San Francisco versus Oakland; Los Angeles versus Orange; Dallas versus Fort Worth)?
  • How does the rate of job creation accord with demographic changes, such as the exit of retiring workers, the entry of new workers, in-migration, and the participation of women (who are more likely to leave the workforce for child-rearing, and possibly return for a variety of reasons)?

I think it is almost pointless to look at national-level job creation statistics for a country like the United States. The more interesting narratives come from specific case studies of industries at certain scales (metropolitan area, region, state). We know that the space-economy is characterized by clustering; the economy can be thought of as an amalgamation of production complexes linked together by networks of supply chains and business services firms. If you want to understand the prospects of a given area, then you must know how that area is inserted into the global economy, by way of its productive enterprises, business services firms, etc. I’m going to sketch out a map of job creation for the case of Texas, specifically its oil and gas sector, using job creation statistics from the Census to highlight the limits of that data. I also want to push past the tired cliche of “job creation” as driven by oligarchs or consumers by highlighting the larger geography in which all of these processes happen to look at the ramifications for workers and cities.

The case of oil and gas in Texas since 2009

I’m going to be as expeditious as possible and so I won’t be putting up comparisons of the oil and gas sector in Texas with other sectors. Suffice to state a few facts. First, total private-sector employment in Texas in the third quarter of 2013 was 9,292,884, of which 298,535 was in Mining, Quarrying, and Oil and Gas Extraction (NAICS 21). The sector is one of the smaller ones in the state, making up only 3.2 pct of private employment. (For the record, public employment in Texas is roughly 1.5 million). Within NAICS 21, there are three main sub-sectors: Oil and Gas Extraction; Mining (except Oil and Gas); and Support Activities for Mining. We are mainly concerned with the first of these, which I focus on from now on.

Second, the Oil and Gas sub-sector employed 107,664 in 2013Q3, representing about 36 pct of the sector. In that quarter, companies established 11 or more years ago (the oldest firms) accounted for 82.5 pct of employment in the sub-sector; however, that figure has been declining since 2009Q3, when it was over 86 pct. The largest companies (more than 500 employees) in this sub-sector employed 70 pct of all workers as of 2013Q3, which is almost two percentage points lower than their share in 2009Q3. That is, both younger and smaller companies must be growing at a faster rate than larger and older companies. It is noteworthy, however, that the oldest companies employ more workers in the sector than do the largest companies; in other words, many older companies remain relatively small. Two pie charts below summarize the distribution of employees according to firm age and firm size in Texas as of 2013Q3.

[Figures: two pie charts showing the distribution of employees by firm age and by firm size, Texas, 2013Q3]

The main takeaway from the difference in composition of the sub-sector in terms of age and size of its firms is that this is not an entrepreneurial sector. Start-ups, which are typically younger firms, do not constitute a large portion of employment. This should come as no surprise for this highly capital-intensive industry. Start-ups (young firms) are generally supposed to be the great job-creators, on average, in the economy as a whole. Just over 10 pct of employment is at very small firms (those with fewer than 20 employees), which is less than the average in the Texan economy. As a point of reference, for all firms in Texas, almost 15 pct of employment is contained within this size tier of firms. Similarly, almost 50 pct of employment across all firms in Texas is at very large firms. Within Texas at least, the oil and gas sector stands out for the greater than average proportion of employment that is organized within the very largest companies.
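The headline shares quoted above can be checked directly from the employment counts. The snippet below is just that arithmetic in Python; the variable and function names are mine, and the counts are the 2013Q3 figures reported in the text.

```python
# Texas employment counts for 2013Q3, as reported in the text.
texas_private_employment = 9_292_884
naics21_employment = 298_535             # Mining, Quarrying, and Oil and Gas Extraction
oil_gas_extraction_employment = 107_664  # Oil and Gas Extraction sub-sector

def share_pct(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole

# NAICS 21 as a share of private employment (about 3.2 pct).
sector_share = share_pct(naics21_employment, texas_private_employment)

# Oil and Gas Extraction as a share of NAICS 21 (about 36 pct).
subsector_share = share_pct(oil_gas_extraction_employment, naics21_employment)

# Oil and Gas Extraction as a share of all private employment.
subsector_of_private = share_pct(oil_gas_extraction_employment, texas_private_employment)
```

The last figure is not quoted in the text but follows from the same counts: the sub-sector I focus on accounts for a little over 1 pct of private employment in the state.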

Let’s look at the distribution of oil and gas employment in Texas at the county level. Below is a choropleth map of sub-sector employment as of the third quarter of 2013.


There are perhaps five main production areas for oil and gas in Texas. The largest would be metropolitan Houston, in the southeast, on the Gulf Coast. Harris County, where downtown Houston is, appears to contain almost half of the sub-sector. This tells us that quite a bit of the sector is in office work, because I doubt there is much drilling or refining in downtown Houston. The rest of the sub-sector is mainly located in and around Dallas (Exxon, for example, has its corporate headquarters in a Dallas suburb, though most of the managerial, planning, and oversight work of the company happens in Houston) as well as way up in North Texas (Amarillo), in West Texas (Odessa), and finally in South Texas (Corpus Christi, San Antonio). Most of that is likely hydraulic fracturing operations.

Astute readers will notice I still haven’t actually talked about job creation in oil and gas in Texas yet. One of my larger points is to emphasize the context of job creation. To do that, we should be aware of the basic organizational structure and geography of the industry under examination. However, as is clear from the above discussion, there are multiple indicators that could legitimately be used to describe job creation. I use ‘firm job gains,’ which counts the gains in employment (between beginning and end of a quarter) at firms that grew over the quarter, and ‘firm job losses,’ which counts the losses in employment at firms that shrank over the quarter. So the former indicator does not include instances where firms fired employees or employees retired if that firm was growing. That is, employment losses at expanding companies are not counted by this indicator. The opposite situation holds for ‘firm job losses.’ The point of these indicators is to isolate job creation (that is, the formation of new permanent positions in firms, or the liquidation of previously permanent positions) as opposed to employee turnover or replacement hiring. These indicators are explained in brief at the Quarterly Workforce Indicator download site operated by the US Census Bureau.

The graph below depicts the quarterly net change in jobs in the sub-sector for Texas as a whole from 2009 to 2013. This is job creation less destruction. Bear in mind that these statistics represent flows, not a stock like employment.
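To illustrate how the two indicators net out, here is a small sketch in Python. It is my own simplification of the definitions given above, not the Census Bureau's production method, and the firms in the usage note are invented for illustration.

```python
def firm_job_flows(firms):
    """Compute firm job gains, firm job losses, and their net for one quarter.

    firms: list of (employment at start of quarter, employment at end of quarter)
    Only the net change at each firm is visible; hires and separations that
    cancel out within a firm do not register.
    """
    # Gains are counted only at firms that grew over the quarter.
    gains = sum(end - start for start, end in firms if end > start)
    # Losses are counted only at firms that shrank over the quarter.
    losses = sum(start - end for start, end in firms if end < start)
    return gains, losses, gains - losses
```

For three hypothetical firms with start and end employment of (100, 110), (50, 45), and (20, 20), the function returns gains of 10, losses of 5, and a net change of 5. The first firm may well have lost workers during the quarter, but because it ended larger than it began, those separations never show up as job losses.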

[Figure: quarterly net change in jobs in the oil and gas sub-sector, Texas, 2009 to 2013]

There are some key points. First, job creation is generally greatest during the second quarter of the year. This makes sense as layoffs typically happen around Christmas and the New Year, and hiring begins in earnest in January. Second, the volume of job creation increased by roughly 80 pct between 2010 and 2011 (second quarters of each year), and about 20 pct between 2011 and 2012 (second quarters). It declined by over 10 pct from 2012 to 2013. It appears that job creation in the sub-sector has slowed, but remains quite high. The main point here is that the sector really did not begin to deliver gains to employment until after 2010.

The map above has suggested that we should look at job creation in the industry as it happens in specific areas. Houston appears to be the center of the industry, so we should start there. Below, I’ve compiled the job creation and job destruction statistics for oil and gas extraction in Houston (metropolitan statistical area) as a share of Texas job creation and destruction in that sub-sector.

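[Figure: Houston metropolitan area's share of Texas job creation and job destruction in oil and gas extraction]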

An important pattern that could be observed with job creation figures is whether or not convergence/divergence is taking place between areas. That is, it may be the case that rising diseconomies of scale at firms co-located in an agglomeration (such as oil and gas companies in Houston) or perhaps more general urbanization diseconomies (Houston itself) are causing firms to downsize operations or even relocate to other areas (say, to refinery areas in Beaumont or Port Arthur or to commercial areas in Dallas). The above graphic suggests that, for the first three years of the economic recovery beginning in the autumn of 2009, the oil and gas sub-sector in Houston captured a greater share of job destruction than job creation. This is not to say that there was net job destruction in Houston; in only two quarters did destruction of jobs surpass the creation of jobs within oil and gas. Rather, the point is that there was some convergence happening as areas outside Houston bore relatively less of the job destruction. In other words, roughly half the consolidation in oil and gas employment in Texas happened in Houston. This is notable because less than half the job creation happened in Houston during that period, and as roughly half the sector is located in Houston, there is some discrepancy here.

More importantly, this process didn’t hold up for very long. During the first three years that this sector was creating jobs (growth that I think we can safely assume was mainly related to fracking), most of this activity was happening outside of the industry’s central area; once 2012 rolled around, job creation shifted back towards Houston.

Driving the process of convergence/divergence may be the evolution and maturation of the industry. The oil and gas sector can be divided into two main types of operations: upstream and downstream. The former involves discovery and extraction, and the latter involves transportation and marketing. It is intuitive that in the early stages of an economic boom in the oil and gas sector (fracking), upstream activities would expand first. As those projects are completed and go online, the emphasis shifts towards managing the projects, which ostensibly requires less, albeit more specialized, labor, and which can be accomplished off-site. As such, downstream activities probably require co-location with accountants, project managers, budget analysts, etc. There is a clear spatial division implied in the industry and in a boom.

Bottom line

A plausible explanation, then, of why job creation in the sub-sector shifted is the basic maturation of the industry. Placing job creation in its spatial context, then, has raised some critical points about distributional fairness but also, I suggest, about the durability of job creation. Here are some concluding points.

First, ‘job creation’ is much more than a conversation about the source of job creation, whether demand from consumers (demand from business, incidentally, is an ignored factor, even though such demand drives a lot of activity in trade and professional services) or from the benevolence of oligarchs. I think that conversations about job creation are impoverished and boring as they are currently. The more interesting aspects, I argue, are the industrial organizational and geographical aspects. From this angle, we can approach the quality and location of job creation, which implies distributional matters.

Second, the data we have available on job creation do not allow us to answer those conversations about causes of job creation. They do, however, allow us to examine questions of distributional fairness. For example, the above statistics have pointed out that Houston is a crucial area for the reproduction of this industry. Yet, there is clearly a change in how Houston benefits, and I interpret the statistics as indicating that Houston has more recently distributed job destruction away. This is part of a larger story about the durability of the fracking boom and how this boom distributes benefits and costs. The data raise questions about job creation that are not part of the conversation, but which can be used to trace its potential trajectories in the future.

Third, I am not optimistic about its future trajectories, at least from the perspective of non-core areas. There is obviously a spatial component to the specialization of the industry, and as the industry matures, on-site operations will be streamlined. There is more here than simply demand for oil and gas. This is truly a matter of project management, budgeting, and contracts between oil majors and specialized providers. Fracking projects have a life span on average of about seven years, and are most profitable typically in the first two or three years. As the demand for labor on-site wanes and as companies begin to scale down or possibly enter bankruptcy, we might need to expect some financial turmoil in these peripheral areas. There could be consolidation. There could be stresses in the local financial networks, such as banks that finance commercial and industrial loans. There may be pressure on local spending and municipal tax revenues. Job creation may actually be a leading indicator of economic prospects.

A next step might be to perform a cluster analysis that identifies and isolates areas of fracking production (county or metropolitan scale) and then determines the extent to which job creation was shared between them. The timing of such changes would also be an interesting point. The cases of oil and gas specifically and Texas more generally deserve greater attention because Texas has captured so much of the employment growth since the nominal end of the recession and the fracking boom has been critical to its success. Its development will have an impact on the US as a whole and certainly on the Gulf Coast area. We need to pay attention to this, examine the context of job creation, identify its locational attributes, and then we can speculate on the prospects of job creation. In this context, the source of job creation is only one aspect among many. The real questions cannot necessarily be answered by the data we have available, yet they do raise a number of other illuminating questions.

Nonetheless, left/progressive commentators should really cut it out with their thoughtless statements about job creation. Learn about which sectors are creating jobs, where they are creating them, and consider the context of those industries (namely production processes, relations with other sectors, etc), and then maybe the conversation will take off.

Employment update: the job creation meme

On the radio yesterday morning, some DJ quoted Hillary Clinton, who recently said that ‘businesses don’t create jobs’ or something along those lines. The statement is meant, I believe, to reflect the notion that consumers create jobs, through their demand for goods and services. Implicitly, the statement seeks to counter the argument that the governing plutocrats are benevolent and generous ‘job creators.’ I think it was a mistake for left/progressive commentators to adopt this theme, but it is here, so let’s first take a look at the process of job creation and then employment and job statistics. My question is: what do we know about job creation, conceptually and descriptively? The post here is going to tackle the first of these, and I’ll write up another one with some data later.


First of all, businesses change the number of jobs recorded in the formal economy by adding workers to or removing them from their payrolls. We know that businesses ‘create jobs’ because the government counts jobs as the number of workers who are employed by firms. So, purely from a methodological perspective, when we talk about jobs, we are talking about the gains and losses of salaried/employed workers at US firms (and occasionally at government agencies) as reported by payroll data (which in turn are reported to the almighty IRS).

Second, businesses do not want to create jobs. Employees are costly, and employers are really only going to hire a new employee after other efforts to cope with increasing workloads (overtime, or productivity-enhancing strategies, for instance) have failed to relieve the stresses from the rising demand for their goods and services. Clearly, there is an important relationship between demand for a firm’s output and the way that the firm adjusts to this changing demand. It is not necessarily the case that a firm will hire more workers as demand rises. Some owners are incompetent, for example. Some firms enter into long-term contracts that so structure their budgets that they simply cannot hire new workers, while other contracts make firing difficult. There is also a tremendous amount of uncertainty in the economy. The process of hiring and firing may be responding to events that are not fully formed or are ambiguous. So when a firm decides to hire/fire an employee, the process will usually be undertaken in the face of exigent circumstances. Again, businesses don’t want to do this. It’s expensive, it’s uncertain whether it will solve their workload problems, and it may be based on faulty information about their business prospects.

Third, the hiring/firing process is one of the miracles of the economy. I use ‘miracles’ advisedly because it is, in fact, a process that is extraordinary, a welcome but surprising event, and difficult to observe or explain. Job creation statistics do not actually observe the hiring and firing process. Remember that those statistics simply count the reported number of employed workers at firms at a given time (beginning or end of the quarter, usually). If you want to actually see the phenomenon of job creation, you will need to be either a manager or a (prospective/former) employee. The Census doesn’t see job creation; I can’t see ‘job creation’ happening directly. This is partly semantics, but it should also emphasize the fact that the process is personal and sociological. That is, hiring/firing really comes down to a host of economic, political, and cultural factors that are probably highly specific to context (national, regional, sectoral), with a dose of randomness as well. My point here is, if you really want to understand job creation, then you should absolutely be talking to prospective employees, recently hired employees, recently fired employees, and the people who hire and fire them. Then you can get a sense of how the process works.
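
To make the measurement point concrete, here is a minimal Python sketch. The function name and all figures are hypothetical, not drawn from any dataset. It shows two firms with very different hiring and firing activity producing the same net ‘jobs created’ number, because only payroll headcounts at two dates are compared.

```python
# Hypothetical illustration: payroll statistics compare headcounts at two
# dates, so they record the net change, not the hires and separations
# that produced it.

def net_job_change(start_headcount, hires, separations):
    """Return (end_headcount, net_change) for one firm over a period."""
    end_headcount = start_headcount + hires - separations
    return end_headcount, end_headcount - start_headcount

# Firm A: a quiet quarter. Firm B: heavy churn. All numbers are made up.
firm_a = net_job_change(start_headcount=100, hires=5, separations=0)
firm_b = net_job_change(start_headcount=100, hires=40, separations=35)

print(firm_a)  # (105, 5)
print(firm_b)  # (105, 5)
```

Both firms show up as five jobs ‘created’, even though one made five personnel decisions and the other made seventy-five.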

These are some preliminary comments. In a later post I will elaborate on how job creation happens between sectors and how it varies depending on the quantitative measures used.

Employment update

I have created two choropleth maps of the state of employment in US metropolitan statistical areas (MSAs). The first map displays the unemployment rate in about 380 US MSAs. The second displays the percent change in employment (not unemployment) between June of 2007 (before the recession began) and June of 2014 (latest available data). To make the maps a little clearer, I’ve included state and coastal boundaries.

I used data from the BLS, a shapefile from the US Census, fussed with the data in various spreadsheet programs, and then plugged these variously into QGIS, TileMill, and MapBox Studio. The maps here are a final product of MapBox.
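
The spreadsheet step amounts to two small calculations per metro area. Here is a minimal Python sketch, assuming one record per area with employment counts for June 2007 and June 2014 and a labor force count for June 2014. The field names and figures are invented for illustration and are not BLS values.

```python
# Hypothetical records; the area names, field names, and numbers are
# placeholders, not actual BLS figures.
areas = [
    {"name": "Metro A", "emp_2007": 200_000, "emp_2014": 210_000,
     "labor_force_2014": 225_000},
    {"name": "Metro B", "emp_2007": 150_000, "emp_2014": 141_000,
     "labor_force_2014": 160_000},
]

def unemployment_rate(labor_force, employed):
    """Unemployed as a percentage of the labor force."""
    return 100.0 * (labor_force - employed) / labor_force

def pct_change(old, new):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old

for a in areas:
    # First map: unemployment rate. Second map: change in employment.
    a["unemp_rate_2014"] = round(
        unemployment_rate(a["labor_force_2014"], a["emp_2014"]), 1)
    a["emp_change_pct"] = round(pct_change(a["emp_2007"], a["emp_2014"]), 1)
    print(a["name"], a["unemp_rate_2014"], a["emp_change_pct"])
```

The two derived columns are what a choropleth map would then shade, one map per column.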

I’d like to be able to create and share maps here without always going into the methodology. Before discussing the substance of the maps, though, I add two rather important notes about methodology.

Map-maker, map-maker, make me a map

I’ll be the first to admit that my maps are an example of terrible cartography. Part of this comes from my technical inaptitude, but part comes from the decision to look at MSAs. In theory, metropolitan areas are a useful unit for studying the structure of the economy, but, practically, few people are familiar with their size, boundaries, relative location, and even constituent units. This lack of familiarity with the MSA is compounded by the fact that the unit is not defined in much of a standardized way. For example, the Los Angeles MSA consists only of Los Angeles and Orange counties, its population was approx. 12 million in 2010, and its area is 4,850.3 sq. mi., according to Wikipedia. The New York MSA, by contrast, consists of 25 counties spread across three states, its population was close to 20 million in 2012, and its area is 13,318 sq. mi., also according to Wikipedia. So, we’re dealing with fairly arbitrary statistical creations.

While it may be difficult for most people to really read these maps (that is, to determine what the exact figures are for each and every MSA on the map), there are other patterns that can be picked up more easily. One example is the clustering of similar levels of unemployment and employment change at various scales: within states or between them, or across larger regions. In my view, the map is not necessarily the final output; it is descriptive, not explanatory. These maps are meant to provide a basic snapshot and starting point for additional, more specific analysis.

And the lights all went out in Massachusetts

You might notice that no MSA located in the New England states is displayed. This absence comes from at least three quirks in US government statistics, which are worth mentioning for the important limits they impose on the maps. First, there aren’t actually any “MSAs” in New England; they are called “New England City and Town Areas (NECTAs).” This immediately introduces some confusion, and if you dig around the Census, Bureau of Labor Statistics (BLS), and other US government sources, you’ll notice that New England states, and Massachusetts especially, consistently make odd appearances in federal statistics (often, figures for these areas are not reported at all, or are released with a significant delay). I’m not sure what’s going on here. As a result, we lose valuable information, especially on Boston, one of the largest US cities, and Connecticut, which contains quite a bit of the US financial and insurance industry.

Second, the shapefiles I used to create the maps use “core based statistical areas (CBSAs),” which differ somewhat from MSAs. The two units are quite similar: CBSA is the umbrella term covering both “micropolitan” and “metropolitan” areas. The distinction is that a metropolitan area is built around an urban core of at least 50,000 people, while a micropolitan area has a smaller core (at least 10,000). There are over 500 micropolitan areas and around 350 metropolitan areas. The employment data (retrieved from the BLS via the Department of Labor), it seems, covers mainly metropolitan and not micropolitan areas. One of the key problems, though, may be that the BLS data is in fact organized according to MSAs, whereas the shapefile is organized by CBSA.
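
One way to see how much is lost to this mismatch is to compare the area identifiers in the two sources before joining them. Here is a minimal Python sketch, assuming both sources carry a comparable area code. The codes and names below are made-up placeholders, not actual Census or BLS identifiers.

```python
# Placeholder identifiers only; real work would read these from the BLS
# table and from the shapefile's attribute table.
bls_rows = {
    "M001": {"name": "Metro A", "unemp_rate": 6.7},
    "M002": {"name": "Metro B", "unemp_rate": 11.9},
    "N001": {"name": "New England area C", "unemp_rate": 5.8},
}
# U001 stands in for a micropolitan area that has a polygon but no data.
shapefile_codes = {"M001", "M002", "U001"}

# Areas that can actually be drawn: present in both sources.
matched = {code: row for code, row in bls_rows.items()
           if code in shapefile_codes}
# Areas with employment data but no polygon to shade.
data_without_shape = sorted(set(bls_rows) - shapefile_codes)
# Polygons with no employment data (these stay blank on the map).
shape_without_data = sorted(shapefile_codes - set(bls_rows))

print("mapped:", sorted(matched))
print("data but no polygon:", data_without_shape)
print("polygon but no data:", shape_without_data)
```

Listing the two unmatched sets separately makes it easier to tell a New England-style naming problem from a micropolitan coverage gap.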

Again, the trade-off in using urban-level economies is that the way the data are organized and collected is something of a mess. I think, however, that looking at states does not give the same picture of activity, nor does looking at counties. And, more importantly, I’m not attempting to be too scientifically rigorous right now.

Final caveat: the maps display only the lower 48 states.

The substance

Let’s begin with the distribution of unemployment as of June 2014. The worst-performing areas are located in Oregon, California, Arizona, around the Great Lakes (particularly in Illinois and Michigan), and the southern states (particularly Alabama, Georgia, and Florida). There are several clusters of contiguous metro areas where unemployment is concentrated, including California’s Central Valley, the US-Mexico border region in the southwest, the greater Chicago, Detroit, and Atlanta areas, and also the southern tip of New Jersey (Atlantic City). The best-performing areas include Salt Lake City, the large cities of Texas (San Antonio, Dallas, Houston), and metro areas in Oklahoma, Louisiana, Iowa, Minnesota, and South Carolina.

A few features stand out. First, city size does not imply better or worse unemployment prospects (compare Chicago and New York with Houston and Washington, DC, for instance). Perhaps this has to do with the specialized industrial and commercial base of individual areas.

Second, though statistical analysis could test this more precisely, just eyeballing the map suggests that the level of variation within states is less than the level of variation between states. That is, there is probably an independent effect of US states, possibly related to differences in state/municipal public spending/austerity, state tax regimes, or to the disproportionate allocation of federal aid to the states. An econometric analysis of metropolitan performance becomes trickier when states are used as an independent effect, because so many MSAs located on and east of the Mississippi sit in more than one state.
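
The eyeball comparison can be made explicit by splitting the total variation in metro unemployment rates into a between-state part and a within-state part. Here is a rough Python sketch with invented rates. It assigns each metro area to a single state, which sidesteps the multi-state problem rather than solving it.

```python
from collections import defaultdict
from statistics import mean

# Invented (state, unemployment rate) pairs, for illustration only.
metros = [("S1", 9.0), ("S1", 10.0), ("S1", 11.0),
          ("S2", 4.0), ("S2", 5.0), ("S2", 6.0)]

by_state = defaultdict(list)
for state, rate in metros:
    by_state[state].append(rate)

grand_mean = mean(rate for _, rate in metros)

# Squared deviations of each metro from its own state mean.
within_ss = sum((r - mean(rates)) ** 2
                for rates in by_state.values() for r in rates)
# Squared deviations of state means from the grand mean,
# weighted by the number of metros in each state.
between_ss = sum(len(rates) * (mean(rates) - grand_mean) ** 2
                 for rates in by_state.values())
total_ss = sum((rate - grand_mean) ** 2 for _, rate in metros)

# Share of total variation that lies between states.
share_between = between_ss / total_ss
print(within_ss, between_ss, round(share_between, 2))
```

A between-state share close to 1 would support the impression from the map; a share close to 0 would suggest that most of the variation sits among metro areas within the same state.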

Third, it is apparent that there are regional patterns; that is, some patterns appear consistent across many states. There seem to be three main groups. First, the west coast states as well as Nevada and Arizona; second, the Great Lakes area; and third, the southeast. A possible explanation for the first and third is the high concentration of distress from the collapse in real estate and banking (on the west coast, thrift) markets. Indeed, some of the largest bank failures of the 2008-2010 period were west coast-based savings institutions (IndyMac, Washington Mutual). Economic structure may also be a factor, such as the high level of specialized industrial activity as a share of total activity in the Great Lakes. However, I was under the impression that California, Georgia, and Florida are actually quite diversified economies (agriculture, industry, FIRE, professional services). Diversity ostensibly delivers greater resilience to economic downturns. So, this issue needs to be fleshed out more.


The next map displays a very different set of dynamics. Where unemployment indicates the mismatch between the civilian population able and willing to work and the supply of jobs, employment growth captures something quite different. At the macro level, employment growth reflects demographic change, business sentiment, consumer preferences, and the savings rate, and it also has an important sectoral component.

This difference in the underlying dynamics explains why unemployment can remain very high, such as in the California Central Valley area, while employment is actually expanding. We require more region- and sector-specific data to test whether this discrepancy arises from a skills mismatch (for example, a hypothesis could be that professional or information sectors are expanding while the construction or agriculture industries continue to contract, leaving the comparatively under-skilled workers in the latter unprepared for work in the former) or perhaps because of austerity measures that rationalized the public sector, or perhaps some other hypothesis.
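
The arithmetic behind this is simple: the unemployment rate depends on the size of the labor force as well as on employment, so the two measures can rise together. Here is a small worked example in Python, using invented numbers rather than Central Valley data. It does not settle which of the hypotheses above is at work.

```python
def unemployment_rate(labor_force, employed):
    """Unemployed as a percentage of the labor force."""
    return 100.0 * (labor_force - employed) / labor_force

# Invented figures: employment grows 10%, but the labor force grows faster.
employed_before, labor_force_before = 100_000, 110_000
employed_after, labor_force_after = 110_000, 125_000

employment_growth = 100.0 * (employed_after - employed_before) / employed_before
rate_before = unemployment_rate(labor_force_before, employed_before)
rate_after = unemployment_rate(labor_force_after, employed_after)

print(round(employment_growth, 1), round(rate_before, 1), round(rate_after, 1))
```

In this made-up case employment rises by 10 percent while the unemployment rate climbs from about 9 to 12 percent, because new entrants to the labor force outnumber the new jobs.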

At the very least, we have a reinforced sense of the regional patterns of growth and contraction. Texas, Louisiana, and, to an extent, Oklahoma are growing–this region is one of the main sites of the mineral extraction-natural gas-shale fracking boom of the last few years. The former industrial heartlands continue their long-term process of collapse. The greater New York area registers a contraction, although it seems the worst of it was pushed to the more peripheral areas in New Jersey, Pennsylvania, and the New York suburbs as opposed to the core area. The Washington, DC area–dare we go so far as to include Richmond, VA here?–is booming, so at least austerity has been good to some people.


Bottom line

There are two chief points I’d like to make, unrelated to the specific content of the maps. First, I think one of the next steps is a set of exploratory econometric analyses that can test the effect of states, proximity between metropolitan areas, industrial structure, and metro-level real estate and housing market distress during the Great Crash of 2007-2009. Cluster-based analytical techniques would also be a useful way of identifying patterns, although neither econometric analysis nor cluster techniques reveal much in the way of explanation.

Second, to get at the causes of differential economic performance would require a series of case studies. An appropriate scale for such case studies, I suggest, is “regional”: not necessarily an entire state, but not necessarily only the areas within a state. Again, cluster-based techniques might be a good first step in deciding which areas are similar or distinct enough to merit study as a group. Interestingly, the Federal Reserve regional banks are a fairly reliable source of this kind of study–each Fed region is composed of three or so states, and every now and then their economists produce a report of regional conditions.