Pre-Xmas Reads (Dec. 21, 2014)

I have arrived on vacation and may finally get around to writing some posts (assuming internet access stabilizes). To start off, here are some things I’ve read on the plane and intend to read over the break.

Year in Review. Grantland

A brief history of pregnancy workplace rights. JSTOR Daily

Governing through unhappiness. Potlatch [on austerity]

The War Nerd: more proof the US defense industry has nothing to do with defending America. Pando [there are also a couple of articles in the January edition of Harper’s that have terrific insights on the defense industry’s posture towards Russia and Ukraine]

Also: Could the US even launch a nuclear missile if it wanted to? Daily Mail. [Answer: probably not; serious longstanding morale problems in the USAF nuclear command]

Reporters fail to capture implications of pension provisions. Columbia Journalism Review [on the recent budget deal in Congress]

Adair Turner understands better than Paul Krugman. Angry Bear [understands “the economic situation”; follow video link]

Overselling America’s infrastructure crisis. New Geography

My reading of the FT on China’s “turning away from the dollar”. Michael Pettis at Credit Writedowns

Foucault’s responsibility. Jacobin

$100,000 says my portfolio will beat Tony Robbins’. Barry Ritholtz, The Big Picture

Weekend long reads (Dec. 5, 2014)

Readers will notice that there hasn’t been much activity here since Thanksgiving. My absence is partly due to traveling I’ve had to do, being engrossed in my new book (The Power Broker by Robert Caro), and other academic obligations, which will continue next week. Nonetheless, I have provided some long reads here as they seem to be one of the more popular types of posts. I hope to have the third part of the series on sectoral investment patterns up by the end of next week.

Fracking tantrums

Banking

Research and academics

Thanksgiving long reads (Nov. 27, 2014)

Happy Thanksgiving! Here are some articles I have lined up to read over the long weekend. Not all are recent; I’m trying to clear out the reading list.

1. The tech worker shortage doesn’t really exist. Bloomberg Businessweek

2. State unemployment map goes monochrome for October 2014. The Economic Populist. [Not actually monochrome, but close: no state observes an unemployment rate greater than 7.9% (US average is 5.9%), although underemployment is a separate problem. Also contains maps for employment-population ratio by state!]

3. Kuroda turns up the heat on Japan Inc.: turn profits into higher wages. WSJ

4. How the world’s most leveraged hedge fund got away with insider trading. Zerohedge

5. Oil at $75 means patches of Texas Shale turn unprofitable. Bloomberg [Good run-down of the economic-geography of fracking profitability]

6. Public relations and the obfuscation of management errors–Texas Health Resources dodges its Ebola questions. Health Care Renewal

7. Boomers, millennials and interest rates: a muni investor’s perspective. BlackRock blog

8. For middle-skill occupations, where have all the workers gone? Federal Reserve of Atlanta

9. Over at Project Syndicate: economic growth and the Information Age: Daily Focus. Washington Center for Equitable Growth

10. Jeff Henry, Verruckt, and the Men Who Built the Great American Water Park. Grantland.com [Schlitterbahn!]

On the book front, I’ve been reading The Power Broker: Robert Moses and the Fall of New York by Robert Caro. I’ve been meaning to read it for a while and then found it at the bookstore, so here I go!

The contemporary context of investment in the United States, part 2: definitions

This post is the second in a series on the contemporary state of investment in the US. The first is here. The purpose of this post is to provide some definitions and context. There are three questions: first, what processes does “investment” refer to; second, who invests; and third, what patterns should interest us? To define and identify what investment activity matters (for economic growth and development in the US for the next decade or so), I rely mainly on some of the writings of critical/heterodox economists Hyman Minsky and John Kenneth Galbraith.

What is investment?

There are two common usages of the term “investment”. The first is the buying and selling of shares of companies that are publicly listed on stock exchanges. This set of activities is not the sort of investment I’m talking about in this post. No doubt, the stock market can be a source of capital for companies, which can then be deployed for investment. More often than not, however, the stock market functions as a market for corporate discipline and control (mergers, acquisitions, takeovers). The investment I refer to here is the accumulation of fixed assets (such as capital goods [like machinery], inventory, property, physical structures) for the purposes of generating income.

Let me offer a short digression here. Many would say that the two definitions above reflect a financial definition and then an economic definition of investment. In casual, daily conversation, that might be acceptable. However, if you have read Hyman Minsky then you would be aware that both processes are actually interlinked and co-constitutive. That is, the accumulation of fixed assets happens very much according to what is happening in capital markets (of which stock markets are a sub-set). More specifically, Minsky argues that the financing (particularly when financed through debt) and pricing of capital goods occurs in capital markets. The point is that the capital goods and fixed assets that are used for investment are also financial assets. And because these financial assets are accompanied by ownership claims and are financed, this implies that there are cash flow obligations: the debts that were used to finance capital goods accumulation must be validated. Those obligations are met by splitting off part of the income produced by those assets, either as dividends, interest payments, or perhaps (in dire situations) from the sale of capital goods. To make a long story short, Minsky demonstrates that these arrangements can precipitate a depression in the event that financial asset prices collapse, which he argues is possible if capital goods accumulation is financed in too speculative a manner. Otherwise stated, the capitalist system itself sows the seeds of its own crises. I suggest you read this or this.

Where does this leave our definition of investment? There are at least six elements: it refers to a (1) process of accumulating (2) capital goods, which are simultaneously (3) financial assets that are (4) owned by capitalists/entrepreneurs and (5) might be financed through debt, for the purposes of (6) generating income.

In the next section, I’m briefly going to discuss why (5) and (6) do not always apply but rather depend on who is doing the investing. Nonetheless, I think that the definition here is workable and sufficiently distinguishes this set of activities from the “investing” (asset trading) that commences every day with the ringing of the bell at the New York Stock Exchange.

There are a couple of other points I’d like to add here concerning how investment might influence the larger political-economy. Recently a book was released that contained a number of previously (I believe) unpublished essays by Hyman Minsky (they may have been published in a few academic journals) on the topic of jobs, employment, and welfare. I recommend this one, too, for a general introduction to Minsky’s thought and its application to employment. In the introduction and throughout that book, the reader might notice that the US economy is characterized as observing a “private, high-investment strategy”. I think there are two very important points in that phrase. First, investment is private: it is subject to ownership by individuals, not the state. This point should interest anyone who conducts cross-national comparative research: you cannot compare the investment process in a country like the US, where investment happens by entrepreneurs and households, with a country like China, where capital expenditures are determined largely by committees in state-owned businesses appointed by the (Communist authoritarian) state. It may be a minor point, but I think it is worth bringing up.

The second point is much more important: “high-investment strategy.” What does that imply exactly? The idea is that the US economy in particular generates growth by stimulating investment. That is, more private ownership of capital goods. Such a strategy is executed, in the US at least, by various policies: a tax code that favors capital goods accumulation (incentives for depreciation, tax credits); a favorable business environment (low regulation of business, which decreases the cost of doing business generally); and government contracts that partly underwrite profits in select industries, typically those that require high capital goods consumption (armaments, construction, airlines). Thus “growth” in the US economy is primarily pursued by creating favorable conditions for income generation by businesses.

There are a number of problems associated with such a strategy, and I’ll quote the four that are mentioned in that volume on employment by Minsky. These can be found in a summary around kindle location 358-382.

First, a tax code that is geared towards more investment will increase inequality between the owners of capital and labor. For what it is worth, inequality on its own does not necessarily spell economic disaster. For instance, if you have read Thomas Piketty’s book, you’ll know that countries can endure long periods (centuries) of inequality without encountering, say, collapse and ruin. Certainly, there are issues of justice and quality of life that arise from high inequality, but the evidence that inequality might lead to economic and financial crisis is wanting. Politically, I don’t worry about inequality because the President and most American political leaders are not Rockefellers, Morgans, Vanderbilts, Carnegies, Fords, or Hearsts. In other words, the money eventually runs out.

[An aside: one of my favorite economists (Deirdre McCloskey) has recently written a review of Piketty that I suggest everyone read. It’s there on the home page in pdf form.]

Second, the income that accrues to owners of capital can lead to opulent consumption by them and emulative consumption by the masses, leading to inflation. There is a natural experiment here in the period 1964 to 1974 in the US, as employment tightened, defense spending escalated with the War in Vietnam, and the economy enjoyed an investment boom (initiated, by the way, with tax cuts passed during the Kennedy and Johnson administrations).

Third, government spending on defense (contracts to specialized and sophisticated high-technology industries) creates demand for high-skilled, high-wage labor. In turn, this widens the inequality among workers. Again, we can observe the effects of growth in high-technology sectors on wages and skills across occupations by observing the period from 1994 onwards, when the revolution in computer chips and the internet began. This was not totally the result of government spending (in fact, US defense spending dropped off after 1990, leading in part to the recession of 1991), although much of the US advanced technology industry grew out of companies that received defense contracts. Nonetheless, this is a serious problem for worker quality of life and, I would argue, for growth prospects between regions and metropolitan areas.

The final problem Minsky describes of a high private investment strategy is that if the tax code and business environment privilege capital spending, then rising business confidence, and hence banker optimism, will erode lending standards while also increasing the riskiness and speculative nature of investment. The result can be a financial crisis and recession. It is worth noting that the 2008 crisis was not the result of excessive optimism by corporate businesses. Rather, the 2008 crisis was the result of a housing boom, a highly leveraged household sector, and a highly leveraged, speculative, and corrupt financial sector.

At this point, I’ve elaborated enough on what I mean by investment. Now I’m briefly going to outline some key differences between the sources of investment.

Who invests?

I mentioned earlier in my definition of investment that it might be financed through debt and that the purpose of investment is to generate income. I’d like to add some caveats to that with the help of John Kenneth Galbraith.

In a previous post containing links, I had one from the St. Louis Federal Reserve documenting that credit to non-corporate non-financial sectors has remained low following the crisis. That to me was a key indicator of who is able and willing to invest in the current economy. We have at least three major groups here: the corporate sector (AT&T, Wal-Mart, ExxonMobil, DuPont, Microsoft); the household sector (you and me); and the financial sector (banks and lenders big and small). Each of these sectors has very different sources of capital and ways of generating income. The corporate sector, which as you can tell is populated by very large companies that are often structured in what economists would call an oligopolistic market (where they can influence prices of inputs and outputs), gets most of its capital (for the purposes of future investment) from retained earnings. In other words, expansion programs, research and development, and the development of new products are financed from last year’s profits. These businesses do not typically take out loans from banks to finance their activities, although they certainly float bonds and stocks as part of their complex capital structures. The main point is that these companies seek to make the supply of capital (like they do for all other strategic costs) a “wholly internal decision” (Galbraith, 2007, 34). Chapters Three and Four in The New Industrial State are the relevant backgrounds for this interpretation.

As such, investment is not necessarily financed through debt. In addition, investment may not happen with the aim of producing income. Galbraith describes how, in these large corporate enterprises, the managerial hierarchy is responsible for long-term planning; this includes when and whether to make capital expenditures. The ultimate goal of the corporation, then, is not income or profit generation, but rather the elimination of uncertainty. The corporation seeks stability. The process of investment, we can deduce, will fall in line with how corporate managers attempt to achieve that stability. An oligopolistic market structure allows companies to pursue goals other than the relentless pursuit of profit, contra the dicta of neoclassical economics.

In contrast, when American entrepreneurs attempt to launch a business, the financing usually comes from residential mortgages and the purpose is the generation of income (they do not exercise oligopolistic power). A home is an individual’s greatest potential source of capital, save inheritances. Consider how easy it is (excuse me, how easy it used to be) for most people to get a mortgage or to refinance their home; this, too, surely was a reflection of the high private investment strategy (or at least should be treated in policy as such).

Obviously, between the individual entrepreneur and the oligopolistic corporation, there are a lot of businesses that do not have pricing power and that have access to various forms of financing that are not restricted to home mortgages or inheritances. These businesses, which could be termed ‘mid-sized’ and have several hundred employees (let’s say, fewer than 500), and which are typically in manufacturing, transportation, utilities, and related industrial sectors, do enter into debt contracts with banks and other lenders. However, I think that our focus should really fall to entrepreneurs and large corporations. For the former, the reason is that small businesses create a tremendous amount of activity, in terms of jobs and sales. Most small businesses also fail, so this is a tremendously inefficient set of activities. In the short-term, however, small businesses drive quite a bit of local economic activity while providing a lot of people (not just the small business owners themselves but also the people they hire) with an outlet for social and economic advancement.

In the case of large corporations, the interests of these entities set much of the industrial and social policy in the US, and they are responsible for the bulk of exports and income. Furthermore, municipal and state governments compete quite vigorously over corporations. Local governments offer tax incentives, provide physical infrastructure, train the workforce, etc., partly with the aim of attracting business activity, and therefore tax revenue.

What I’m getting at here is that there is a geography of investment, and two very important features of that geography will be households and large corporations. The financial system is also important in this geography, but these days it is less about the location and activities of banks and more about the location, organization, and prerogatives of special investment funds, like pension or hedge funds. With that, I’ll move on to the final question.

Measuring investment and identifying patterns

I won’t dwell on this point much because instead of telling you what I’m going to do, I may as well just do it. In the next posts in this series, I’m going to present the data I stumbled upon from the Bureau of Labor Statistics for output, savings, and investment. The data is organized by sector, that is, industrial sector following the NAICS codes and also by the divisions suggested above (household, financial, corporate, and others).

The most basic pattern to identify is change over time: which areas demonstrate growth and which demonstrate contraction? There are roughly seven years of annual data, which is not a large sample by any means. There will be some noise.

A subsidiary pattern is relative change. The 2008 crisis marks a point where we can evaluate how credit distress during and after the crisis was distributed between sectors and, consequently, what have been and will be the prospects for investment and therefore economic growth. In other words, we can determine to an extent what sectors are “holding back” growth. Recall that this endeavor was largely touched off by the New York Times article that asked that very question (see previous post). My goal is to further investigate that question.
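To make the two patterns concrete, here is a minimal sketch of how change over time and relative change could be computed once the data is in hand. Everything in it is an assumption for illustration: the sector names follow the divisions suggested above, but the numbers are made up and the function names are hypothetical. Each sector is indexed to a pre-crisis base year, and sectors still below that level in the latest year are flagged.

```python
# Hypothetical annual investment by sector (made-up numbers, arbitrary units).
investment = {
    "corporate": {2007: 1000, 2009: 800, 2013: 1100},
    "household": {2007: 500, 2009: 300, 2013: 375},
    "financial": {2007: 400, 2009: 240, 2013: 420},
}

def index_to_base(series, base_year):
    """Express each year's value as a percentage of the base-year value."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}

def lagging_sectors(data, base_year, latest_year):
    """Return sectors whose latest value is still below the base-year level."""
    return sorted(
        name for name, series in data.items()
        if index_to_base(series, base_year)[latest_year] < 100.0
    )

indexed = {name: index_to_base(s, 2007) for name, s in investment.items()}
lagging = lagging_sectors(investment, 2007, 2013)

for name, series in indexed.items():
    print(name, series)
print("below 2007 level in 2013:", lagging)
```

Indexing to a common base year puts sectors of very different sizes on the same scale, which is what a comparison of relative change needs. With only a handful of annual observations, the result should be read as a rough signal rather than a trend estimate.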

Link share: geography of a decade of job growth and decline

Check out this very well done interactive graphic from the blog of Austin-based consulting firm TIP Strategies!

I’ve filed this under “Terrible Cartography” (I have no category for the opposite), but rest assured this is the work of professionals.

Reading list (Nov. 19, 2014)

Here is another batch of items I am currently reading. Some of these are from last week or before, and that’s because I add things to my Pocket reading list and then leave them there.

1. Why are dystopian films on the rise again, JSTOR Daily

2. Why is anyone surprised that Abenomics failed? Naked Capitalism

3. Why do financial types hate Fed intervention? Pragmatic Capitalism

(Another option is that financial types pretend to hate intervention to mask the benefits that accrue to them from intervention; perhaps a recognition of the diversity of financial types could go some way to resolving this, i.e. pension funds, hedge funds, too-big-to-fail banks, community banks, etc., as well as recognition that they don’t always love or hate it).

4. Potential output and recessions: are we fooling ourselves? Board of Governors of the Federal Reserve System

5. Credit to noncorporate businesses remains tight, St Louis Federal Reserve

6. Retirement planning: millennials vs boomers, Research Affiliates

On the book front, I have finished reading Robert Caro’s Master of the Senate (third in his LBJ series), which I recommend (specifically the chapters starting after pg 200). I am now reading the Kindle sample of Daniel Galvin’s Presidential Party Building: Dwight D. Eisenhower to George W. Bush and deciding whether or not to buy. I’m leaning towards buy.

A brief comment on petro-states

I read this Op-Ed piece at Al-Jazeera the other day, entitled ‘The perils of petro-states: The case of Alberta’ (link: http://aje.me/1fjPONb). It goes on to compare the Canadian province of Alberta with Norway, contrasting the latter’s success in managing its oil wealth with the failed promises of economic salvation in the former.

This topic appeals to me for a number of reasons. I grew up in Norway and my father worked for an oil major my whole life, so arguments about Norway’s success have been quite personal for me, as I consider myself a fortunate beneficiary of it. Similarly, a number of my colleagues at graduate school were specialists in sovereign wealth funds and natural resources generally; one of my dissertation examiners, in fact, wrote a book on such funds. These conversations have been floating around my head for the last few years, and they are rather interesting.

I think the author makes a couple of important points in the piece, and I would like to take it up in a little more depth. Suffice it to say in the meantime that I do not think the comparison between Alberta and Norway is at all justified. Norway is a sovereign nation (it isn’t even a part of the European Union), it controls its currency, and the geography is completely different (its oil reserves are in the North Sea). A more apt comparison would be between Alberta and Texas. Both are sub-national units (a province and a state), both are subject to a national currency, and much of their oil wealth is pulled out of the ground. Both the US and Canada are members of NAFTA, so when the author of this piece tries to relate the losses in manufacturing jobs starting in the 1990s to the mismanagement of oil wealth, it doesn’t quite make sense. There needs to be a sharper attention to geography and economic history here.

In a future post, I’m going to riff on “petro-state” and try to formulate a regional alternative to it. I think, as a start, the definition of petro-state (“dependent on petroleum for 50 percent or more of export revenues, 25 percent or more of GDP, and 25 percent or more of government revenues”) is a little too dry. There is, as noted above, the scale issue: does this apply only to countries, or at sub-national scales? What sort of petroleum products are included in this, for instance (raw material exports or refined products)? Using employment, output, and tax revenue data (assuming I can locate it), I’m going to try to work backwards and see what kind of alternative definitions can emerge that take geography more seriously.
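As a starting point, the quoted definition can be written down as a simple threshold test. This is only a sketch: the function names are hypothetical, the example shares are made up, and reading the quote as requiring all three conditions at once is my assumption based on its "and".

```python
def petro_state_criteria(export_share, gdp_share, revenue_share):
    """Check each quoted threshold; shares are fractions between 0 and 1."""
    return {
        "exports": export_share >= 0.50,
        "gdp": gdp_share >= 0.25,
        "government_revenue": revenue_share >= 0.25,
    }

def is_petro_state(export_share, gdp_share, revenue_share):
    """True only if all three thresholds are met (assumed reading of the quote)."""
    return all(petro_state_criteria(export_share, gdp_share, revenue_share).values())

# Two made-up regions, for illustration only.
print(is_petro_state(0.60, 0.30, 0.35))
print(petro_state_criteria(0.60, 0.10, 0.35))
```

Returning the individual criteria separately makes it easy to see which threshold a region misses, which is useful when trying out the definition at sub-national scales where export and revenue figures may not be directly comparable to national ones.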

The contemporary context of investment in the United States, part 1: introduction

The Great Recession (2007-2009) changed the context for investment in the United States in several ways. First, it created imbalances in the economy, in terms of losses and gains between economic (businesses, households, government) as well as industrial (agriculture, manufacturing, services) sectors. These losses and gains can be measured in terms of lost output, including employment. These are imbalances to the extent that contractions in output were unevenly shared across sectors.

Second, the political environment changed as new constituencies and alliances were formed, while others were made more obvious. An example of such a long-standing alliance that became stronger was the relationship between the Federal Reserve and large, globally-competitive financial companies. This relationship was codified in the emergency recapitalization (the TARP) and in the Dodd-Frank law. New constituencies emerged or became more pronounced, for instance as unemployment and homelessness rose, and they reflected a regional character (for the reason that the financial crisis and recession were, in fact, regional crises). These new constituencies and alliances generated pressures for different kinds of policy intervention, with varying success.

And finally, the macroeconomic context changed, given changes or stickiness in the informal rules of investment (such as tax rates, interest rates, the supply of credit). Similarly, economic development through the application of new technologies, discovery and extraction of fossil fuels, and global capital flows also have shaped the macroeconomic context.

Over the next several posts, I’m going to outline the context of investment in the US immediately before and since the financial crash in 2008. I will describe the nature of investment since the crash, with a focus on the distribution of investment activity between sectors (economic and industrial) as well as the nature of investment (private fixed assets: structures, equipment, intellectual property). Finally, I’ll briefly describe the kinds of companies and regions that were poised to reap the benefits of this changing context, and contrast them against those entities that have borne the greatest burden.

The next posts will frame investment in the US with the following specific questions. First, what do I mean when I talk about investment in the United States? I will outline that question by referencing James K. Galbraith’s new book (The End of Normal) as well as borrowing some insights from Hyman Minsky. Let me add that I do not mean to advance any kind of coherent theory; that is way beyond my remit at the moment. Rather, I find it useful to identify metrics and relationships in the data that can eventually be situated within a wider theory, or can be used to advance or refute other ones.

Second, what are the current obstacles at the geopolitical and national levels to growth in investment within the United States? Off the top of my head, I can think of several important “obstacles”: the process of domestic credit allocation (including interest rates, integrity of too-big-to-fail banks, property development); the cost of raw materials (especially oil and gas); and, military commitments and the general financing of national security. Readers of Galbraith’s book will notice these topics are quite prominent in his account, while I have spent most of my very short academic career focused on the first.

Third, what is the current progress or state of the economic recovery since 2009? There are some subsidiary descriptive questions that point to my thinking here. Which economic sectors have returned to pre-crisis trends in output growth (contribution to GDP) and which remain stalled? Which industrial sectors? How did investment in private fixed assets respond to the crisis and aftermath? What about for investment in structures, equipment, and intellectual property? (If readers want to see what I’m getting at with these questions, take a look at this article from the NYT back in April: http://nyti.ms/1zZEVct; I am essentially expanding this kind of inquiry, which has been, incidentally, the empirical base upon which most of my research has been conducted).

My doctoral dissertation advisor liked to say that a solid way to organize an argument follows the formula: what; why; and, so what. The what component here is really, where is investment happening in the USA (both in sector and locational terms). The why seeks to explain the relevant processes that propel the investment we see or hinder the investment we hope for (particularly, why it should be the case that despite there being a real recovery in material terms, there should be such a slow expansion in quality employment opportunities). The so what for me really comes down to distributional fairness. In other words, something that has been on my mind the last few years is whether the parties responsible for the financial crisis and disappointing recovery were the same parties that amortized its costs, or whether there has been a systemic and successful effort to push those costs onto other parts of society. Additionally, I want to explore the durability/sustainability of the investment that is happening. At the end of the day, we all want to be a part of a successful collective endeavor–the US economy. Hopefully I’ll find much to be proud and excited about as I dig through the data. Alternatively, hopefully I can provide some insight into how to rectify the processes that point to the contrary.

Meso-level analysis using cluster methods: local economic structure among Texan counties, part II

This post continues from the previous one, presenting the empirical findings from the cluster analytical technique.

The four variables I settled on as a way to investigate the local economic structure of Texan counties were: urban size, unemployment (rate), poverty (rate), and the share of oil and gas employment in total employment. This selection captures generalized urban economies that arise from large size as well as diseconomies (unemployment and poverty), and a degree of specialization (in an industry of some importance to the state). Unemployment and poverty, by the way, are very different phenomena. Unemployment does not necessarily imply poverty, and vice-versa. Unemployment refers to civilians of adult age who are in the labor force and looking for work but do not have a job (perhaps because they are not qualified for available work, for instance). In contrast, individuals in poverty may be gainfully employed, except that they have fewer opportunities for social advancement, are socially ostracized, and have incomes so low and possess so few material possessions that they have difficulty feeding and clothing themselves without assistance.

Local economic structure, thus, reveals in broad terms the allocation of economies and diseconomies from urban size and specialization. Once emplaced on a map, we may be able to determine the spatial structure of the economy as well. The rest of the post presents the findings from the k-means cluster analysis.
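For readers unfamiliar with the technique, k-means can be sketched in a few lines. The following is a minimal illustration, not the procedure used to produce the results in this post: the county figures are made up, the z-score standardization and the seeded random initialization are assumptions, and k is set to 3 only to keep the toy example small.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids

# Hypothetical counties: (population, unemployment %, poverty %, oil and gas share %)
counties = [
    (4_000_000, 5.4, 18.6, 2.9),
    (1_800_000, 5.1, 18.0, 0.4),
    (17_000, 3.5, 13.0, 4.6),
    (15_000, 3.3, 12.5, 5.0),
    (170_000, 11.2, 32.1, 0.0),
    (160_000, 10.8, 31.0, 0.1),
]

points = standardize(counties)
labels, centroids = kmeans(points, k=3)
print(labels)
```

Standardizing first matters because population is measured in the millions while the other three variables are percentages; without it, population alone would dominate the distance calculation. The result can also depend on the starting centroids, so in practice the algorithm is usually run from several initializations.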

Cartographic introduction

Before diving into the cluster results, however, it is important to provide a more basic introduction to the geography of Texas. The first map below records the population distribution by county (absolute numbers). Consider that with the exception of the far west, counties in Texas are all roughly the same size. This curious feature makes it easier for us to infer population density and hence urbanization from the following map. There are a number of important points to draw out here.

  1. Houston is the largest urban area in Texas, with more than four million people living in the core county of Harris; activity spills out from Houston into at least four adjacent counties.
  2. Dallas-Fort Worth is the next largest, with two heavily populated counties surrounded by less populated areas, ostensibly suburbs.
  3. Central Texas features San Antonio and Austin as part of a rather long chain of urban areas pointing towards Dallas, with San Antonio at the bottom of the chain, and Austin closer to the middle.
  4. Outside of the more densely populated eastern portion is the south, which includes Corpus Christi and Brownsville.
  5. In the far west is El Paso, which is almost as much a part of New Mexico as it is Texas.
  6. Several counties are scattered throughout the panhandle, where there is oil and gas extraction.

[Map: population distribution by county (tx-pop)]

I have also included a map of the distribution of the Hispanic population in Texas. I do this in order to head off any attempt to associate the location of Hispanic people with unemployment or poverty. In almost half of the counties in Texas, over fifty percent of the population is of Hispanic descent.

[Map: distribution of the Hispanic population in Texas (hisp)]

The feature that stands out most in comparing the total population and Hispanic population distributions is that the latter does not cluster in major urban areas. Instead, the main factor is proximity to the border with Mexico. This pattern stands in contrast, for example, to the distribution of black Americans in urban areas in the North during the mid-twentieth century. Urban areas do not appear to be sites of particular concentrations of a given demographic group, in other words.

Finally, I present the results of the cluster technique described in the previous post.

[Map: cluster group by county (tx-c8a-beta)]

There is quite a bit of information that needs to be parsed here. The first point is to note the frequency of each cluster group and their basic identities. The table below shows the frequency and the mean for each cluster group of the four dependent variables (population, unemployment rate, poverty rate, and share of oil and gas in total employment).

Cluster group | Freq. | Population | Unemp. rate | Poverty rate | Oil and gas employment share
1 | 1 | 4,205,743 | 5.40% | 18.63% | 2.85%
2 | 34 | 31,413 | 5.29% | 26.49% | 1.14%
3 | 8 | 171,535 | 11.16% | 32.08% | 0.04%
4 | 25 | 208,060 | 5.19% | 10.81% | 0.94%
5 | 4 | 1,773,114 | 5.10% | 17.97% | 0.37%
6 | 58 | 16,835 | 3.46% | 13.03% | 4.63%
7 | 35 | 56,846 | 7.14% | 20.75% | 0.15%
8 | 89 | 39,826 | 4.83% | 18.59% | 1.02%
Total | 254 | 100,199 | 5.14% | 18.33% | 1.70%
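
A table like the one above can be produced by counting the counties in each cluster group and averaging each variable within the group. The sketch below is a minimal illustration in Python rather than the original workflow; the county records are made up, and it assumes the means are simple (unweighted) averages across counties.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical county records (not the Texas data):
# (cluster group, population, unemployment %, poverty %, oil/gas share %)
counties = [
    (1, 4_000_000, 5.4, 18.6, 2.9),
    (2, 30_000, 5.0, 27.0, 1.0),
    (2, 34_000, 5.6, 26.0, 1.2),
    (6, 15_000, 3.2, 12.5, 4.8),
    (6, 18_000, 3.8, 13.5, 4.4),
    (6, 18_000, 3.5, 13.0, 4.6),
]

def summarize(rows):
    """Return {cluster: (frequency, [unweighted mean of each variable])}."""
    groups = defaultdict(list)
    for cluster, *values in rows:
        groups[cluster].append(values)
    summary = {}
    for cluster, members in sorted(groups.items()):
        # zip(*members) turns the list of rows into columns, one per variable
        means = [mean(column) for column in zip(*members)]
        summary[cluster] = (len(members), means)
    return summary

summary = summarize(counties)
for cluster, (freq, means) in summary.items():
    print(cluster, freq, [round(m, 2) for m in means])
```

Because the averages are unweighted, a tiny county counts as much as a large one, which is worth keeping in mind when reading the group means.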

Cluster group 1 is clearly central Houston and cluster group 5 is the central areas of Dallas-Fort Worth, San Antonio, and Austin. The next smallest group is cluster group 3, with 8 non-urbanized counties located along the border with Mexico. Cluster group 4 is the fourth smallest group, and it seems, based on the map, that these are suburban areas. It seems rather interesting that suburban areas of at least six metropolitan areas can be identified on the basis of the four dependent variables. This indicates there is uniformity in economic structure that relates to an area’s position within the urban hierarchy. All other cluster groups have 34 or more counties, the largest being the final cluster group (8) with 89 counties.

A glance at the map emphasizes that spatial clustering is a common phenomenon for almost all groups. This is especially the case for: group 8 in the central part of the state; group 7 in the eastern and northern parts of the state (a long chain of counties registering as 7 stretches from the border with Louisiana up to Dallas); group 6 in the panhandle and northwest of Dallas; and groups 2 and 3 along the border with Mexico.

For discussion, I think the most pressing question is how the groups with the most counties (particularly 2, 6, 7, and 8) differ from one another. Referring back to the above table, we can characterize those clusters as follows:

  • Group 2 are small, poor areas with some oil and gas activity.
  • Group 6 are even smaller areas with low unemployment, relatively low levels of poverty and a substantial amount of oil and gas activity. Ostensibly, these counties specialize in oil and gas extraction and, being underpopulated, probably draw in skilled migrants for temporary work.
  • Group 7 counties are on average larger than group 2 and 6 counties, and have higher unemployment than either. Their poverty rate is higher than group 6 but lower than group 2. The weaker labor market might be attributed to the very low presence of oil and gas.
  • Group 8 counties are slightly larger than group 2 counties, with slightly lower levels of unemployment and poverty, but also less oil and gas presence.

Obviously, we quickly encounter the explanatory limits of the four dependent variables. Thankfully, I collected more data than ended up going into the determination of clusters, so it is possible to fill out the analysis a bit more. Specifically, we can present the number of bank offices, deposits per capita, and median income. These indicators can reveal the financial characteristics of the clusters, such as access to financial services and accumulated personal savings. We should not necessarily expect that estimated median income will mirror deposits per capita precisely, because spending and saving patterns will vary between areas due to cost of living differences in addition to varying access to financial intermediaries. For instance, wealthier areas may hold their savings in 401(k) or brokerage accounts as well as bank deposits, while poorer areas may hold savings in cash. Other areas probably send savings back to Mexico in the form of remittances. So the financial indicators can raise some interesting hypotheses (again, cluster analysis does not manufacture evidence that allows us to make statements of causality; this is purely exploratory).

Cluster group | Est. median income | Bank offices | Deposits per capita
1 | 51,298 | 1024 | 50,241
2 | 35,197 | 10 | 20,600
3 | 30,274 | 34 | 10,220
4 | 62,475 | 54 | 14,240
5 | 52,579 | 431 | 36,130
6 | 49,825 | 7 | 30,421
7 | 38,919 | 13 | 14,364
8 | 40,613 | 14 | 20,078
Total | 43,815 | 27 | 21,209

I will take each cluster group in turn.

  • Group 1 is central Houston, which has a median income almost 20 pct greater than the Texan average as well as the greatest number of bank offices and the largest stock of deposits per person, almost equal to estimated median income. I should note anecdotally, by the way, that the major urban areas do not necessarily have greater deposits per capita than rural or other areas. This really is a matter of financial literacy and access, which in turn is structured very differently between regions and demographic groups.
  • Group 2, a relatively poor grouping, has the second-lowest average median income and the second-fewest bank offices on average, but its average deposits per capita is roughly in line with the Texas average.
  • Group 3, along the Mexico border, which also had the highest rate of poverty, has the lowest average median income and the lowest average deposits per capita. However, it has a high average number of bank offices. One might be tempted to say that the number of bank offices reflects remittance activity, except that most banks in the US do not offer this kind of service, but rather face a number of non-regulated, pseudo-bank competitors (wire services). So, I’m not sure how to interpret that high number. It may be the case that these banks are unit (stand-alone) banks, unlike the nationally-recognized banking conglomerates, who would be unlikely to locate in such areas anyway, leaving a more competitive banking market.
  • Group 4, the suburbs, have the highest average median income yet one of the lowest levels of deposits per capita. Perhaps this reflects my statement earlier about wealthier households having access to a larger and more sophisticated range of financial products, which would mean they put fewer of their assets in bank accounts. Another possibility is that their wealth is accumulated in material things, such as residential property, fixed assets if they are business owners, and other ‘stuff’. Perhaps they finance their expenditures with debts and mortgages as well, creating financial obligations that draw their money away from savings and into regular interest payments.
  • Group 5, the urban cores, are also high income areas, with an exceptional number of bank offices on average.
  • Group 6, mainly in the panhandle, appears, based on the low readings of poverty and unemployment, to consist of relatively successful rural communities. The median income in the average group 6 county is over ten pct greater than the Texas average, with a high level of savings (deposits per capita). Furthermore, this group contains the lowest number of bank offices, which suggests an uncompetitive but perhaps stable banking system.
  • Groups 7 and 8 are quite similar, featuring lower incomes, deposits per capita, and number of banks than the Texas average.

A final indicator I would like to include is total job creation. Here, I simply took the average quarterly number of jobs created (job gains less job destruction) between 2009Q3 and 2013Q3 for each county. As a point of reference, I also calculated job creation as a share of total employment, to account for urban size.

Cluster group | Avg. quarterly job growth | Job growth relative to urban size
1 | 17,484 | 0.90
2 | 78 | 1.15
3 | 273 | 0.70
4 | 556 | 1.15
5 | 6,750 | 0.85
6 | 87 | 1.65
7 | 122 | 0.74
8 | 97 | 0.87
Total | 320 | 1.09
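
The two job growth columns follow from simple arithmetic: net job creation each quarter is job gains less job destruction, averaged over the period, and then divided by total employment. The sketch below illustrates this in Python with made-up numbers for a single hypothetical county. The function names are illustrative, and expressing the ratio as a percent of employment is an assumption about the units of the second column.

```python
from statistics import mean

def net_job_creation(gains, losses):
    """Quarterly net job creation: job gains less job destruction."""
    return [g - l for g, l in zip(gains, losses)]

def average_net_rate(gains, losses, employment):
    """Return (mean quarterly net jobs, that mean as a percent of employment)."""
    avg = mean(net_job_creation(gains, losses))
    return avg, 100.0 * avg / employment

# Hypothetical county: four quarters of gains and losses, plus total employment.
gains = [1200, 1350, 1100, 1400]
losses = [1000, 1150, 1050, 1250]
avg, rate = average_net_rate(gains, losses, employment=15_000)
print(avg, round(rate, 2))
```

Scaling by employment is what lets a county adding 87 jobs a quarter be compared with one adding 17,484.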

The main point from the job growth figures is that the greatest relative amount of job growth was happening in clusters 2, 4, and 6. Recall that these are, in fact, quite a disparate group: group 2 is one of the poorest, in terms of both work opportunities and poverty rates, while group 6 counties are major oil and gas centers. Group 4, meanwhile, the suburbs, have comparatively low oil and gas presence but are generally wealthier. Further disaggregation of the job creation statistics would likely be necessary to determine the quality of these jobs.

That concludes my case study using cluster analysis. After completing posts like this, I often sit and wonder to myself, “Now, just exactly why did I do all of this?” The point, for me anyway, is that context is important. Given the major economic and political themes going on in the US right now (fracking, the uneven recovery, the ongoing foreclosure crisis, immigration, too-big-to-fail, the competitiveness of US industry), it is important to know the location and context of such events in order to further our understanding of why these topics are important, to whom they are important (in the sense that some groups will benefit from certain kinds of economic processes while others will bear the burden of them), and what their long-term ramifications are. Eventually, I am going to revisit the posts I have written on Texas and the oil and gas industry and try to bring together some of my insights into how that industry is shaping the US economic landscape.

Meso-level analysis using cluster methods: local economic structure among Texan counties, part I

Quantitative analysis is never as neat and tidy as one would like it to be. At least, I’ve found that it is often hard to be graceful when you’re dealing with large amounts of data, with the need to clean it up, transform it, present it, and, finally, interpret it. And, then, of course, there is the issue of documenting your process and method, which can also be tedious. Data management and analysis are hard-to-master art forms.

Especially with geography, where you typically are looking at multiple aspects of a large sample of places, presenting the greatest amount of relevant information in the most convincing way to an audience makes for a messy task. Not only is it messy in the planning and development stages, but this carries on all the way to presenting it. (Life would be so much easier with a typesetting system like LaTeX, which gives the writer much more control over presentation, except that so many people still edit documents collaboratively, which LaTeX doesn’t make easy.)

My purpose here today is to share my workflow process, including its shortcomings and how I would like to improve it, as well as to continue the analysis of Texas, which I seem to be fixated upon at the moment. In this post, I outline my workflow process. In a subsequent post, I will present the data.

The method I describe here is cluster analysis. Cluster analysis is a technique that in theory condenses a tremendous amount of information for easier access, yet in practice is quite messy. Essentially, cluster analysis groups objects based on their similarity. It is mainly an exploratory technique, useful for generating hypotheses. There are many potential algorithms for sorting the observations. In a chapter of my dissertation, I used a hierarchical method, which began with all observations in their own cluster and then aggregated two clusters together at each stage. This process avoids the issue of selecting in advance the number of clusters, and employs ANOVA to determine the optimal distance between clusters with the aim of minimizing the sum of squares of a given pair of clusters. The method strives for maximum internal homogeneity in each cluster based on the dependent variables. Other post-estimation techniques are available to confirm the optimal number of clusters.
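
To make the hierarchical procedure concrete, here is a minimal Python sketch of agglomerative clustering. It assumes the method described above corresponds to Ward's minimum-variance linkage, which the text does not name; the function names and the toy points are illustrative only. At each stage it merges the pair of clusters whose union adds the least to the within-cluster sum of squares.

```python
from itertools import combinations

def centroid(cluster):
    """Mean of each coordinate across the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2

def ward_clustering(points, n_clusters):
    """Start with every point in its own cluster; merge the cheapest pair
    at each stage until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters

# Toy data: two visibly separate groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = ward_clustering(points, n_clusters=2)
print(clusters)
```

The stopping point is passed in here only to keep the sketch short. In practice the full sequence of merges can be recorded and the number of clusters chosen afterwards, which is the sense in which the hierarchical approach avoids fixing that number in advance.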

Here, I use k-means clustering, which has four steps. First, the number of clusters is specified. Second, the location of each cluster center is initialized, typically at random. Third, each object is assigned to the nearest cluster center. Fourth, each center is recomputed as the mean of the objects assigned to it, and the assignment and update steps are repeated. Convergence is achieved when another pass no longer changes the assignment of any observation. It should be noted that, because the starting locations are usually random, k-means clustering can group the same set of observations differently on each independent attempt. That is, I could run the same process today and discover a different membership for some observations. Another main difference between k-means and hierarchical methods is that in the former the number of clusters is designated in advance. There are also differences in post-estimation tests.
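
The k-means procedure is short enough to sketch in full. The Python below is a bare-bones illustration, not the routine used for the analysis; the function names, the `seed` parameter, and the toy points are all made up for the example. The seed controls the random starting centers, which is the source of the run-to-run variation.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, seed, max_iter=100):
    """Basic k-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize centers at k random observations
    labels = None
    for _ in range(max_iter):
        # assign every observation to its nearest center
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # move each center to the mean of the observations assigned to it
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for seed in (1, 2):
    labels, _ = k_means(points, k=2, seed=seed)
    print(seed, labels)
```

With groups this well separated the memberships should agree across seeds, although the label numbers themselves are arbitrary and may swap. On messier data the memberships can change too, so it is worth comparing several starts before reading much into any one result.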

Let me now move on to the more mundane aspects.

My workflow process

Here is the process by which I assembled data and performed a cluster analysis.

  • Data collection
    1. I gathered data from four datasets: employment (Quarterly Workforce Indicators, www.ledextract.ces.census.gov); unemployment (Bureau of Labor Statistics, http://www.bls.gov/lau/); poverty and income (Small Area Income & Poverty Estimates, US Census, http://www.census.gov/did/www/saipe/data/statecounty/); and, number of banks and deposits (from FDIC Summary of Deposits, https://www2.fdic.gov/sod/).
    2. The geography was Texan counties, of which there are 254 (big sample).
    3. I downloaded the relevant data for the most recent available date, which differs somewhat between datasets. Sometimes the data were not available for all observations, so I retrieved the most recent dataset that had complete data.
      1. Employment data: end of 2012
      2. Population (total size, Hispanic): end of 2012
      3. Unemployment: June 2014
      4. Banking: June 2014
    4. I then merged each of the datasets using STATA and saved this as the master file.
  • Running the cluster analysis
    1. The first step is to standardize the variables, in order to place all variables on the same scale and to prevent distortions.
    2. There was a long trial-and-error process here, where I would select various combinations of variables and then perform post-estimation tests and create maps to determine what worked best. Bear in mind, I am trying to sort counties according to local economic structure, which explains why I initially included different sectors. To make a long story short, I ultimately settled on four variables: total population; the unemployment rate; the poverty rate (all ages); and the share of oil and gas employment in total employment (all variables were standardized). Often, the maps produced from each iteration would just not seem right, so I would return to the drawing board and repeat the process. Scrutinizing a map may not sound particularly scientific, but the bigger problem is that there is no hard rule about what constitutes an appropriate threshold for the post-estimation tests, so my subjective knowledge is as valid as a seemingly objective mathematical test.
    3. For each set of variables, I would always run the analysis with at least two and at most 13 cluster groups. I would then invoke a cluster stop rule, which produces an F-index for each candidate number of clusters. When comparing the F-index values across solutions, the point is to locate the highest values. You would then take the solutions with the greatest F-index values and look at how many observations fall in each group (for example, important questions here are: does one group have almost all the observations, or are there many groups with only one observation?). Then you could summarize the means of the dependent variables for each cluster and see how different they are.
    4. Then, if you really wanted to be rigorous, you could run ANOVA tests with each of the dependent variables (population, unemployment, poverty, oil/gas) as the outcome and the cluster group identifier as the factor. A statistically significant F-test in ANOVA indicates that the cluster means differ on that variable, which speaks to whether the clusters are distinct from each other. However, again, there are no tools that have been recognized by scientific consensus as being the most accurate or reliable in determining the number of clusters. Some users of cluster analysis would pursue further statistical techniques, whereas my choice is to generate maps and utilize my knowledge of geography to gauge reliability.
    5. I will justify the selection of the four variables in the next post.
  • Creating tables and maps to present the information
    1. The final cluster analysis I performed identified eight cluster groups for the four dependent variables. I created several tables:
      1. One to show the mean values of the dependent variables, as well as number of counties, for each cluster group.
      2. Another to show the mean values on a set of variables not used in constructing the cluster schematic, partly as a further test of robustness but also as a means of analysis.
      3. And finally, a table showing the average quarterly change in number of jobs (not employment; this is the flow measure for ‘job creation’) from 2009Q3 to 2013Q3. Again, including this data allows for a comparative analysis for hypothesis-testing.
    2. I exported the county identifiers as well as their cluster group identities into a .csv file and then converted that file into a .dbf file. This step is necessary for correctly adding the county cluster identities to a shapefile for Texas counties. I combined the .dbf file with the shapefile using QGIS and then moved the new shapefile over to TileMill, where I tinkered with the color scheme and added a legend (TileMill is less clunky than QGIS).
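
The standardization and stopping-rule steps in the list above can be sketched as follows. This is a Python illustration, not the Stata commands used here. It assumes the F-index is the Calinski-Harabasz pseudo-F (the statistic Stata's `cluster stop` reports by default), and it standardizes with the population standard deviation where statistical packages often use the sample version. The data and function names are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Convert each column to z-scores (mean 0, standard deviation 1)."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]

def pseudo_f(rows, labels):
    """Calinski-Harabasz index: (between SS / (k - 1)) / (within SS / (n - k))."""
    n, k = len(rows), len(set(labels))
    overall = [mean(col) for col in zip(*rows)]
    between = within = 0.0
    for group in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == group]
        center = [mean(col) for col in zip(*members)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(center, overall))
        within += sum(sum((x - c) ** 2 for x, c in zip(r, center)) for r in members)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical observations: (population, unemployment rate), on very different scales.
rows = [(1000, 3.0), (1200, 3.5), (1100, 3.2), (50000, 8.0), (52000, 8.5), (51000, 9.0)]
z = standardize(rows)
good = [0, 0, 0, 1, 1, 1]  # groups similar observations together
bad = [0, 1, 0, 1, 0, 1]   # mixes dissimilar observations
print(pseudo_f(z, good) > pseudo_f(z, bad))
```

Without the standardization step, population would swamp the rate variables simply because its numbers are larger. Applied to a single variable, the same ratio of between-group to within-group variation is the one-way ANOVA F-statistic mentioned in step 4.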

This is the process by which data was collected, tuned up, and applied for the purpose of cluster analysis. In the next post, I present the results, advance some hypotheses, and discuss some thoughts on the workflow process and the empirical findings.