
More reasons why network analysis is the next big thing: flexible methods from visual data analysis to hypothesis testing and models



In recent blog posts I have argued that network analysis is going to be the next big thing for evaluators, policy planners, and business people working in complex markets and economic/social settings (which now describes just about everything).

Entrepreneurs and managers with an eye on the bottom line, and government policy makers and program evaluators facing escalating imperatives of policy and cost-effectiveness, will need to find ways to adapt. The ability to map and understand networks of economic and social players and relationships will be more and more critical.[1] Analysis of the costs, benefits and incentives facing the players in a network, using network theoretic models, can inform policy makers, evaluators or business planners about the conditions under which building or expanding a network might be successful and useful for carrying information.[2]

[Image: #Cop21 2k tweets, Harel-Koren, 0915 Nov 30]

Today’s question:

For many researchers who are familiar with standard statistical methods and modeling, one question may be something like “are there methods in network analysis analogous to descriptive and inferential statistics?” This is an attempt at a quick answer (in the affirmative) to that question.

Methods of network analysis

Visual: Most people who are familiar at all with network analysis (usually in the context of social network analysis) see it as a means to “map” a network of connections. Visual inspection of such mappings can tell us much about a network.

Metrics: A second “level” of understanding of network analysis generally involves the key network metrics, such as density, average path length and the centrality measures of individual actors. The ability to generate and interpret these network statistics allows a much richer understanding of the structure and nature of a network. As an example, centrality measures, among other things, allow us to identify and differentiate between those who have many connections, those who may act as “bridges” between many others, and those who are very influential because of being “connected to the highly connected”.

These measures are generally interpreted relative to their context. Much like more familiar descriptive statistics, they tell us which player has, for example, the most connections, without the need for a test. The answer to a question like: “Is this a dense or sparse network?” depends very much on the nature of the network’s players and context and can be best answered with reference to other similar networks.
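
To make the metrics a little more concrete, here is a minimal sketch in plain Python that computes density, degree centrality and average path length. The five-person friendship network and all function names are my own invention for illustration; in practice a dedicated network analysis package would do this work, and would also cover betweenness and eigenvector centrality, which are left out here for brevity.

```python
from collections import deque

# Hypothetical toy network: undirected friendships stored as an adjacency dict.
graph = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob", "Dan"},
    "Dan": {"Cat", "Eve"},
    "Eve": {"Dan"},
}

def density(g):
    """Share of all possible undirected ties that are actually present."""
    n = len(g)
    n_edges = sum(len(nbrs) for nbrs in g.values()) / 2
    return n_edges / (n * (n - 1) / 2)

def degree_centrality(g):
    """Each node's number of ties divided by the maximum possible (n - 1)."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}

def average_path_length(g):
    """Mean shortest-path length over all pairs (assumes a connected graph)."""
    total, pairs = 0, 0
    for source in g:
        # Breadth-first search gives shortest distances from this source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in g[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

net_density = density(graph)
centrality = degree_centrality(graph)
avg_path = average_path_length(graph)

print("density:", net_density)
print("most connected:", max(centrality, key=centrality.get))
print("average path length:", avg_path)
```

As the paragraph above says, numbers like these mean little in isolation; they become informative when compared across actors or against similar networks.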

Another “level” of analysis allows us to test, statistically, certain types of hypotheses about networks’ structures and the relationships underlying them.

Hypothesis testing and models: The availability of methods to formally test hypotheses, such as, “The structure of the network is systematically related to the attributes (age, gender, profession) of its members”, allows us to understand much more about such things as how sub-groups form and, therefore, how an idea may spread. Identifying, with some level of statistical significance, classes of actors that tend to be highly interrelated in a network would probably shape any strategy for communicating or disseminating information within it.

Because a network under study is essentially a single observation, which is the usual research setting in network analysis, we cannot rely on sampling from multiple observations and the statements of probability and statistical significance that follow from it. Hypothesis testing in network analysis is usually based instead on the idea of permutations: the observed data are reshuffled a great many times, and the statistic observed in the real network is compared with the distribution of values from the reshuffled networks to judge how likely it would be to arise by chance.

A key distinction exists between “monadic” and “dyadic” analyses. Monadic variables are attributes of individual nodes. This allows many interesting questions to be answered with standard statistical techniques once network statistics are calculated. For example, once we have calculated the degree centralities of the nodes, we could check the correlation between degree centrality and node attributes such as measured happiness or age.
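
The degree-and-happiness example can be sketched in a few lines. This is only an illustration under my own assumptions: the eight degree and happiness values are made up, the function names are hypothetical, and the significance level comes from shuffling the attribute across nodes rather than from a textbook formula, in keeping with the permutation idea described earlier.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided p-value: how often does a shuffled y correlate as strongly?"""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point noise on exact ties.
        if abs(pearson(x, shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Made-up data: number of ties and a happiness score for eight people.
degree = [2, 2, 3, 2, 1, 4, 1, 3]
happiness = [5.1, 4.8, 6.9, 5.5, 3.2, 8.0, 3.9, 6.1]

r, p = permutation_p_value(degree, happiness)
print("correlation:", round(r, 3), "permutation p-value:", round(p, 4))
```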

The units of dyadic relationships are pairs of nodes. Here we can test hypotheses regarding whether or not the existence of a pattern of dyads is related to another pattern of dyads, i.e. if people have one type of relationship, do they tend to also have another type? For example, we might wonder if collaborative relationships depend on schools attended in common.
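
A common way to test a dyadic hypothesis like this is to correlate two relationship matrices and then repeatedly relabel the nodes of one of them, an approach usually known as the quadratic assignment procedure (QAP). The sketch below is a bare-bones version under my own assumptions: the six-person “same school” and “collaborates” matrices are invented, and the function names are hypothetical.

```python
import random

def offdiag(m):
    """Flatten a square matrix, skipping the diagonal (self-ties)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def qap_test(a, b, n_perm=2000, seed=0):
    """One-sided test: is the a-b correlation larger than under relabelling?"""
    rng = random.Random(seed)
    n = len(a)
    observed = pearson(offdiag(a), offdiag(b))
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # Permute rows and columns together so the structure of b is preserved.
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(offdiag(a), offdiag(permuted)) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def matrix_from_pairs(n, pairs):
    """Build a symmetric 0/1 matrix from a list of undirected pairs."""
    m = [[0] * n for _ in range(n)]
    for i, j in pairs:
        m[i][j] = m[j][i] = 1
    return m

# Invented data: people 0-2 attended one school, people 3-5 another.
same_school = matrix_from_pairs(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)])
collaborates = matrix_from_pairs(6, [(0, 1), (0, 2), (1, 2), (3, 4), (4, 5), (2, 3)])

qap_r, qap_p = qap_test(same_school, collaborates)
print("matrix correlation:", round(qap_r, 3), "QAP p-value:", round(qap_p, 3))
```

Relabelling whole nodes, rather than shuffling individual cells, keeps each permuted network structurally identical to the original. With only six people there are few distinct relabellings, so even a strong correlation cannot reach a small p-value here; a real analysis would need a larger network.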

Mixed dyadic-monadic hypotheses are also possible, in which case we may ask whether or not a pattern of connections depends on a monadic variable. For example, we might ask whether or not gender affects the formation of friendships in a network. A particularly important application of dyadic-monadic analysis is in testing hypotheses regarding influence, diffusion and selection. An example of diffusion would be where people’s political beliefs are influenced by their pattern of friendships: we take on some of the beliefs of our friends. On the other hand, running the causation the other way, a hypothesis might be that political beliefs (monadic variable) are the basis of people’s selection of friends (dyadic variable).

Another important application, already mentioned, of mixed monadic-dyadic analysis and hypothesis testing is looking for “homophily”, i.e. the tendency of people to be linked with others with whom they are similar in some important respect. The existence of homophily can lead to relatively isolated subgroups in a network, thus influencing the diffusion of such things as information, ideas, or in the case of epidemiology, disease.
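
A simple way to check for homophily is to count the share of ties that link people with the same attribute, and then see how often a share that large turns up when the attribute is shuffled at random across the same network. The sketch below assumes an invented eight-person friendship network and a made-up gender attribute; the function names are hypothetical.

```python
import random

# Invented friendship ties and a made-up attribute for eight people.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("e", "f"), ("e", "g"), ("f", "g"), ("g", "h"), ("d", "e")]
gender = {"a": "F", "b": "F", "c": "F", "d": "F",
          "e": "M", "f": "M", "g": "M", "h": "M"}

def same_group_share(edge_list, attr):
    """Fraction of ties whose two endpoints share the attribute value."""
    same = sum(1 for u, v in edge_list if attr[u] == attr[v])
    return same / len(edge_list)

def homophily_test(edge_list, attr, n_perm=5000, seed=0):
    """One-sided permutation test: the ties stay fixed, the labels are shuffled."""
    rng = random.Random(seed)
    observed = same_group_share(edge_list, attr)
    nodes = list(attr)
    labels = [attr[node] for node in nodes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = dict(zip(nodes, labels))
        if same_group_share(edge_list, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

share, homophily_p = homophily_test(edges, gender)
print("same-gender share of ties:", share)
print("permutation p-value:", round(homophily_p, 4))
```

Note that a test like this only shows that similar people are linked more often than chance would suggest. It cannot by itself tell selection apart from influence, which is the distinction drawn in the discussion above.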

Finally, there are classes of models designed to detect and test for the significant existence of certain structural elements of networks, such as dyads, triangles and star-configurations, the pervasiveness of which can indicate important features like homophily and structural holes. The best known of these models is the exponential random graph model (ERGM).[3] Much as for the other hypothesis tests discussed above, the statistical significance of a given structure, or “network statistic”, such as dyads or triangles, is judged by comparing the observed network with a large number of networks generated from the same data. Structures that are found to be significant in explaining the network can then be interpreted in terms relevant to the analysis at hand. For example, if triangles are a significant statistic, this may indicate significant “transitivity” (one’s friends are also friends). This would suggest a tendency to clustering and homophily. By specifying these relationships as hypotheses, we can therefore put them to the test.
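
Fitting an ERGM needs specialised software, and the sketch below is not an ERGM. It only illustrates the underlying intuition under a much cruder assumption of my own: count the triangles in an invented eight-person network, then compare that count with random networks that have the same number of nodes and ties. All names and data are hypothetical.

```python
import random
from itertools import combinations

def count_triangles(edge_list):
    """Number of closed triads in an undirected network without duplicate ties."""
    nbrs = {}
    for u, v in edge_list:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three ties.
    return sum(len(nbrs[u] & nbrs[v]) for u, v in edge_list) // 3

def triangle_test(nodes, edge_list, n_sim=2000, seed=0):
    """Share of random same-size networks with at least as many triangles."""
    rng = random.Random(seed)
    observed = count_triangles(edge_list)
    possible = list(combinations(nodes, 2))
    hits = 0
    for _ in range(n_sim):
        # Crude baseline: place the same number of ties uniformly at random.
        simulated = rng.sample(possible, len(edge_list))
        if count_triangles(simulated) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)

# Invented network of eight people and ten ties.
people = list("abcdefgh")
ties = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
        ("e", "f"), ("e", "g"), ("f", "g"), ("g", "h"), ("d", "e")]

n_triangles, triangle_p = triangle_test(people, ties)
print("observed triangles:", n_triangles)
print("share of random networks with as many:", round(triangle_p, 3))
```

A proper ERGM goes well beyond this: it estimates the effects of several structures at once, each controlling for the others, rather than testing one count against a single random baseline.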


The foregoing is of course only a brief discussion of these methods. I think it’s important to stress that the right kind of analysis to use in any case depends, as in all research methods, on the nature of the question and the standard of evidence required. I hope that this helps to further demonstrate the potential for establishing network analysis as a research tool of considerable rigor for market research, policy and evaluation.






[1] Autumn 2015: http://haikuanalytics.com/why-network-analysis-is-the-next-big-thing/

[2] Autumn 2015: http://haikuanalytics.com/economic-analysis-of-networks-serious-potential-for-policy-and-evaluation/

[3] A set of alternative models based on the ERGM have also been developed and are making their way into practice.