Informally speaking, a graph is a set of objects called points, nodes, or vertices connected by links called lines or edges. In a proper graph, which is by default undirected, a line from point A to point B is considered to be the same thing as a line from point B to point A. In a digraph, short for directed graph, the two directions are counted as being distinct arcs or directed edges. Typically, a graph is depicted in diagrammatic form as a set of dots (for the points, vertices, or nodes), joined by curves (for the lines or edges). Graphs have applications in both mathematics and computer science, and form the basic object of study in graph theory.
Applications of graph theory are generally concerned with labeled graphs and various specializations of these. Many problems of practical interest can be represented by graphs. The link structure of a website could be represented by a directed graph: the vertices are the web pages available at the website and a directed edge from page A to page B exists if and only if A contains a link to B. A graph structure can be extended by assigning a weight to each edge of the graph. Graphs with weights, or weighted graphs, are used to represent structures in which pairwise connections have some numerical values. For example if a graph represents a road network, the weights could represent the length of each road. A digraph with weighted edges in the context of graph theory is called a network. Networks have many uses in the practical side of graph theory, network analysis (for example, to model and analyze traffic networks).
Simpson's paradox (also known as the Yule–Simpson effect) states that an observed association between two variables can reverse when considered at separate levels of a third variable (or, conversely, that the association can reverse when separate groups are combined). Shown here is an illustration of the paradox for quantitative data. In the graph the overall association between X and Y is negative (as X increases, Y tends to decrease when all of the data is considered, as indicated by the negative slope of the dashed line); but when the blue and red points are considered separately (two levels of a third variable, color), the association between X and Y appears to be positive in each subgroup (positive slopes on the blue and red lines — note that the effect in real-world data is rarely this extreme). Named after British statistician Edward H. Simpson, who first described the paradox in 1951 (in the context of qualitative data), similar effects had been mentioned by Karl Pearson (and coauthors) in 1899, and by Udny Yule in 1903. One famous real-life instance of Simpson's paradox occurred in the UC Berkeley gender-bias case of the 1970s, in which the university was sued for gender discrimination because it had a higher admission rate for male applicants to its graduate schools than for female applicants (and the effect was statistically significant). The effect was reversed, however, when the data was split by department: most departments showed a small but significant bias in favor of women. The explanation was that women tended to apply to competitive departments with low rates of admission even among qualified applicants, whereas men tended to apply to less-competitive departments with high rates of admission among qualified applicants. (Note that splitting by department was a more appropriate way of looking at the data since it is individual departments, not the university as a whole, that admit graduate students.)