In graph analytics for big data, we model the given problem as a graph, typically stored in a graph database, and then analyze that graph to answer our questions. Several types of graph analytics are commonly used:
- Path Analysis
- Connectivity Analysis
- Community Analysis
- Centrality Analysis
Path Analysis is generally used to find the shortest distance between any two nodes in a given graph. Route optimization is the best-known example of Path Analysis. It can be used in applications such as supply chains, logistics, traffic optimization, etc.
Connectivity Analysis is used to find the weaknesses in a network, for example in a utility power grid.
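One simple way to picture a "weakness" in a network is a node whose failure would cut the rest of the network apart. The sketch below is a brute-force illustration, not a production algorithm: it removes each node in turn and checks with a breadth-first search whether the remaining nodes can still reach each other. The `grid` data and the function names are hypothetical, and the code assumes the network is undirected and connected to begin with.

```python
from collections import deque


def reachable(adjacency, start, removed=None):
    """Return the set of nodes reachable from start, skipping the removed node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(adjacency):
    """Return nodes whose removal disconnects the remaining network."""
    weak = []
    for candidate in adjacency:
        remaining = [node for node in adjacency if node != candidate]
        if not remaining:
            continue
        # If some remaining node can no longer be reached, the candidate is a weak point.
        still_connected = reachable(adjacency, remaining[0], removed=candidate)
        if len(still_connected) < len(remaining):
            weak.append(candidate)
    return sorted(weak)


# Hypothetical power grid: each key lists the stations it is directly linked to.
grid = {
    "plant": ["sub_a", "sub_b"],
    "sub_a": ["plant", "sub_b"],
    "sub_b": ["plant", "sub_a", "sub_c"],
    "sub_c": ["sub_b", "town"],
    "town": ["sub_c"],
}
weak_points = single_points_of_failure(grid)
print(weak_points)
```

For large graphs, a linear-time articulation-point algorithm would be a better choice than repeating a search for every node.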
Connectivity Analysis can also be used to measure how well connected a network is overall.
Community Analysis groups nodes based on density and distance. It can be used to identify the different groups of people in a social network.
Centrality Analysis enables us to determine the most influential people in a social network. Using this analysis, we can also find the web pages that are accessed most often.
Many algorithms are used in graph analytics, for example PageRank, eigenvector centrality, closeness centrality, and betweenness centrality.
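To give a feel for how a centrality score is computed, here is a minimal sketch of PageRank using plain power iteration. It is an illustration under stated assumptions rather than a reference implementation: the `web` link data is made up, the damping factor of 0.85 and the fixed 50 iterations are conventional but arbitrary choices, and every link target is assumed to appear as a key in the dictionary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration over a dict of outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical web graph: each page maps to the pages it links to.
web = {
    "home": ["docs", "blog"],
    "docs": ["home"],
    "blog": ["home", "docs"],
    "orphan": ["home"],
}
ranks = pagerank(web)
most_central = max(ranks, key=ranks.get)
print(most_central)
```

In this toy graph the page with the most incoming links ends up with the highest score, which matches the intuition of "influence" described above.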
Graphs are made up of nodes (also called vertices) and edges. In real-life applications, nodes often represent people or organizations, for example customers, employees, social groups, companies, etc. Nodes can also represent other things such as buildings, cities and towns, airports, bus depots, distribution points, houses, bank accounts, assets, devices, policies, products, grids, web pages, etc.
Edges represent relationships between nodes, for example likes and dislikes on social networks, emails, payment transactions, phone calls, etc. Edges can be directed, undirected, or weighted. Examples of directed edges are 'John transferred money to Smith' and 'Peter follows David on a social platform'. An undirected edge represents a mutual relationship, for example 'Sam and Peter are friends'. A weighted edge carries a value, for example the number of transactions between two accounts or the time required to travel between two stations or locations. In a big data environment, we can do graph analytics with Apache Spark GraphX, which loads the given data into memory and runs the graph analysis in parallel.
There is also a framework called Apache TinkerPop that can be used to connect Spark with other graph databases. In this way, you can extract the data from a supported graph database and load it into memory for faster graph analysis. For analyzing graphs, we can also use tools such as Neo4j and GraphFrames. GraphFrames is built on Spark DataFrames, so it can scale to very large graphs.
Graph analytics can be applied to fraud detection, financial crime detection, identification of social media influencers, route optimization, network optimization, etc.
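As a small illustration of the route optimization use case, the sketch below combines two ideas from this article: directed, weighted edges and path analysis. It uses Dijkstra's algorithm, one common choice for shortest paths with non-negative weights, to find the fastest route between two locations. The `routes` data, the travel times in minutes, and the function name are all hypothetical.

```python
import heapq


def shortest_travel_time(edges, source, target):
    """Dijkstra's algorithm over directed edges given as (start, end, minutes)."""
    adjacency = {}
    for start, end, minutes in edges:
        adjacency.setdefault(start, []).append((end, minutes))
        adjacency.setdefault(end, [])
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        elapsed, node = heapq.heappop(heap)
        if node == target:
            return elapsed
        if elapsed > best.get(node, float("inf")):
            continue  # Skip a stale heap entry; a faster route was already found.
        for neighbor, minutes in adjacency.get(node, []):
            candidate = elapsed + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None  # The target cannot be reached from the source.


# Hypothetical one-way routes with travel times in minutes.
routes = [
    ("depot", "north", 7),
    ("depot", "east", 2),
    ("east", "north", 3),
    ("north", "airport", 4),
    ("east", "airport", 10),
]
fastest = shortest_travel_time(routes, "depot", "airport")
print(fastest)
```

Because the edges are directed, asking for the route from "airport" back to "depot" returns `None` in this data set: no edge leads out of the airport.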