4 Data Management and Analytics in Business
Learning Objectives
- Explain the basics of data.
- Discuss data storage and retrieval.
- Understand database management systems.
- Describe data modeling and analysis.
- Demonstrate knowledge of data visualization and descriptive analytics.
Introduction
Data management and analysis refer to the process of collecting, organizing, storing, and analyzing data to derive meaningful insights that support business decisions. In business information systems, data management and analysis are critical components that empower organizations to make informed decisions, monitor progress, and evaluate overall performance.
Data management involves several steps, including data collection, data cleansing, data integration, and data storage. The process of data analysis involves identifying relevant data, finding patterns, and deriving insights to help organizations make strategic decisions.
Business information systems provide powerful tools to help with data management and analysis. For example, data warehouses, data lakes, and databases make it possible to store large volumes of data in a central repository, making it easy to access and analyze. Data visualization tools, such as dashboards and reports, help to present complex data in a clear and concise manner, making it easier for business leaders to interpret.
Effective data management and analysis can have a significant impact on business performance. By analyzing customer data, businesses can identify trends and preferences, improve customer engagement, and increase revenue. By analyzing process data, they can identify bottlenecks and inefficiencies and make changes to improve operational efficiency.
In short, data management and analysis are critical components of any business information system, enabling organizations to make data-driven decisions that support their strategic goals.
What is Data?

At the most fundamental level, data consists of raw facts, observations, or measurements that describe events, objects, or conditions. In information systems, data serves as the primary input that systems capture, store, process, and ultimately transform into meaningful outputs. Data may take many forms—numbers, text, images, timestamps, or sensor readings—and by itself it often lacks context or significance. Its value emerges when information systems organize, process, and analyze data in ways that support understanding, coordination, and decision-making.
One important way to think about data in information systems is by distinguishing between quantitative and qualitative data. Quantitative data is numerical in nature and can be measured or counted, such as sales totals, inventory levels, response times, or exam scores. This type of data is well suited to statistical analysis, trend identification, and performance measurement, making it central to operational reporting and analytics. Qualitative data, by contrast, is descriptive and often captures meaning, opinions, or experiences. Examples include customer feedback comments, interview transcripts, emails, or social media posts. While qualitative data is less easily reduced to numbers, it plays a critical role in understanding user behavior, sentiment, and context, which are areas where purely numerical data may fall short.
Data can also be classified based on how it is organized and stored. Structured data follows a predefined format, typically arranged in rows and columns within databases or spreadsheets. Examples include customer records, financial transactions, and enrollment data. Because its structure is predictable, structured data is relatively easy for information systems to store, query, and analyze. Unstructured data, on the other hand, does not conform to a fixed schema. Text documents, images, videos, audio recordings, and many forms of web content fall into this category. Managing unstructured data presents additional challenges, but modern information systems increasingly rely on it, particularly in areas such as customer relationship management, knowledge management, and artificial intelligence.
Understanding these distinctions is essential for appreciating how information systems are designed and used. Different types of data require different storage technologies, processing methods, and analytical tools. By recognizing the varied nature of data, students can better understand how information systems convert raw inputs into insights that support organizational goals.
Unstructured, Structured, and Semi-Structured Data
Unstructured data does not have a predefined data model or format. It can include documents, images, audio files, videos, social media posts, and other types of data that are not organized in a structured way. Unstructured data is difficult to analyze using traditional data processing tools and requires specialized tools such as natural language processing and machine learning to extract insights.
Structured data is organized and formatted to a predefined schema or data model. It is usually stored in relational databases or spreadsheets and can be easily analyzed and queried using database management systems and query languages.
Semi-structured data has some structure but is not fully organized or formatted to a predefined schema. It can include data in formats such as XML, JSON, and CSV, which have some structure but may not conform to a strict schema. Semi-structured data can be processed using tools that can handle both structured and unstructured data, such as NoSQL databases and data lakes.
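The three categories can be compared side by side in a short Python sketch. All of the sample values and variable names here are hypothetical and serve only to illustrate the distinction.

```python
import json

# Structured: every row has the same columns in the same order (hypothetical data).
columns = ("customer_id", "name", "city")
rows = [(1, "Ana Smith", "Denver"), (2, "Raj Patel", "Austin")]
cities = [row[columns.index("city")] for row in rows]

# Semi-structured: records label their own fields, but the fields can differ
# from one record to the next.
semi_structured_text = """
[
  {"name": "Ana Smith", "phones": ["555-0100", "555-0101"]},
  {"name": "Raj Patel", "email": "raj@example.com"}
]
"""
records = json.loads(semi_structured_text)
field_sets = [sorted(record) for record in records]

# Unstructured: free text with no fields at all. A keyword search is a crude
# stand-in for the language-processing tools such data usually requires.
review = "Loved the quick delivery, but the box arrived damaged."
mentions_damage = "damaged" in review.lower()

print(cities)
print(field_sets)
print(mentions_damage)
```

The structured rows can be queried by column with no extra work, the JSON records must be inspected to learn which fields each one carries, and the review text offers nothing to query until some meaning has been extracted from it.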
Data-Driven Performance
Data plays a central role in how organizations measure performance, identify problems, and guide improvement efforts. However, the value derived from data depends heavily on selecting analytical approaches that align with the type of data being collected. Quantitative data, which consists of numerical values, is well suited to statistical techniques such as regression analysis, forecasting, or time series analysis. These methods allow organizations to identify patterns, measure relationships, and evaluate performance trends over time. Qualitative data, by contrast, requires interpretive approaches such as content analysis or thematic analysis to uncover meaning, context, and underlying motivations that cannot be captured through numbers alone.
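As a minimal illustration of the statistical techniques mentioned above, the following Python sketch fits a straight-line trend to a short series by ordinary least squares and extends it one period ahead. The monthly sales figures are invented for the example, and a real forecast would normally rely on more data and a statistics library.

```python
# Hypothetical monthly sales totals, in thousands of dollars.
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 13.0, 13.0, 16.0, 19.0, 19.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, sales)

# Extend the fitted trend to month 7 as a naive forecast.
forecast_month_7 = slope * 7 + intercept

print(f"Average growth per month: {slope:.2f}")
print(f"Forecast for month 7: {forecast_month_7:.1f}")
```

The slope summarizes the trend (average growth per month), and the forecast simply assumes that trend continues, which is exactly the kind of assumption a manager should question before acting on the number.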
Across industries, data-driven performance management enables organizations to move beyond intuition and anecdote toward evidence-based decision-making. In healthcare, for example, data is used to track patient outcomes, monitor the effectiveness of treatments, and identify risk factors associated with specific populations. By analyzing clinical and demographic data, healthcare organizations can improve quality of care, reduce costs, and proactively address emerging health concerns. These insights support both operational improvements and long-term strategic planning.
In business settings, performance data is routinely used to evaluate financial results, sales effectiveness, marketing campaigns, and operational efficiency. Information systems collect and integrate data from across the organization, enabling managers to assess inventory levels, supply chain performance, customer behavior, and employee productivity. This data-driven visibility supports more informed decisions about resource allocation, process improvement, and product or service innovation.
Data also plays a critical role in the public sector, where it informs policy development and program evaluation. Governments and public agencies use data to track economic indicators, assess the effectiveness of social programs, and measure progress toward policy goals. By grounding decisions in data, policymakers can better understand the consequences of their actions and allocate public resources more responsibly.
Ultimately, data-driven performance reflects a broader shift in how organizations operate and compete. When supported by appropriate information systems, analytical tools, and human judgment, data becomes a powerful asset for improving efficiency, fostering innovation, and supporting sound decision-making across virtually every industry and sector.
Data, Information, Knowledge and Wisdom
The terms data, information, knowledge, and wisdom are closely related and often used interchangeably, but in the context of information systems they represent distinct levels of understanding and value. Together, they describe a progression that explains how raw inputs are transformed into insight and informed judgment within organizations.
Data consists of raw, unorganized facts or observations that are collected through transactions, measurements, or interactions. Data can take many forms, including numbers, text, images, audio, or video, but on its own it lacks context or meaning. For example, a list of customer names and addresses is simply a collection of data points. While accurate and necessary, data by itself does not answer questions or support decisions until it is organized and processed.
Information emerges when data is structured, processed, or analyzed in a way that adds context and meaning. Information answers basic questions such as who, what, when, where, and how. Continuing the earlier example, organizing customer addresses by city or region transforms the raw data into information by revealing patterns about customer locations. At this stage, information systems play a central role by sorting, aggregating, and presenting data in forms such as reports, summaries, or dashboards.
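The step from data to information in the customer example can be sketched in a few lines of Python. The customer records below are hypothetical.

```python
from collections import Counter

# Data: raw customer records with no summary or context (hypothetical values).
customers = [
    {"name": "Ana Smith", "city": "Denver"},
    {"name": "Raj Patel", "city": "Austin"},
    {"name": "Li Chen", "city": "Denver"},
    {"name": "Maria Lopez", "city": "Denver"},
]

# Information: the same records aggregated by city, which answers the
# question "where are our customers located?"
customers_per_city = Counter(customer["city"] for customer in customers)

for city, count in customers_per_city.most_common():
    print(f"{city}: {count}")
```

Nothing new was collected. The aggregation alone turns a list of records into a pattern that a person can read and act on.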
Knowledge goes a step further by incorporating human interpretation and experience. Knowledge is created when individuals or organizations analyze information, draw conclusions, and apply what they have learned to specific situations. It reflects an understanding of patterns, relationships, and implications. For instance, a marketing analyst might use customer location information to infer regional preferences, purchasing behaviors, or demand trends. This ability to interpret information and use it to guide action represents knowledge.
Wisdom represents the highest level in this progression and involves sound judgment informed by experience, reflection, and ethical consideration. Wisdom is not simply knowing what can be done, but understanding what should be done in a given context. In the marketing example, wisdom would be demonstrated by using accumulated knowledge to make strategic decisions, such as adjusting product offerings or market strategies, in ways that benefit both the organization and its customers over the long term.
Understanding the distinctions among data, information, knowledge, and wisdom is essential for appreciating the role of information systems in organizations. While technology excels at collecting and processing data into information, human insight remains critical for transforming information into knowledge and wisdom. Effective information systems support this progression by ensuring that data is accurate, accessible, and meaningful, enabling better decisions and more thoughtful organizational outcomes.
Information Systems Contributions
Information systems play a central role in how organizations transform raw data into insight and informed action. Rather than supporting a single activity, these systems span the full progression from data to information, knowledge, and ultimately wiser decision-making. By integrating technology, processes, and people, information systems enable organizations to manage complexity, reduce uncertainty, and improve performance.
At the most basic level, information systems support data management by automating the collection, storage, and protection of data. Modern systems capture data in real time from transactions, sensors, user interactions, and external sources, reducing manual effort and errors. They organize this data in structured repositories that make it easier to retrieve, secure, and analyze. Backup and recovery mechanisms further ensure that data remains available and reliable, even in the face of system failures or disruptions.
Information systems also play a key role in transforming data into information. Through reporting tools, dashboards, and analytics platforms, these systems help users analyze and interpret data to uncover patterns, trends, and exceptions. By standardizing how information is generated and shared, information systems ensure that decision-makers across the organization have access to timely, consistent, and accurate information. This shared visibility supports coordination, accountability, and data-driven decision-making at both operational and managerial levels.
Beyond information, organizations rely on information systems to support knowledge management. Systems such as document repositories, collaboration platforms, and knowledge bases allow organizations to capture expertise, best practices, and lessons learned. By making knowledge accessible across departments and locations, information systems reduce dependence on individual employees and help preserve institutional memory. They also encourage collaboration and continuous learning by enabling employees to share ideas, solve problems collectively, and build on existing knowledge.
Finally, information systems contribute to the development of wisdom by supporting higher-level judgment and decision-making. Decision support systems, predictive analytics, and scenario modeling tools help organizations evaluate alternatives, anticipate future outcomes, and understand the potential consequences of their actions. While wisdom ultimately depends on human experience, values, and judgment, information systems enhance this process by providing evidence, insights, and structured ways to think about complex decisions.
In sum, information systems are foundational to how modern organizations manage and apply data at every level. By supporting data integrity, information clarity, knowledge sharing, and informed decision-making, these systems enable organizations not only to operate more efficiently, but also to learn, adapt, and act more wisely over time.
Data Storage and Retrieval
Data storage and retrieval are essential components of modern-day technology. As more and more data is generated, there is an increasing need to store it securely and efficiently. Additionally, it is equally important to be able to access and retrieve it promptly when needed. This is where databases, data warehouses, and data lakes come into play.
Databases
A database is a collection of data that is organized in a specific way to allow for efficient and easy retrieval. It is typically managed by a database management system (DBMS), which is a software system that allows users to interact with the database, perform operations such as adding, deleting, or modifying data, and extract information from it. The DBMS also ensures the security and integrity of the data by managing access permissions and maintaining consistency.
Databases can be further classified into relational and non-relational databases:
Relational Databases: In a relational database, data is stored in tables, with each table representing a specific entity or set of related entities. Each table is made up of columns that define the attributes of the entity and rows that contain the actual data values. Tables are related to one another through keys: a primary key uniquely identifies each row in a table, and a foreign key in another table refers to that primary key to link the two. The relational model is widely used, and Oracle Database, MySQL, and Microsoft SQL Server are popular relational database management systems.
Non-Relational Databases: A non-relational database, often called a NoSQL database, stores data in a non-tabular format, typically using document-oriented, key-value, wide-column, or graph data models. Because they do not require a rigid schema, non-relational databases are often more flexible and easier to scale across many servers than relational databases, sometimes at the cost of the consistency guarantees and query features that relational systems provide. Examples of NoSQL databases include MongoDB, Cassandra, and Redis.
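To make the document and key-value models more concrete, the sketch below imitates them with ordinary Python dictionaries. It is an illustration of the data models only and does not use the API of MongoDB, Redis, or any other product. The field names and the helper function find_by_field are hypothetical.

```python
# Document model: each record is a self-contained, possibly nested document.
# Documents in the same collection do not need to share the same fields.
customer_documents = [
    {"_id": 1, "name": "Ana Smith", "orders": [{"sku": "A-100", "qty": 2}]},
    {"_id": 2, "name": "Raj Patel", "loyalty_tier": "gold"},
]

# Key-value model: a value is stored and fetched by a single key, and the
# store does not interpret what is inside the value.
session_store = {"session:42": "cart=A-100,B-200"}


def find_by_field(documents, field, value):
    """Return documents whose `field` equals `value`; documents lacking the field are skipped."""
    return [doc for doc in documents if doc.get(field) == value]


gold_customers = find_by_field(customer_documents, "loyalty_tier", "gold")

print([doc["name"] for doc in gold_customers])
print(session_store["session:42"])
```

The first document has an orders field and the second does not, yet both live in the same collection. A relational table would require every row to have the same columns.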
Data Warehouses
Data warehouses are centralized repositories of integrated data, used for business intelligence and analytics. They collect data from various sources such as transactional databases, applications, and external sources, and organize them in a structured and easily accessible format.
Businesses use data warehouses for a variety of purposes, including:
Reporting and analysis: Data warehouses provide quick and easy access to data for reporting and analysis. Businesses can use the data to generate customized reports and gain insights into areas such as sales trends, customer behavior, and operational efficiency.
Decision-making: With timely and accurate data available in data warehouses, businesses can make informed decisions on a range of topics. This includes things like developing new products, expanding into new markets, and adjusting marketing strategies.
Consistency: Data warehouses ensure that data is consistent across all departments and functions within a business. This helps prevent errors or discrepancies and ensures that everyone is working from the same data.
Cost-effectiveness: By centralizing data in a data warehouse, businesses can save on costs associated with data storage, processing, and management. This is because data warehouses can handle large amounts of data, allowing businesses to consolidate their storage systems and reduce duplication of effort.
In short, data warehouses provide businesses with a powerful tool to manage their data and improve their decision-making capabilities.
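The integration step that a data warehouse performs can be sketched in Python as follows. The two source systems, their field names, and the conversion functions are all hypothetical, and a production warehouse would use dedicated loading tools rather than hand-written code.

```python
# Two hypothetical source systems that record the same kind of event differently.
web_orders = [{"order_id": "W1", "cust": "C001", "total_usd": "19.99"}]
store_sales = [{"receipt": 501, "customer_id": "c002", "amount_cents": 4500}]


def from_web(order):
    """Convert a web order into the common warehouse format."""
    return {
        "source": "web",
        "customer_id": order["cust"].upper(),
        "amount": float(order["total_usd"]),
    }


def from_store(sale):
    """Convert an in-store sale into the common warehouse format."""
    return {
        "source": "store",
        "customer_id": sale["customer_id"].upper(),
        "amount": sale["amount_cents"] / 100,
    }


# Integration: every record now has the same fields, units, and ID format.
warehouse_sales = [from_web(o) for o in web_orders] + [from_store(s) for s in store_sales]
total_revenue = sum(record["amount"] for record in warehouse_sales)

print(warehouse_sales)
print(f"Total revenue: {total_revenue:.2f}")
```

Once both sources share one format, a single report can cover web and in-store sales together, which is the consistency benefit described above.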
Data Lakes
A data lake is a large, centralized repository that can store vast amounts of structured and unstructured data from multiple sources. It is a storage solution that allows businesses to store, process, and analyze large datasets without having to worry about the data’s structure or format. A data lake can store data from a variety of sources, including social media, customer interactions, machine logs, and operational systems.
In business, data lakes are used to store and analyze vast amounts of data, providing insights into customer behavior, market trends, and operational efficiency. With a data lake, businesses can have easy access to data from different sources, enabling them to make more informed decisions about product development, marketing, operations, and customer service.
Data lakes are also beneficial for machine learning and artificial intelligence applications, as they provide a large, diverse dataset for training models. In summary, data lakes can help businesses make sense of their data, derive valuable insights, and drive better business outcomes.
Using Databases
Databases are an essential tool in today’s information-driven world, allowing organizations to organize, store, and retrieve large amounts of data efficiently. A database is essentially a collection of related data that is stored in a structured format and accessed through queries or commands. Databases can be used to store and manage various types of information, from customer data and financial records to inventory and employee information.
Database Management Systems (DBMS)
A database management system (DBMS) is software that allows users to create, manipulate, and manage databases. A database is a collection of data that is organized, stored, and managed in a way that enables efficient access, retrieval, and modification of the data. DBMSs are used in various industries and settings, such as banking, healthcare, education, retail, and government, among others. Here are some examples of DBMSs and how they are used in a business setting:
Oracle Database: Oracle Database is a relational database management system that is widely used by large organizations to manage and store data. It provides various tools and functionalities for database management, including backup and recovery, access control, and data replication. In a business setting, Oracle Database can be used to manage customer data, sales data, financial data, and other types of data. For example, a bank may use Oracle Database to store customer account information, transaction data, and loan information. By using Oracle Database, the bank can ensure that the data is secure, accessible, and up-to-date, and use it to make informed decisions and improve its operations.
Microsoft Access: Microsoft Access is a desktop database management system that is widely used by small businesses and individuals to manage data. It provides tools for creating and managing databases, tables, forms, and reports, and can be integrated with other Microsoft applications such as Excel and SharePoint. In a business setting, Microsoft Access can be used to manage customer data, employee data, inventory data, and other types of data. For example, a small retail store may use Microsoft Access to track sales, inventory, and customer information. By using Microsoft Access, the store can organize its data, generate reports, and make informed decisions.
MongoDB: MongoDB is a document-oriented database management system that is designed for handling large volumes of semi-structured and unstructured data, which it stores as flexible, JSON-like documents. It provides a flexible schema and can be used for storing and managing data such as social media posts, log files, and sensor data. In a business setting, MongoDB can be used to manage and analyze data from various sources, such as social media platforms, IoT devices, and online transactions. For example, a marketing agency may use MongoDB to analyze social media data and gain insights into customer behavior, preferences, and trends. By using MongoDB, the agency can tailor its marketing efforts to individual customers and improve customer engagement.
Hadoop: Hadoop is an open-source distributed computing platform that is used for storing and processing large amounts of data. Strictly speaking it is a data processing framework rather than a database management system, but it is often discussed alongside DBMSs because it fills a similar role for very large datasets. It is commonly used in big data applications, such as data analytics and machine learning. Its core components include the Hadoop Distributed File System (HDFS) for storage and the MapReduce framework for processing, with YARN managing cluster resources in current versions. In a business setting, Hadoop can be used to process and analyze large amounts of data from various sources, such as social media platforms, customer databases, and website analytics. For example, an e-commerce company may use Hadoop to analyze customer purchase history and website traffic data to identify market trends and optimize its product offerings. By using Hadoop, the company can spread this work across many machines and gain valuable insights into its operations and customers.
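The MapReduce idea mentioned in the Hadoop example can be illustrated on a single machine with plain Python. This sketch does not use Hadoop itself. The purchase log and the function names are hypothetical, and in a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict

# Hypothetical purchase log: one "customer,product" entry per line.
purchase_log = ["c1,laptop", "c2,mouse", "c1,mouse", "c3,laptop", "c2,laptop"]


def map_phase(line):
    """Map: turn one log line into a (product, 1) pair."""
    _customer, product = line.split(",")
    yield product, 1


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)


mapped = [pair for line in purchase_log for pair in map_phase(line)]
product_counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(product_counts)
```

Because each line is mapped independently and each product is reduced independently, the work can be divided among many machines, which is what makes the approach suitable for very large datasets.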
In conclusion, database management systems are essential tools for managing and storing data in a business setting. Oracle Database, Microsoft Access, and MongoDB are examples of DBMSs, and Hadoop is a related platform for processing very large datasets. Each can be used to manage and analyze data in various industries and settings. By choosing the system that fits their data and workload, businesses can improve their operations, make informed decisions, and gain a competitive advantage.
Relational Databases
A relational database is a type of database that stores data in tables consisting of rows and columns, with each row representing a unique record and each column representing a field of data. Each table has a primary key, a field (or combination of fields) whose value uniquely identifies each row. Tables are related to each other when one table stores the primary key of another as a foreign key, which establishes the relationship between them. Relational databases are widely used in various industries, including banking, healthcare, retail, and education, among others.
Examples of Relational Databases
One example of a relational database is Oracle Database, which is used by many large organizations to store and manage large amounts of data. Oracle Database provides various tools and functionalities for database management, including backup and recovery, access control, and data replication. Another example of a relational database is Microsoft SQL Server, which is widely used by businesses of all sizes to manage their data. SQL Server provides various features such as Business Intelligence, which enables organizations to perform data analysis and make data-driven decisions.
Relational databases are also used in web applications to store and manage user data, including login credentials, user profiles, and preferences. For example, a social media platform like Facebook uses a relational database to store user data, including profile information, photos, and posts. The relationships between tables in the database enable Facebook to provide users with personalized content and recommendations based on their interests and interactions on the site.
Relational databases are also used in the healthcare industry to store patient data, including medical history, test results, and treatment plans. For example, a hospital’s electronic medical records system may use a relational database to store patient data, which enables healthcare providers to access and update patient information from different locations within the hospital. The relationships between tables in the database also enable healthcare providers to track patient progress over time and provide personalized treatment plans.
Another example of a relational database is a customer relationship management (CRM) system. CRM systems are used by businesses to manage their customer interactions and improve customer relationships. A CRM system may use a relational database to store customer data, including contact information, purchases, and interactions with the business. The relationships between tables in the database enable businesses to track customer interactions over time and identify trends and patterns in customer behavior.
In conclusion, a relational database is a type of database that stores data in tables consisting of rows and columns. Relational databases are widely used in various industries, including banking, healthcare, retail, and education, among others. Examples of relational database systems include Oracle Database and Microsoft SQL Server, and applications built on relational databases include social media platforms like Facebook, electronic medical records systems in hospitals, and customer relationship management systems used by businesses. Relational databases provide a structured and organized way to manage data, enabling businesses and organizations to make informed decisions and improve their operations. With the increasing importance of data in today’s world, relational databases will continue to play a critical role in managing and utilizing information effectively.
Linked Databases
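One specific example of linking by a key is a pair of customer and sales tables. In practice these are usually two tables within a single relational database, but they are referred to here as the customer database and the sales database.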
The customer database would contain information about each customer, such as their name, address, phone number, and email address. Each customer would have a unique primary key, such as a customer ID number. The sales database would contain information about each sales transaction, such as the date, product purchased, quantity, and price. Each sales transaction would also have a unique primary key, such as a sales ID number.
The two databases would be linked by the customer ID number, which would serve as a foreign key in the sales database. This would allow the sales database to reference the customer database and retrieve customer information for each sales transaction. For example, when a salesperson enters a new sales transaction into the sales database, they would select the customer from a drop-down list that is populated from the customer database. This would ensure that the customer information is consistent and up to date in both databases.
The linked databases would enable the business to perform various analyses and generate reports based on customer and sales information. For example, the business could generate a report that shows the total sales for each customer over a given period, the most popular products purchased by each customer, and the average revenue per customer. This information could be used to identify high-value customers, tailor marketing efforts to individual customers, and improve customer retention.
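The customer and sales example can be tried out with SQLite, a small relational database engine included in Python's standard library. This is a minimal sketch: the table names, column names, and sample rows are assumptions chosen to match the description above, and the data lives only in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- primary key
    name        TEXT NOT NULL
);
CREATE TABLE sales (
    sales_id    INTEGER PRIMARY KEY,   -- primary key
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
    product     TEXT NOT NULL,
    quantity    INTEGER NOT NULL,
    price       REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ana Smith'), (2, 'Raj Patel');
INSERT INTO sales VALUES
    (10, 1, 'laptop',   1, 900.0),
    (11, 1, 'mouse',    2,  25.0),
    (12, 2, 'keyboard', 1,  60.0);
""")

# Report: total sales per customer, found by joining the tables on the key.
totals = conn.execute("""
    SELECT c.name, SUM(s.quantity * s.price) AS total_sales
    FROM customers AS c
    JOIN sales AS s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY total_sales DESC
""").fetchall()
print(totals)

# The foreign key rejects a sale that refers to a customer who does not exist.
try:
    conn.execute("INSERT INTO sales VALUES (13, 99, 'monitor', 1, 200.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("Sale for unknown customer rejected:", rejected)
```

The join produces the per-customer report described above, and the rejected insert shows how the foreign key keeps sales records consistent with the customer table.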
Furthermore, the linked databases could be used to automate various business processes, such as generating invoices and sending marketing emails. For example, when a sales transaction is entered into the sales database, an invoice could be automatically generated and sent to the customer via email. This would reduce the manual effort required by employees and improve the efficiency of the business.
In conclusion, linking two simple relational databases, such as a customer and sales database, can provide significant benefits to a business. The databases can be used to store and manage customer and sales information, generate reports and analyses, and automate various business processes. By using a primary key and foreign key to establish a relationship between the databases, the business can ensure that the information is consistent and up to date. Effective use of linked databases can lead to improved efficiency, customer satisfaction, and revenue growth.

Non-Relational Databases
A flat-file database is one simple kind of non-relational data store. It keeps data in a single plain text file, where each line represents a record and each value is separated by a delimiter character, such as a comma or tab. Flat-file databases are simple and easy to create but can become unwieldy as the data grows or becomes more complex. They should not be confused with the NoSQL databases described earlier, which are full database systems built for large-scale, flexible data.
Common examples include a CSV (comma-separated values) file, which is a widely used format for exchanging data between different systems, and a single spreadsheet in a tool such as Microsoft Excel or Google Sheets, which is often used the same way: each row represents a record, and each column represents a field. Flat files can be used for simple data storage, data manipulation, and data analysis, but they are limited in their ability to handle complex relationships between data or support advanced queries.
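A short Python sketch shows both the convenience and the limits of a flat file. The CSV content is hypothetical and is held in memory here so that the example runs without creating a file on disk.

```python
import csv
import io

# A hypothetical flat file: a header line, then one record per line.
flat_file = io.StringIO(
    "customer_id,last_name,city\n"
    "1,Smith,Denver\n"
    "2,Patel,Austin\n"
    "3,Smith,Boise\n"
)

# Each line becomes a dictionary keyed by the column names in the header.
records = list(csv.DictReader(flat_file))

# There is no query language: a "query" is a loop over every record.
smiths = [record for record in records if record["last_name"] == "Smith"]

print([record["city"] for record in smiths])
# Every value is read as text, including the numeric-looking ID.
print(type(records[0]["customer_id"]).__name__)
```

Reading the file takes only a few lines, but every search must scan the whole file, all values arrive as text, and nothing prevents duplicate or inconsistent records. These are the limitations that database management systems are designed to address.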
Database Query Languages
Database query languages are computer programming languages used to retrieve and manipulate data in a database. There are several types of query languages, including Structured Query Language (SQL), XML Query Language (XQuery), and Object Query Language (OQL).
Here are two examples of queries using a simple customer database example:
Query Language Examples
SQL Example:
SELECT * FROM customers WHERE last_name = 'Smith';
This SQL query retrieves all records from the customers table where the last name is ‘Smith’. The asterisk (*) is a wildcard character that selects all columns in the table. This query can be used to retrieve information about customers with the last name ‘Smith’, such as their address, phone number, and email address.
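To see this query run, the sketch below uses SQLite through Python's built-in sqlite3 module. The table layout and sample customers are assumptions made for the example. The last name is passed as a parameter (the ? placeholder) rather than typed into the SQL text, which is the usual practice when a program supplies the search value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets columns be read by name

# Hypothetical customers table with a few sample rows.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT,
        last_name   TEXT,
        email       TEXT
    )
""")
conn.executemany(
    "INSERT INTO customers (first_name, last_name, email) VALUES (?, ?, ?)",
    [
        ("Ana", "Smith", "ana@example.com"),
        ("Raj", "Patel", "raj@example.com"),
        ("Joe", "Smith", "joe@example.com"),
    ],
)

# The same query as above, with the last name supplied as a parameter.
smiths = conn.execute(
    "SELECT * FROM customers WHERE last_name = ? ORDER BY customer_id",
    ("Smith",),
).fetchall()

for row in smiths:
    print(row["first_name"], row["last_name"], row["email"])
```

Only the two customers named Smith are returned, and because the query selects all columns, each result row carries every field stored for that customer.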
XQuery Example:
for $customer in /customers/customer
where $customer/last_name = 'Smith'
return $customer
This XQuery retrieves all customer records from the customers XML document where the last name is ‘Smith’. The ‘for’ loop iterates over each customer element in the document, and the ‘where’ clause filters the results to only include customers with the last name ‘Smith’. The ‘return’ statement returns the selected customer elements.
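Python's standard library does not include an XQuery engine, but its xml.etree.ElementTree module supports a limited subset of XPath that can express the same filter. The sketch below is therefore an approximation of the XQuery above, not a way of running it, and the XML document is hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical customers document with the structure the query assumes.
xml_text = """
<customers>
  <customer><first_name>Ana</first_name><last_name>Smith</last_name></customer>
  <customer><first_name>Raj</first_name><last_name>Patel</last_name></customer>
  <customer><first_name>Joe</first_name><last_name>Smith</last_name></customer>
</customers>
""".strip()

root = ET.fromstring(xml_text)  # root is the <customers> element

# Select each <customer> child whose <last_name> child has the text 'Smith'.
smiths = root.findall("customer[last_name='Smith']")

first_names = [customer.findtext("first_name") for customer in smiths]
print(first_names)
```

The path expression plays the role of the for and where clauses together, and the list of matching elements corresponds to what the return clause produces.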
In conclusion, database query languages are essential tools for retrieving and manipulating data in a database. SQL, XQuery, and OQL are examples of query languages that can be used to retrieve data from a customer database. By using the appropriate query language, businesses can retrieve information about their customers and use it to make informed decisions and improve their operations.
Big Data
Big data refers to extremely large, complex, and rapidly generated datasets produced by digital platforms, sensors, mobile devices, social media, and enterprise systems. These datasets often include a mix of structured data (such as transactional records), semi-structured data (such as logs or XML files), and unstructured data (such as text, images, audio, and video). Big data is commonly characterized by several defining attributes—often summarized as volume, velocity, variety, veracity, and value—which together exceed the practical limits of traditional data storage and processing tools.
For organizations, big data is valuable because it enables deeper and more timely insight into customers, operations, and markets. Through customer analytics, firms can examine purchasing behavior, preferences, and engagement patterns to personalize offerings, improve customer experience, and strengthen loyalty. Operational analytics allows organizations to analyze internal processes, identify inefficiencies, predict equipment failures, and improve resource utilization, often leading to cost reductions and productivity gains. Big data is also used in market and competitive analysis, where organizations combine internal data with external sources—such as economic indicators, social media activity, or industry data—to identify trends, emerging opportunities, and potential risks.
Overall, big data plays a critical role in modern business decision-making. When combined with appropriate analytics tools and governance practices, big data enables organizations to move beyond intuition and anecdotal evidence toward systematic, data-driven decisions. By better understanding customers, optimizing operations, and anticipating market changes, organizations can improve performance and sustain competitive advantage in increasingly data-intensive environments.
Data Modeling and Analysis
Business data modeling and analysis focuses on how organizations define, structure, and analyze data to support decision-making. Data modeling involves creating abstract representations of organizational data—such as entities, attributes, and relationships—to ensure data is consistent, well-organized, and aligned with business requirements. These models, which may be conceptual, logical, or physical, serve as blueprints for databases and analytics systems and help organizations manage data as a strategic asset.
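As an illustration of how a model becomes a physical design, the sketch below defines two hypothetical entities, customers and orders, and a one-to-many relationship between them, using Python's built-in sqlite3 module. The table and column names are assumptions chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Two entities with their attributes; orders.customer_id expresses the
# relationship "each order belongs to exactly one customer".
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        last_name   TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Smith')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.5), (11, 1, 12.0)]
)

# The relationship lets us combine the two entities in one query.
rows = conn.execute(
    "SELECT c.last_name, COUNT(*), SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.customer_id GROUP BY c.customer_id"
).fetchall()

# An order for a customer that does not exist violates the model.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

Declaring the relationship in the schema means the database itself rejects inconsistent records, which is one way a model helps keep data consistent.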
Once data is properly modeled and stored, organizations apply analytical techniques to extract insight and value. Data mining refers to the use of statistical, mathematical, and machine learning techniques to discover patterns, relationships, and trends within large datasets. These techniques are particularly useful for uncovering insights that may not be visible through simple queries or reports and are commonly applied to areas such as customer behavior analysis, fraud detection, demand forecasting, and process optimization.
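One simple pattern-discovery technique is counting which items are frequently purchased together. The sketch below is a minimal illustration using invented shopping baskets and an arbitrarily chosen support threshold of 40%; production data mining tools use more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]

# Count every pair of items that appears together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs whose support (share of baskets containing both items)
# meets the threshold. The 0.4 value is an assumption for this example.
min_support = 0.4
frequent_pairs = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= min_support
}

for pair, support in sorted(frequent_pairs.items()):
    print(pair, support)
```

A pattern such as bread and butter appearing together in most baskets would not show up in a query for a single product, which is the kind of insight the paragraph above describes.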
Data visualization plays a complementary role by presenting data and analytical results in graphical form, such as charts, dashboards, and interactive reports. Visualization helps decision-makers quickly identify patterns, anomalies, and relationships, making complex data easier to interpret and communicate. Descriptive analytics, which focuses on summarizing and explaining historical data, underpins many of these visualizations. Common descriptive techniques include data profiling, summary statistics, correlation analysis, and basic modeling approaches that help organizations understand what has happened and why.
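The descriptive techniques mentioned above can be sketched with Python's standard statistics module. The monthly advertising and sales figures below are invented, and the Pearson correlation is computed by a small helper written for this example.

```python
import statistics

# Hypothetical monthly figures (in thousands).
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 110, 130, 150, 160, 200]

# Summary statistics describe the center and spread of the sales data.
summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),  # sample standard deviation
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


r = pearson(ad_spend, sales)
print(summary, round(r, 3))
```

A correlation close to 1 indicates that the two series rise together, but on its own it does not establish that advertising caused the sales increase.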
Example: Amazon and Data Mining
Amazon is a well-known example of a company that has used data mining for business success. One of the world’s largest online retailers, Amazon uses data mining to better understand its customers’ behavior and preferences.
In the late 1990s, Amazon started collecting data on every product purchased on its website, along with related activity such as users’ purchase history, search queries, time spent on the website, products viewed and clicked, and many other metrics. With the help of machine learning algorithms, Amazon was able to use this data to create personalized recommendations for each customer based on their preferences.
This data analysis produced several benefits. First, Amazon was able to provide personalized recommendations to users, which increased the chances of repeat business. These recommendations were reportedly highly accurate, and it was common to see an increase of 30% or more in shoppers returning to the website to buy products that had been recommended to them.
Second, Amazon was able to optimize the user experience by making it highly personalized and targeted. This customer-focused approach set the company apart from its competitors and helped increase customer loyalty and retention.
Finally, data mining allowed Amazon to identify trends in its vast quantities of data that were not visible before. These insights provided a significant competitive advantage in the online retail market, allowing the company to stay ahead of the curve and adapt quickly to changing consumer behavior.
In conclusion, data mining has been instrumental in Amazon’s success: personalized recommendations and a highly targeted customer experience have led to increased customer loyalty and revenue, and the ability to spot trends in large volumes of data has helped the company stay ahead of its competitors. Amazon continues to use data mining and analytics to improve its business operations.
Summary
This chapter focuses on data management and analytics, which are essential components of any successful business. It covers the basics of data, data storage and retrieval, database management systems, and data modeling and analysis.
Data is the foundation of any business, and understanding it is crucial for decision making. Data consists of facts, figures, and statistics that are collected and stored for future use. This could be anything from sales figures to customer profiles, and it is important to categorize and organize data so that it can be used efficiently. Data can be broadly classified as structured or unstructured. Structured data is organized, labeled, and stored in a format that can be easily retrieved, whereas unstructured data has no predefined organization and includes text, images, and videos. Semi-structured data, such as logs or XML files, falls between the two.
Data storage and retrieval are necessary to maintain and access data easily. The primary means of storing data are files and databases. File storage is simple and low cost, but it does not support efficient retrieval of specific data. Database storage, on the other hand, is organized and designed for easy retrieval of specific data. Database management systems (DBMS) are software applications used to manage the storage and retrieval of data. A DBMS can be classified as relational or non-relational. A relational DBMS organizes data into tables that can be linked together to create complex structures for storing information.
Data analysis is an essential tool for decision making. It summarizes data to describe what happened in the past and helps identify the root causes of problems. Once a root cause has been identified, steps can be taken to address it. By identifying the underlying causes of issues and taking concrete steps to address them, organizations can improve performance, increase effectiveness, and drive long-term success.
Discussion Questions:
- What is the difference between structured and unstructured data?
- How can data be stored and retrieved efficiently?
- What are the main components of a database management system?
- When should an organization use a relational database rather than a flat-file database?
- How do you approach data cleaning and preparation for analysis?
- What is the difference between supervised and unsupervised machine learning?
- How do you choose the best algorithm for a given analysis?