Madhu

UNIT- V

Introduction to Data Science

Data Science

Data Science is the study of data to extract meaningful insights for decision-making.
It combines techniques from Statistics, Computer Science, and Domain Knowledge to analyze, visualize, and predict outcomes.
It involves collecting, cleaning, analyzing, and interpreting data to solve real-world problems.

Importance of Data Science

Helps organizations make data-driven decisions.
Enables automation and predictions using Machine Learning.
Supports business intelligence and strategic planning.
Plays a key role in fields like healthcare, finance, e-commerce, and social media.

Components of Data Science

Data Collection – Gathering data from various sources (databases, web, sensors, etc.)
Data Cleaning – Removing errors, duplicates, and missing values.
Data Analysis – Using statistical methods and visualization to explore data.
Data Visualization – Representing data using graphs, charts, and dashboards.
Machine Learning – Building models to predict or classify data outcomes.
Communication of Results – Presenting insights to decision-makers.

Data Science Workflow

Define the Problem
Collect Data
Prepare Data (Cleaning and Transformation)
Analyze & Build Model
Evaluate Model Performance
Deploy & Monitor the Model

Tools and Technologies Used

Category	Tools/Technologies
Programming Languages	Python, R
Data Handling	SQL, Pandas, NumPy
Visualization	Matplotlib, Seaborn, Power BI, Tableau
Machine Learning	Scikit-learn, TensorFlow, PyTorch
Big Data	Hadoop, Spark

Applications of Data Science

Healthcare – Disease prediction, drug discovery
Finance – Fraud detection, stock market analysis
E-commerce – Product recommendation systems
Social Media – Sentiment analysis, targeted advertising
Transportation – Route optimization, autonomous vehicles

Skills Required for Data Scientists

Programming skills (Python/R)
Mathematics & Statistics
Data Visualization
Machine Learning
Communication & Problem-solving skills

Careers in Data Science

Data Analyst
Data Engineer
Machine Learning Engineer
Data Scientist
Business Intelligence Analyst

Functional Programming in Python

Introduction

· Functional Programming (FP) is a programming paradigm where programs are built using functions.

· It focuses on what to solve rather than how to solve it.

· Python supports both Object-Oriented and Functional programming styles (it’s a multi-paradigm language).

Key Concepts

Concept	Description
Function	A block of code that performs a specific task and can be reused.
Pure Function	A function that always produces the same output for the same input and has no side effects.
Immutability	Data is not changed; instead, new data is created.
First-Class Functions	Functions can be assigned to variables, passed as arguments, or returned from other functions.
Higher-Order Functions	Functions that take other functions as arguments or return them as results.
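The concepts in the table can be seen side by side in a short sketch. The function names add_tax, append_item and apply_twice are hypothetical, chosen only for illustration.

```python
# Pure function: the result depends only on the arguments; nothing outside changes
def add_tax(price, rate):
    return price * (1 + rate)

# Impure function: it modifies the list passed in (a side effect)
def append_item(items, item):
    items.append(item)
    return items

cart = ["pen"]
append_item(cart, "book")          # cart itself has now changed

# Immutability: build a new list instead of changing the old one
original = [1, 2, 3]
extended = original + [4]          # original is left unchanged

# First-class function: a function stored in a variable like any other value
tax_fn = add_tax

# Higher-order function: takes a function as an argument and calls it
def apply_twice(func, value):
    return func(func(value))

print(add_tax(100, 0.5))                  # 150.0
print(cart)                               # ['pen', 'book']
print(original, extended)                 # [1, 2, 3] [1, 2, 3, 4]
print(tax_fn(200, 0.5))                   # 300.0
print(apply_twice(lambda x: x * 3, 2))    # 18
```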

Advantages of Functional Programming

· Easier to debug and test.

· Promotes code reusability.

· Makes parallel and distributed computing easier, because pure functions do not share or modify state.

· Produces clean and modular code.

Functional Programming Features in Python

Built-in Functions

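Python provides several functional programming tools:

· map() – built-in function

· filter() – built-in function

· reduce() – in the functools module (not a built-in in Python 3)

· lambda – keyword for creating anonymous functions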

Lambda Functions

· Small, anonymous functions created using the lambda keyword.

· Syntax:

        lambda arguments: expression

· Example:

        square = lambda x: x * x

        print(square(5))  # Output: 25
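A lambda is most often passed straight into another function rather than assigned to a name. A small sketch using the built-in sorted() and its key parameter; the students list is made-up sample data.

```python
# Sample data: (name, marks) pairs
students = [("Ravi", 78), ("Latha", 92), ("Madhu", 85)]

# Sort by the second item of each tuple (marks), highest first
by_marks = sorted(students, key=lambda s: s[1], reverse=True)
print(by_marks)  # [('Latha', 92), ('Madhu', 85), ('Ravi', 78)]

# A lambda can take more than one argument
add = lambda a, b: a + b
print(add(3, 4))  # 7
```

Python's style guide (PEP 8) recommends def when a function needs a name; lambdas suit short, one-off functions like the key above.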

map() Function

· Applies a function to each item in an iterable (like a list).

        numbers = [1, 2, 3, 4, 5]

        squares = list(map(lambda x: x*x, numbers))

        print(squares)  # Output: [1, 4, 9, 16, 25]
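map() also accepts several iterables: the function then receives one item from each, and mapping stops at the shortest iterable. It also returns a lazy iterator, which is why the examples wrap it in list(). A short sketch with sample values:

```python
prices = [100, 200, 300]
quantities = [2, 1, 4]

# The lambda receives one element from each list per call
totals = list(map(lambda p, q: p * q, prices, quantities))
print(totals)  # [200, 200, 1200]

# map() is lazy: it returns an iterator, not a list
squares_iter = map(lambda x: x * x, [1, 2, 3])
print(next(squares_iter))  # 1
```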

filter() Function

· Filters elements from an iterable using a Boolean condition.

        numbers = [1, 2, 3, 4, 5, 6]

        even = list(filter(lambda x: x % 2 == 0, numbers))

        print(even)  # Output: [2, 4, 6]

reduce() Function

· Used to reduce a list to a single value by repeatedly applying a function.

· It is available in the functools module.

        from functools import reduce

        numbers = [1, 2, 3, 4, 5]

        product = reduce(lambda x, y: x * y, numbers)

        print(product)  # Output: 120
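reduce() also takes an optional third argument, an initial value. It is used as the starting accumulator and is returned unchanged when the iterable is empty; without it, reduce() raises TypeError on an empty list. A sketch with sample values:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# Step by step: ((((1*2)*3)*4)*5) = 120
product = reduce(lambda x, y: x * y, numbers)

# With an initial value of 1, an empty list gives 1 instead of an error
empty_product = reduce(lambda x, y: x * y, [], 1)

# Finding the maximum with reduce (the built-in max() does the same job)
largest = reduce(lambda a, b: a if a > b else b, [7, 3, 9, 2])

print(product, empty_product, largest)  # 120 1 9
```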

Example: Combining Functional Tools

from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

result = reduce(lambda x, y: x + y,

                filter(lambda x: x % 2 == 0,

                       map(lambda x: x * x, numbers)))

print(result)  # Output: 56 (2² + 4² + 6²)
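The same result can also be written with a generator expression and the built-in sum(), which many Python programmers find easier to read than nested map/filter/reduce calls. A sketch of the equivalent:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Square every number, keep the even squares, then add them up
result = sum(x * x for x in numbers if (x * x) % 2 == 0)

print(result)  # 56
```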

JSON and XML in Python

Introduction

Data is often exchanged between applications using structured formats.
Two commonly used data formats are:

JSON (JavaScript Object Notation)
XML (eXtensible Markup Language)

Python provides libraries to read, write, and process both easily.

JSON in Python

What is JSON?

JSON stands for JavaScript Object Notation.
It is a lightweight data format used to store and exchange data between systems.
It is easy for humans to read and easy for machines to parse.

JSON Structure

JSON data is written as key–value pairs.
Example:

{

"name": "Madhu",

"age": 25,

"department": "CSE",

"skills": ["Python", "Data Science"]

}

JSON vs Python Dictionary

JSON	Python
String format	Dictionary object
Uses double quotes	Uses single or double quotes
Can be stored in files	Used within programs
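The differences show up when a Python dictionary is converted: single quotes become double quotes, True/False become true/false, None becomes null, and a tuple becomes a JSON array. A short sketch with made-up data; indent is the standard json.dumps() option for pretty-printing.

```python
import json

record = {'name': 'Madhu', 'active': True, 'manager': None, 'scores': (90, 85)}

# Compact JSON text: note the double quotes, true, null and the [ ] array
text = json.dumps(record)
print(text)
# {"name": "Madhu", "active": true, "manager": null, "scores": [90, 85]}

# Converting back gives a dict, but the tuple comes back as a list
back = json.loads(text)
print(back['scores'])  # [90, 85]

# Pretty-printed version, indented by 2 spaces
print(json.dumps(record, indent=2))
```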

Working with JSON in Python

Python provides the built-in json module.

a) Importing JSON Module

import json

b) Converting Python Object to JSON

(Serialization – using json.dumps() or json.dump())

import json

data = {"name": "Madhu", "age": 25, "city": "Kurnool"}

json_string = json.dumps(data)

print(json_string)

c) Converting JSON to Python Object

(Deserialization – using json.loads() or json.load())

import json

json_data = '{"name": "Madhu", "age": 25, "city": "Kurnool"}'

python_obj = json.loads(json_data)

print(python_obj["name"]) # Output: Madhu

d) Reading JSON from a File

import json

with open('data.json', 'r') as file:
    data = json.load(file)

e) Writing JSON to a File

with open('data.json', 'w') as file:
    json.dump(data, file)
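Putting the reading and writing steps together, a round trip writes a dictionary to a file and reads it back. This sketch uses a temporary directory (via the standard tempfile module) so it does not overwrite a real data.json; the file name is only an example.

```python
import json
import os
import tempfile

data = {"name": "Madhu", "age": 25, "city": "Kurnool"}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.json")

    # Serialize the dictionary into the file
    with open(path, "w", encoding="utf-8") as file:
        json.dump(data, file, indent=4)

    # Deserialize it back into a new dictionary
    with open(path, "r", encoding="utf-8") as file:
        loaded = json.load(file)

print(loaded == data)  # True
```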

XML in Python

What is XML?

XML (eXtensible Markup Language) is a markup language used to store and transport data.
It uses tags (like HTML) to define elements and their structure.

Example:

<student>
    <name>Madhu</name>
</student>

Features of XML

Self-descriptive and hierarchical.
Platform-independent.
Used in many web and data exchange applications.

Parsing XML in Python

Python provides the xml.etree.ElementTree module to parse and create XML data.

a) Reading XML Data

import xml.etree.ElementTree as ET

tree = ET.parse('student.xml')

root = tree.getroot()

print(root.tag) # Output: student

for child in root:
    print(child.tag, ":", child.text)

b) Creating XML Data

import xml.etree.ElementTree as ET

student = ET.Element('student')

name = ET.SubElement(student, 'name')

name.text = 'Madhu'

age = ET.SubElement(student, 'age')

age.text = '25'

tree = ET.ElementTree(student)

tree.write('student.xml')
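XML can also be built and parsed entirely in memory. ET.tostring() turns an element into text, ET.fromstring() parses text back into an element, and find()/findtext() look up a child by tag. The roll attribute below is a made-up example.

```python
import xml.etree.ElementTree as ET

# Build <student roll="101"><name>Madhu</name><age>25</age></student>
student = ET.Element('student', roll='101')
ET.SubElement(student, 'name').text = 'Madhu'
ET.SubElement(student, 'age').text = '25'

# Serialize to a string (encoding='unicode' returns str instead of bytes)
xml_text = ET.tostring(student, encoding='unicode')
print(xml_text)
# <student roll="101"><name>Madhu</name><age>25</age></student>

# Parse the string back and read values
root = ET.fromstring(xml_text)
print(root.get('roll'))                # 101
print(root.findtext('name'))           # Madhu
print(int(root.find('age').text) + 1)  # 26 (XML text must be converted)
```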

JSON vs XML – Comparison

Feature	JSON	XML
Simplicity	Simple and compact	More verbose (opening and closing tags)
Data Types	Strings, numbers, booleans, null, arrays, and objects	All content is text; the application interprets types
Readability	Easy for humans	Harder to read
Parsing	Generally faster	Generally slower
Use Case	APIs, web applications	Documents, configurations

JSON and XML are formats for data storage and exchange.
JSON is lightweight and widely used in web APIs.
XML is more structured and descriptive, useful for hierarchical data.
Python provides built-in modules — json and xml.etree.ElementTree — to easily work with both.

NumPy with Python

Introduction to NumPy

NumPy stands for Numerical Python.
It is a powerful library used for numerical and scientific computing.
It provides support for multidimensional arrays, mathematical operations, and linear algebra.
Widely used in Data Science, Machine Learning, and Scientific Applications.

Why Use NumPy?

Python lists store each element as a separate Python object, so numerical loops over them are slow and use more memory.
NumPy arrays are:

Faster and more memory-efficient
Support vectorized operations (whole-array arithmetic without explicit Python loops)
Integrated with many scientific and ML libraries (Pandas, Scikit-learn, TensorFlow)
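The difference between a loop over a Python list and a vectorized NumPy operation is easiest to see side by side. Both versions below double every value; the NumPy version runs its loop in compiled code. Actual speed-ups depend on the machine and data size, so no timings are claimed here.

```python
import numpy as np

values = list(range(1, 6))

# Plain Python: an explicit loop (here, a list comprehension)
doubled_list = [v * 2 for v in values]

# NumPy: one vectorized expression applied to the whole array
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)  # [2, 4, 6, 8, 10]
print(doubled_arr)   # [ 2  4  6  8 10]

# Note: with lists, * means repetition, not arithmetic
print(values * 2)    # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
```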

Installing NumPy

Before using NumPy, install it using:

pip install numpy

Then import it in Python:

import numpy as np

NumPy Arrays

The core of NumPy is the ndarray (N-dimensional array) object.

Creating Arrays

import numpy as np

# From list

arr = np.array([1, 2, 3, 4, 5])

print(arr)

# Multi-dimensional array

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix)

Array Attributes

Attribute	Description	Example
ndim	Number of dimensions	arr.ndim
shape	Size along each dimension (rows, columns, ...)	arr.shape
size	Total number of elements	arr.size
dtype	Data type of elements	arr.dtype

Example:

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim) # 2

print(arr.shape) # (2, 3)

print(arr.size) # 6

Creating Arrays with Built-in Functions

Function	Description	Example
np.zeros()	Creates array of zeros	np.zeros((2,3))
np.ones()	Creates array of ones	np.ones((2,3))
np.arange()	Creates array with range of values	np.arange(0,10,2)
np.linspace()	Creates evenly spaced values	np.linspace(0,1,5)
np.eye()	Identity matrix	np.eye(3)
np.random.rand()	Random values between 0 and 1	np.random.rand(2,3)

Array Indexing and Slicing

You can access and modify array elements easily.

arr = np.array([10, 20, 30, 40, 50])

print(arr[0]) # First element

print(arr[1:4]) # Slicing elements

arr[2] = 100 # Modify element

print(arr)

For 2D arrays:

matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])

print(matrix[1,2]) # Element at 2nd row, 3rd column

print(matrix[:,1]) # All rows, 2nd column
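One detail worth knowing: a basic slice such as arr[1:4] is a view that shares memory with the original array, so changing the slice changes the original. Calling .copy() gives an independent array. A short sketch with sample values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view = arr[1:4]          # basic slice -> view of the same data
view[0] = 999            # also changes arr[1]
print(arr)               # [ 10 999  30  40  50]

safe = arr[1:4].copy()   # independent copy
safe[0] = -1             # arr is not affected
print(arr)               # [ 10 999  30  40  50]
```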

Array Operations

NumPy supports element-wise arithmetic operations.

a = np.array([1,2,3])

b = np.array([4,5,6])

print(a + b) # [5,7,9]

print(a - b) # [-3,-3,-3]

print(a * b) # [4,10,18]

print(a / b) # [0.25,0.4,0.5]

Also supports:

np.sum(a) – Sum of elements
np.mean(a) – Mean value
np.max(a) / np.min(a) – Max/Min element
np.sqrt(a) – Square root
np.dot(a, b) – Dot product

Array Reshaping

arr = np.arange(6)

print(arr.reshape(2,3)) # Reshape 1D → 2D
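reshape() only works when the total number of elements stays the same (6 = 2 × 3 here). One dimension can be given as -1 and NumPy works it out from the others; ravel() flattens back to 1-D. A short sketch:

```python
import numpy as np

arr = np.arange(6)          # [0 1 2 3 4 5]

m = arr.reshape(3, -1)      # -1 is inferred as 2
print(m.shape)              # (3, 2)

flat = m.ravel()            # back to one dimension
print(flat)                 # [0 1 2 3 4 5]

# arr.reshape(4, 2) would raise ValueError: 6 elements cannot fill 4 x 2
```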

Combining and Splitting Arrays

a = np.array([[1,2],[3,4]])

b = np.array([[5,6]])

# Vertical stacking

print(np.vstack((a,b)))

# Horizontal stacking

print(np.hstack((a,b.T)))
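Splitting is the reverse of stacking. A sketch with np.split(), np.vsplit() and np.hsplit(); the arrays are sample data, and each split must divide the array into equal parts.

```python
import numpy as np

arr = np.arange(1, 7)             # [1 2 3 4 5 6]
parts = np.split(arr, 3)          # three equal pieces
print(parts)                      # [array([1, 2]), array([3, 4]), array([5, 6])]

m = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
top, bottom = np.vsplit(m, 2)     # split the rows into 2 groups
left, right = np.hsplit(m, 2)     # split the columns into 2 groups
print(top)                        # [[1 2]
                                  #  [3 4]]
print(right.ravel())              # [2 4 6 8]
```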

Broadcasting

Allows arithmetic between arrays of different but compatible shapes: the smaller array is stretched (broadcast) across the larger one.

a = np.array([[1,2,3],[4,5,6]])

b = np.array([10,20,30])

print(a + b)
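Here b (shape (3,)) is added to each row of a (shape (2, 3)), giving [[11 22 33] [14 25 36]]. Broadcasting compares shapes from the last dimension backwards; two sizes are compatible when they are equal or one of them is 1. A sketch that adds a column vector as well:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)

row = np.array([10, 20, 30])              # shape (3,)   -> added to every row
col = np.array([[100], [200]])            # shape (2, 1) -> added to every column

print(a + row)
# [[11 22 33]
#  [14 25 36]]
print(a + col)
# [[101 102 103]
#  [204 205 206]]
print(a * 2)                              # a scalar broadcasts to every element

# Incompatible shapes raise an error, e.g. (2, 3) + (2,)
try:
    a + np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)
```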

Mathematical and Statistical Functions

Function	Description
np.mean(a)	Average of elements
np.median(a)	Median value
np.std(a)	Standard deviation
np.var(a)	Variance
np.sum(a)	Sum of elements
np.sqrt(a)	Square root

Example Program

import numpy as np

data = np.array([[2, 4, 6], [1, 3, 5]])

print("Original Array:\n", data)

print("Mean:", np.mean(data))

print("Max:", np.max(data))

print("Sum of each column:", np.sum(data, axis=0))

Applications of NumPy

Data Science – data manipulation and preprocessing
Machine Learning – matrix operations
Image Processing – pixel data manipulation
Scientific Computing – solving mathematical equations
Statistics & Probability – analyzing datasets

Summary

NumPy provides high-performance multi-dimensional arrays.
It replaces slow Python lists with efficient numerical computations.
Essential for Data Science, Machine Learning, and AI.

Pandas in Python

Introduction

Pandas is a powerful and popular Python library for data manipulation and analysis.
It provides high-performance data structures and data analysis tools.
The name “Pandas” comes from “Panel Data”, a term used in statistics.

Why Pandas?

Pandas makes it easy to:

Handle and analyze tabular data (like Excel or CSV files).
Perform data cleaning, filtering, grouping, and aggregation.
Integrate seamlessly with NumPy, Matplotlib, and Scikit-learn.
Work efficiently with large datasets that fit in memory.

Installing Pandas

pip install pandas

Import it in Python:

import pandas as pd

Data Structures in Pandas

Pandas provides two main data structures:

Data Structure	Description	Example
Series	1D labeled array (like a column in Excel)	pd.Series()
DataFrame	2D labeled data (like a spreadsheet)	pd.DataFrame()

Pandas Series

A Series is like a one-dimensional array with labels (index).

import pandas as pd

data = pd.Series([10, 20, 30, 40])

print(data)

Output:

0 10

1 20

2 30

3 40

dtype: int64

Custom index:

data = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

print(data['b']) # Output: 200
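Arithmetic on a Series is vectorized like NumPy, but values are matched by index label, not by position. A label present in only one of the two Series gives NaN (missing). A sketch with made-up values:

```python
import pandas as pd

s1 = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

print(s1 * 2)          # every value doubled, labels kept

total = s1 + s2        # aligned on labels 'a', 'b', 'c', 'd'
print(total)           # a and d are NaN; b = 201.0, c = 302.0

print(s1[s1 > 150])    # Boolean filtering works as in NumPy: b and c
```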

Pandas DataFrame

A DataFrame is a two-dimensional table of data with rows and columns.

import pandas as pd

data = {

'Name': ['Madhu', 'Latha', 'Ravi'],

'Age': [22, 21, 23],

'Dept': ['CSE', 'ECE', 'IT']

}

df = pd.DataFrame(data)

print(df)

Output:

Name Age Dept

0 Madhu 22 CSE

1 Latha 21 ECE

2 Ravi 23 IT

Reading and Writing Data

Pandas can read and write data from different file formats.

File Type	Function to Read	Function to Write
CSV	pd.read_csv()	to_csv()
Excel	pd.read_excel()	to_excel()
JSON	pd.read_json()	to_json()
SQL	pd.read_sql()	to_sql()

Example:

df = pd.read_csv('students.csv')

df.to_excel('students.xlsx', index=False)
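A small round trip shows what to_csv() and read_csv() produce. To keep the sketch self-contained it writes to an in-memory io.StringIO buffer instead of a real students.csv; passing a file name works the same way.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Madhu', 'Latha'], 'Age': [22, 21]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False: don't write the 0, 1 row labels
csv_text = buffer.getvalue()
print(csv_text)
# Name,Age
# Madhu,22
# Latha,21

buffer.seek(0)                   # rewind before reading
df2 = pd.read_csv(buffer)
print(df2.equals(df))            # True
```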

DataFrame Operations

a) Viewing Data

df.head() # First 5 rows

df.tail(3) # Last 3 rows

df.info() # Summary of DataFrame

df.describe() # Statistical summary

df.shape # (rows, columns)

b) Selecting Data

df['Name'] # Select single column

df[['Name','Age']] # Multiple columns

df.iloc[0] # Select the first row by integer position

df.loc[1, 'Name'] # Select a single cell by row label and column name

Filtering and Conditional Selection

df[df['Age'] > 21]

df[(df['Dept'] == 'CSE') & (df['Age'] > 21)]

Adding and Removing Columns

df['Marks'] = [85, 90, 88] # Add new column

df.drop('Dept', axis=1, inplace=True) # Remove column

Handling Missing Data

df.isnull() # Check for missing values

df.dropna() # Returns a copy without the rows that contain nulls

df.fillna(0) # Returns a copy with nulls replaced by 0
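dropna() and fillna() return a new DataFrame and leave df unchanged unless the result is assigned back. A sketch with made-up data containing a None value, which pandas stores as NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Marks': [85, None, 78],
})

print(df.isnull().sum())            # count of missing values per column

cleaned = df.dropna()               # drops the row for student 'B'
filled = df.fillna({'Marks': 0})    # fills only the Marks column

# Fill with the column mean instead of 0: (85 + 78) / 2 = 81.5
df['Marks'] = df['Marks'].fillna(df['Marks'].mean())
print(df)
```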

Sorting and Grouping Data

df.sort_values(by='Age', ascending=False)

df.groupby('Dept')['Marks'].mean()
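groupby() splits the rows by a column's values, applies a calculation to each group, and combines the results; the grouping column must still exist in the DataFrame. A sketch with made-up marks:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Madhu', 'Latha', 'Ravi', 'Sita'],
    'Dept': ['CSE', 'ECE', 'CSE', 'ECE'],
    'Marks': [85, 90, 88, 70],
})

# Mean marks per department: CSE -> 86.5, ECE -> 80.0
means = df.groupby('Dept')['Marks'].mean()
print(means)

# Several statistics at once
summary = df.groupby('Dept')['Marks'].agg(['mean', 'max', 'count'])
print(summary)

# Sort by two columns: Dept ascending, then Marks descending
print(df.sort_values(by=['Dept', 'Marks'], ascending=[True, False]))
```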

Merging, Joining, and Concatenation

a) Merging

pd.merge(df1, df2, on='ID')

b) Concatenation

pd.concat([df1, df2])
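The snippets above assume two existing DataFrames df1 and df2. A self-contained sketch with made-up tables that share an ID column:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Madhu', 'Latha', 'Ravi']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Marks': [85, 90, 70]})

# Inner join (default): only IDs found in both tables -> 1 and 2
inner = pd.merge(df1, df2, on='ID')

# Left join: every row of df1; missing Marks become NaN
left = pd.merge(df1, df2, on='ID', how='left')

# Concatenation stacks rows; ignore_index renumbers them 0..n-1
more = pd.DataFrame({'ID': [5], 'Name': ['Sita']})
stacked = pd.concat([df1, more], ignore_index=True)

print(inner)
print(left)
print(stacked)
```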

Statistical and Mathematical Operations

df['Age'].mean()

df['Marks'].max()

df['Marks'].sum()

df.corr(numeric_only=True) # Correlation matrix of the numeric columns

Example Program

import pandas as pd

data = {

'Student': ['A', 'B', 'C', 'D'],

'Marks': [85, 90, 78, 92],

'Department': ['CSE', 'IT', 'CSE', 'ECE']

}

df = pd.DataFrame(data)

print("Data:\n", df)

print("\nAverage Marks:", df['Marks'].mean())

print("\nCSE Students:\n", df[df['Department'] == 'CSE'])

Output:

Data:

Student Marks Department

0 A 85 CSE

1 B 90 IT

2 C 78 CSE

3 D 92 ECE

Average Marks: 86.25

CSE Students:

Student Marks Department

0 A 85 CSE

2 C 78 CSE

Applications of Pandas

Data Cleaning and Preparation
Statistical Analysis
Data Visualization (with Matplotlib/Seaborn)
Machine Learning Preprocessing
Financial Data Analysis

Important points

Pandas is the backbone of Data Science in Python.
Provides easy handling of structured data.
Supports file I/O, filtering, grouping, and analytics.
Works well with NumPy and Matplotlib.

In short:
NumPy = Numerical computations
Pandas = Data handling and analysis
Matplotlib/Seaborn = Data visualization

Plotting Graphs Using Pandas in Python

Introduction

Pandas provides built-in plotting features through the Matplotlib library.
It allows us to create different types of plots and charts directly from Series or DataFrame objects.
Helps in understanding patterns, trends, and relationships in data visually.

Importing Required Libraries

Before plotting, import the necessary libraries:

import pandas as pd

import matplotlib.pyplot as plt

Note: If Matplotlib is not installed, install it using: pip install matplotlib

Creating a Simple DataFrame

Let’s create some data first:

import pandas as pd

data = {

'Year': [2020, 2021, 2022, 2023, 2024],

'Sales': [200, 250, 300, 350, 400],

'Profit': [20, 25, 30, 28, 35]

}

df = pd.DataFrame(data)

print(df)

Output:

Year Sales Profit

0 2020 200 20

1 2021 250 25

2 2022 300 30

3 2023 350 28

4 2024 400 35

Line Plot

A line plot is used to display data changes over a period of time.

df.plot(x='Year', y='Sales', kind='line', title='Yearly Sales', color='blue', marker='o')

plt.xlabel('Year')

plt.ylabel('Sales')

plt.grid(True)

plt.show()

Explanation:

kind='line' → line chart
x and y define which columns to use
marker='o' → shows data points on the line

Bar Plot

Used to compare categories or quantities.

df.plot(x='Year', y='Profit', kind='bar', title='Yearly Profit', color='orange')

plt.xlabel('Year')

plt.ylabel('Profit')

plt.show()

Explanation:

Each bar represents a category (here, year).
Useful for comparing profits or counts.

Multiple Line Plot

To compare two columns in one graph:

df.plot(x='Year', y=['Sales', 'Profit'], kind='line', marker='o')

plt.title('Sales vs Profit over Years')

plt.xlabel('Year')

plt.ylabel('Values')

plt.show()

Explanation:

Plots both columns on the same graph.
Helps to see the relationship between sales and profit.

Histogram

Used to display frequency distribution of numerical data.

df['Sales'].plot(kind='hist', bins=5, color='green', title='Sales Distribution')

plt.xlabel('Sales')

plt.show()

Explanation:

bins → number of intervals.
Useful for analyzing data spread or patterns.

Pie Chart

Used to show percentage or proportion of categories.

df['Profit'].plot(kind='pie', labels=df['Year'], autopct='%1.1f%%', startangle=90)

plt.title('Profit Share by Year')

plt.ylabel('')

plt.show()

Explanation:

autopct → shows percentage values.
startangle=90 → starts chart from the top.

Scatter Plot

Used to show relationship between two numeric variables.

df.plot(kind='scatter', x='Sales', y='Profit', color='red', title='Sales vs Profit')

plt.xlabel('Sales')

plt.ylabel('Profit')

plt.show()

Explanation:

Each point represents a (Sales, Profit) pair.
Helps identify trends or correlations.

Box Plot

Used for statistical analysis (to check data spread and outliers).

df[['Sales', 'Profit']].plot(kind='box', title='Sales and Profit Distribution')

plt.show()

Explanation:

Shows median, quartiles, and outliers.
Useful for understanding data variability.

Customizing the Graphs

You can enhance the appearance using Matplotlib options:

df.plot(x='Year', y='Sales', kind='line', color='purple', marker='o', linestyle='--', figsize=(8,5))

plt.title('Customized Sales Graph')

plt.xlabel('Year')

plt.ylabel('Sales')

plt.grid(True)

plt.show()

Example: Comparing Multiple Graphs

import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))

# Line plot

plt.subplot(2,1,1)

plt.plot(df['Year'], df['Sales'], marker='o', color='blue', label='Sales')

plt.plot(df['Year'], df['Profit'], marker='s', color='red', label='Profit')

plt.title('Sales and Profit Comparison')

plt.legend()

# Bar plot

plt.subplot(2,1,2)

plt.bar(df['Year'], df['Sales'], color='green')

plt.title('Sales Growth')

plt.tight_layout()

plt.show()
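Pandas plots can also be placed on subplots by passing a Matplotlib axes object through the ax parameter of DataFrame.plot(). The sketch rebuilds the Year/Sales/Profit data from earlier, selects the non-interactive Agg backend, and saves to a file (sales_report.png is just an example name) so it also runs without a display.

```python
import matplotlib
matplotlib.use('Agg')                 # draw to files only, no window needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Year': [2020, 2021, 2022, 2023, 2024],
    'Sales': [200, 250, 300, 350, 400],
    'Profit': [20, 25, 30, 28, 35],
})

# One figure with two side-by-side axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Each pandas plot is drawn on the axes passed via ax=
df.plot(x='Year', y='Sales', kind='line', marker='o', ax=ax1, title='Sales')
df.plot(x='Year', y='Profit', kind='bar', color='orange', ax=ax2, title='Profit')

fig.tight_layout()
fig.savefig('sales_report.png')       # use plt.show() instead in an interactive session
plt.close(fig)
```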

Types of Plots Supported by Pandas

Plot Type	Parameter	Description
Line Plot	'line'	Default plot type
Bar Plot	'bar'	Vertical bars
Barh Plot	'barh'	Horizontal bars
Histogram	'hist'	Data distribution
Box Plot	'box'	Statistical view
Area Plot	'area'	Filled area under line
Pie Chart	'pie'	Category proportions
Scatter Plot	'scatter'	Relation between two variables

Pandas integrates with Matplotlib to make plotting simple and powerful.
Useful for data visualization, trend analysis, and decision-making.
Common plots include line, bar, scatter, pie, histogram, and box plots.
Helps engineers and analysts visualize complex data clearly and effectively.

**************************************

WEEK 5

List of Experiments

1. Python program to check whether a JSON string contains complex object or not.

2. Python Program to demonstrate NumPy arrays creation using array() function.

3. Python program to demonstrate use of ndim, shape, size, dtype.

4. Python program to demonstrate basic slicing, integer and Boolean indexing.

5. Python program to find min, max, sum, cumulative sum of array

6. Create a dictionary with at least five keys, where the value of each key is a list containing at least ten values. Convert this dictionary into a pandas DataFrame and explore the data through the DataFrame as follows:

a) Apply head() function to the pandas data frame

b) Perform various data selection operations on Data Frame

7. Select any two columns from the above data frame, and observe the change in one attribute with respect

Program 1: Python program to check whether a JSON string contains complex object or not.

Method I

CODE:

import json

# Sample JSON strings

json_string1 = '{"name": "Maddy", "age": 22, "marks": {"math": 90, "science": 85}}'

json_string2 = '{"name": "Rahul", "age": 20, "city": "Delhi"}'

def has_complex_object(json_str):
    try:
        # Convert JSON string to Python object (dictionary)
        data = json.loads(json_str)

        # Check for any complex (nested) structure like dict or list
        for value in data.values():
            if isinstance(value, (dict, list)):
                return True
        return False

    except json.JSONDecodeError:
        print("Invalid JSON format!")
        return None

# Test the function

print("JSON 1:", has_complex_object(json_string1)) # True → contains nested dict

print("JSON 2:", has_complex_object(json_string2)) # False → all values are simple

Output:

JSON 1: True

JSON 2: False

Explanation:

· import json → to work with JSON data in Python.

· json.loads() → converts JSON string into a Python dictionary.

· The program checks each value in the dictionary:

o If any value is a list or another dictionary, it’s a complex object.

· Returns:

o ✅ True → if complex object found

o ❌ False → if all values are simple (string, number, etc.)

Method II

JSON types: string, number, object (dict), array (list), true/false, null.

Complex numbers are not directly supported (e.g., 3+5j).

So a complex number written directly into JSON text (e.g., 3+5j without quotes) makes json.loads() raise a JSONDecodeError, and json.dumps() raises a TypeError when asked to serialize a Python complex value.

A complex value can still appear in JSON as a string such as "3+4j". The program below detects it by:

Parsing the string with json.loads() (invalid JSON returns False).

Scanning all values recursively and testing each string with complex().

Program: Detect Complex Object in JSON String

CODE:

import json

def contains_complex(json_str):
    try:
        data = json.loads(json_str)  # Try parsing JSON

        # Recursively check if any value is a complex number
        def check_complex(obj):
            if isinstance(obj, dict):
                return any(check_complex(v) for v in obj.values())
            elif isinstance(obj, list):
                return any(check_complex(i) for i in obj)
            elif isinstance(obj, str):
                # Only strings with an imaginary unit (e.g., "3+4j") qualify;
                # without this check, plain numeric strings like "25" would pass too
                if "j" not in obj.lower():
                    return False
                try:
                    complex(obj)  # Attempt conversion
                    return True
                except ValueError:
                    return False
            else:
                return False

        return check_complex(data)

    except json.JSONDecodeError:
        return False

# Example JSON strings

json1 = '{"name": "Alice", "age": 25, "number": "3+4j"}'

json2 = '{"x": 10, "y": 20}'

print("JSON 1 contains complex:", contains_complex(json1)) # True

print("JSON 2 contains complex:", contains_complex(json2)) # False

Output:

JSON 1 contains complex: True

JSON 2 contains complex: False

Explanation

json.loads(json_str) → Parses JSON into Python dict/list.

Example: '{"a": 1}' → {"a": 1} (Python dict).

Recursive function check_complex():

If object is a dict → check all values.

If object is a list → check all items.

If object is a string → try converting to complex().

If conversion succeeds → it’s a complex-like value.

Returns True if any string looks like "a+bj", otherwise False.

Note:

If you really want to store complex numbers in JSON, you need custom encoding (e.g., save as {"real": 3, "imag": 4}).
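Following that note, a minimal sketch of such custom encoding: json.dumps() accepts a default function for objects it cannot serialize, and json.loads() accepts an object_hook that is called for every decoded JSON object. The {"real": ..., "imag": ...} layout and the helper names encode_complex and decode_complex are illustrative choices, not a standard.

```python
import json

def encode_complex(obj):
    # Called by json.dumps only for objects it cannot serialize itself
    if isinstance(obj, complex):
        return {"real": obj.real, "imag": obj.imag}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_complex(d):
    # Called by json.loads for every JSON object (dict) it decodes
    if set(d) == {"real", "imag"}:
        return complex(d["real"], d["imag"])
    return d

data = {"name": "Alice", "z": 3 + 4j}

text = json.dumps(data, default=encode_complex)
print(text)   # {"name": "Alice", "z": {"real": 3.0, "imag": 4.0}}

restored = json.loads(text, object_hook=decode_complex)
print(restored["z"], type(restored["z"]))   # (3+4j) <class 'complex'>
```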

Program 2: Python Program to demonstrate NumPy arrays creation using array() function

NumPy array() Function

What is numpy.array()?

The array() function in NumPy is used to create an ndarray (N-dimensional array) from:

Python lists

Python tuples

Nested sequences (list of lists → matrix)

Syntax:

numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

Parameters:

object → input data (list, tuple, nested list, etc.)

dtype → specify data type (int32, float64, etc.)

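copy → if True (default), the data is always copied; if False, copying is not allowed (NumPy 2.x raises ValueError when a copy is unavoidable, whereas NumPy 1.x treated False as "copy only if needed"); in NumPy 2.x, None means copy only if needed

order → memory layout:

'K' = keep the input's layout (default; lists and tuples give row-major order)

'C' = row-major (C-style)

'F' = column-major (Fortran-style)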

subok → if True, subclasses are passed through

ndmin → minimum number of dimensions

Method I

CODE:

# Import the NumPy library

import numpy as np

# 1 Create a 1-D array (one-dimensional)

arr1 = np.array([10, 20, 30, 40, 50])

print("1-D Array:")

print(arr1)

# 2 Create a 2-D array (two-dimensional)

arr2 = np.array([[1, 2, 3], [4, 5, 6]])

print("\n2-D Array:")

print(arr2)

# 3 Create a 3-D array (three-dimensional)

arr3 = np.array([

[[1, 2], [3, 4]],

[[5, 6], [7, 8]]

])

print("\n3-D Array:")

print(arr3)

# 4 Check type and dimension of arrays

print("\nType of arr1:", type(arr1))

print("Dimension of arr1:", arr1.ndim)

print("Dimension of arr2:", arr2.ndim)

print("Dimension of arr3:", arr3.ndim)

Output

1-D Array:

[10 20 30 40 50]

2-D Array:

[[1 2 3]

[4 5 6]]

3-D Array:

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

Type of arr1: <class 'numpy.ndarray'>

Dimension of arr1: 1

Dimension of arr2: 2

Dimension of arr3: 3

Method II

Python Program: Demonstrating numpy.array()

CODE:

import numpy as np

# 1. Creating 1D array from list

arr1 = np.array([1, 2, 3, 4, 5])

print("1D Array:", arr1)

# 2. Creating 2D array (Matrix) from nested list

arr2 = np.array([[1, 2, 3], [4, 5, 6]])

print("\n2D Array:\n", arr2)

# 3. Creating array from tuple

arr3 = np.array((10, 20, 30))

print("\nArray from Tuple:", arr3)

# 4. Specifying dtype

arr4 = np.array([1, 2, 3], dtype=float)

print("\nArray with dtype float:", arr4)

# 5. Using ndmin (minimum dimensions)

arr5 = np.array([1, 2, 3, 4], ndmin=3)

print("\nArray with ndmin=3:\n", arr5)

print("Shape of arr5:", arr5.shape)

# 6. Converting a list always copies the data

list_data = [1, 2, 3]

# np.asarray() avoids a copy when the input is already an ndarray, but a
# Python list is always copied into new memory. (np.array(list_data,
# copy=False) raises ValueError in NumPy 2.x because a copy is unavoidable.)
arr6 = np.asarray(list_data)

print("\nOriginal List:", list_data)

print("NumPy Array (asarray):", arr6)

# Modify list and check array

list_data[0] = 99

print("Modified List:", list_data)

print("NumPy Array after modifying list:", arr6) # Will it change?

OUTPUT:

1D Array: [1 2 3 4 5]

2D Array:

[[1 2 3]

[4 5 6]]

Array from Tuple: [10 20 30]

Array with dtype float: [1. 2. 3.]

Array with ndmin=3:

[[[1 2 3 4]]]

Shape of arr5: (1, 1, 4)

Original List: [1, 2, 3]

NumPy Array (asarray): [1 2 3]

Modified List: [99, 2, 3]

NumPy Array after modifying list: [1 2 3]

Notice:

NumPy didn’t update arr6 when the list was modified, because a Python list cannot share memory with an ndarray: its values are always copied into NumPy's own buffer.

If dtype=float is set, integers are automatically converted to floats.

ndmin=3 creates an array with at least 3 dimensions (extra dimensions of size 1 are added at the front).

Key Takeaways

np.array() converts Python lists/tuples into NumPy ndarrays.

Supports dtype conversion, multi-dimensional arrays, and custom memory layouts.

Very efficient compared to Python lists (uses less memory, faster).
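The copy behaviour differs when the input is already a NumPy array: np.array() copies it by default, while np.asarray() returns the same array when no conversion is needed. np.shares_memory() checks whether two arrays use the same data. A short sketch:

```python
import numpy as np

original = np.array([1, 2, 3])

copied = np.array(original)      # copy=True by default -> new data
same = np.asarray(original)      # already an ndarray -> no copy

print(np.shares_memory(original, copied))  # False
print(same is original)                    # True

original[0] = 99
print(copied)   # [1 2 3]     (unaffected)
print(same)     # [99  2  3]  (same data as original)
```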

Program 3: Python program to demonstrate use of ndim, shape, size, dtype.

ndim → Number of dimensions (axes) of the array.

shape → Tuple of array dimensions (rows, cols, etc.).

size → Total number of elements in the array.

dtype → Data type of array elements (int32, float64, etc.).

Method I

CODE:

# Import NumPy library

import numpy as np

# Create a 2D NumPy array

arr = np.array([[10, 20, 30], [40, 50, 60]])

# Display the array

print("Array:")

print(arr)

# 1 Number of dimensions

print("\nNumber of Dimensions (ndim):", arr.ndim)

# 2 Shape of the array (rows, columns)

print("Shape of Array (shape):", arr.shape)

# 3 Total number of elements in the array

print("Size of Array (size):", arr.size)

# 4 Data type of elements stored in array

print("Data Type of Elements (dtype):", arr.dtype)

OUTPUT:

Array:

[[10 20 30]

[40 50 60]]

Number of Dimensions (ndim): 2

Shape of Array (shape): (2, 3)

Size of Array (size): 6

Data Type of Elements (dtype): int64

Explanation

Attribute	Meaning	Example Output
ndim	Number of dimensions of array	2 (since it’s 2D)
shape	Tuple showing rows and columns	(2, 3) → 2 rows, 3 columns
size	Total number of elements	6
dtype	Data type of array elements	int64 or int32 (depends on your system)

Method II

Python Program: Demonstrating ndim, shape, size, dtype

CODE:

import numpy as np

# 1D Array

arr1 = np.array([10, 20, 30, 40])

print("Array 1:", arr1)

print("ndim:", arr1.ndim) # number of dimensions

print("shape:", arr1.shape) # (4,) → 4 elements along a single axis

print("size:", arr1.size) # total elements

print("dtype:", arr1.dtype) # data type

print("-" * 50)

# 2D Array

arr2 = np.array([[1, 2, 3], [4, 5, 6]])

print("Array 2:\n", arr2)

print("ndim:", arr2.ndim) # 2D (matrix)

print("shape:", arr2.shape) # (2,3) → 2 rows, 3 columns

print("size:", arr2.size) # 6 elements

print("dtype:", arr2.dtype)

print("-" * 50)

# 3D Array

arr3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print("Array 3:\n", arr3)

print("ndim:", arr3.ndim) # 3D array

print("shape:", arr3.shape) # (2,2,2) → 2 blocks, 2 rows, 2 cols

print("size:", arr3.size) # 8 elements

print("dtype:", arr3.dtype)

OUTPUT:

Array 1: [10 20 30 40]

ndim: 1

shape: (4,)

size: 4

dtype: int64

--------------------------------------------------

Array 2:

[[1 2 3]

[4 5 6]]

ndim: 2

shape: (2, 3)

size: 6

dtype: int64

--------------------------------------------------

Array 3:

[[[1 2]

[3 4]]

[[5 6]

[7 8]]]

ndim: 3

shape: (2, 2, 2)

size: 8

dtype: int64

Explanation

ndim tells us if it’s 1D, 2D, or 3D.

arr1 → 1D

arr2 → 2D (matrix)

arr3 → 3D (cube/block).

shape gives dimensions:

arr1 → (4,) (4 elements along a single axis; a 1-D array has no rows or columns).

arr2 → (2,3) (2 rows × 3 columns).

arr3 → (2,2,2) (2 blocks × 2 rows × 2 columns).

size = total number of elements (product of shape).

dtype = NumPy infers the type from the data (typically int64 for integers and float64 for decimals; the default integer may be int32 on some platforms).
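To see how dtype is inferred and how it relates to memory use, a short sketch; itemsize (bytes per element) and nbytes (total bytes, equal to size * itemsize) are standard ndarray attributes. The default integer width can differ by platform, so the comments avoid fixed byte counts for the integer array.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])        # one float -> whole array becomes float
text = np.array(['a', 'bc'])         # strings -> fixed-width Unicode dtype

print(ints.dtype)                    # int64 on most 64-bit Linux/macOS systems
print(mixed.dtype)                   # float64
print(text.dtype)                    # <U2 (Unicode strings up to 2 characters)

f32 = np.array([1, 2, 3], dtype=np.float32)
print(f32.itemsize, f32.nbytes)      # 4 12  (4 bytes each, 3 elements)
print(ints.nbytes == ints.size * ints.itemsize)  # True
```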

Program 4: Python program to demonstrate basic slicing, integer and Boolean indexing.

Method I

CODE:

# Import NumPy library

import numpy as np

# Create a 1D NumPy array

arr = np.array([10, 20, 30, 40, 50, 60, 70])

print("Original Array:")

print(arr)

# 1 Basic Slicing

print("\n1. Basic Slicing Examples:")

print("Elements from index 1 to 4:", arr[1:5]) # 20 to 50

print("Elements from start to 3:", arr[:4]) # 10 to 40

print("Elements from index 3 to end:", arr[3:]) # 40 to 70

print("Every second element:", arr[::2]) # 10, 30, 50, 70

# 2 Integer Indexing

print("\n2. Integer Indexing Examples:")

indices = [0, 2, 5]

print("Elements at positions 0, 2, 5:", arr[indices]) # 10, 30, 60

# 3 Boolean Indexing

print("\n3. Boolean Indexing Examples:")

bool_mask = arr > 40

print("Boolean Mask (arr > 40):", bool_mask)

print("Elements greater than 40:", arr[bool_mask])

OUTPUT:

Original Array:

[10 20 30 40 50 60 70]

1. Basic Slicing Examples:

Elements from index 1 to 4: [20 30 40 50]

Elements from start to 3: [10 20 30 40]

Elements from index 3 to end: [40 50 60 70]

Every second element: [10 30 50 70]

2. Integer Indexing Examples:

Elements at positions 0, 2, 5: [10 30 60]

3. Boolean Indexing Examples:

Boolean Mask (arr > 40): [False False False False True True True]

Elements greater than 40: [50 60 70]

Explanation

Concept	Description	Example
Basic Slicing	Selects continuous elements using start:end:step	arr[1:5] → 20 30 40 50
Integer Indexing	Selects elements at specific positions	arr[[0, 2, 5]] → 10 30 60
Boolean Indexing	Uses True/False array to filter elements	arr[arr > 40] → 50 60 70

Method II

CODE:

import numpy as np

# Create a NumPy array
arr = np.array([10, 20, 30, 40, 50, 60, 70])
print("Original Array:")
print(arr)

# ---------------------------
# 1. Basic Slicing
# ---------------------------
# Get elements from index 2 to 5 (5 excluded)
slice1 = arr[2:5]
print("\nBasic Slicing arr[2:5]:", slice1)

# Get every 2nd element
slice2 = arr[::2]
print("Basic Slicing arr[::2] (every 2nd element):", slice2)

# ---------------------------
# 2. Integer Indexing
# ---------------------------
# Access multiple elements using a list of indices
indices = [1, 3, 5]
int_indexed = arr[indices]
print("\nInteger Indexing arr[[1,3,5]]:", int_indexed)

# ---------------------------
# 3. Boolean Indexing
# ---------------------------
# Create a Boolean condition and use it as a filter
bool_indexed = arr[arr > 30]  # all elements greater than 30
print("\nBoolean Indexing arr[arr > 30]:", bool_indexed)

OUTPUT:

Original Array:

[10 20 30 40 50 60 70]

Basic Slicing arr[2:5]: [30 40 50]

Basic Slicing arr[::2] (every 2nd element): [10 30 50 70]

Integer Indexing arr[[1,3,5]]: [20 40 60]

Boolean Indexing arr[arr > 30]: [40 50 60 70]

Explanation

1. Basic Slicing

o arr[start:end] → selects elements from start to end-1.

o arr[start:end:step] → selects elements with a step size.

2. Integer Indexing

o You can pass a list of indices to access multiple elements at once.

o Example: arr[[1,3,5]] selects the 2nd, 4th, and 6th elements (positions 1, 3, and 5).

3. Boolean Indexing

o You can create a condition that returns a Boolean array, and use it to filter elements.

o Example: arr[arr > 30] selects all elements greater than 30.
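When a filter needs more than one condition, NumPy expects the element-wise operators & (and), | (or) and ~ (not). Python's and/or raise a ValueError on arrays with more than one element. Each condition needs its own parentheses because & and | bind more tightly than comparisons. The sketch below reuses the same sample array. The negative-index lines at the end are an extra illustration, not part of the original program.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])

# Combine conditions with & (and), | (or), ~ (not); parenthesize each condition
between = arr[(arr > 20) & (arr < 60)]
outside = arr[(arr < 20) | (arr > 60)]
not_40 = arr[~(arr == 40)]

print(between)  # [30 40 50]
print(outside)  # [10 70]
print(not_40)   # [10 20 30 50 60 70]

# Negative indices count from the end of the array
print(arr[-1])   # 70
print(arr[-3:])  # [50 60 70]
```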

Program 5: Python program to find the minimum, maximum, sum, and cumulative sum of an array

CODE:

import numpy as np

# Create a NumPy array
arr = np.array([10, 20, 30, 40, 50])
print("Original Array:")
print(arr)

# Minimum value
min_val = np.min(arr)
print("\nMinimum value:", min_val)

# Maximum value
max_val = np.max(arr)
print("Maximum value:", max_val)

# Sum of all elements
sum_val = np.sum(arr)
print("Sum of elements:", sum_val)

# Cumulative sum
cum_sum = np.cumsum(arr)
print("Cumulative sum:", cum_sum)

OUTPUT:

Original Array:

[10 20 30 40 50]

Minimum value: 10

Maximum value: 50

Sum of elements: 150

Cumulative sum: [ 10 30 60 100 150]

Explanation

np.min(arr) → Returns the smallest element in the array.

np.max(arr) → Returns the largest element in the array.

np.sum(arr) → Returns the sum of all elements.

np.cumsum(arr) → Returns the cumulative sum, i.e., running total of elements.
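To show concretely what a "running total" means, the sketch below rebuilds the cumulative sum with an ordinary loop and compares it with np.cumsum(). It also checks that the last cumulative value equals np.sum(). The loop is for illustration only. np.cumsum() is the idiomatic choice in real code.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Running total computed by hand: each entry is the sum of all values so far
running = []
total = 0
for value in arr:
    total += int(value)
    running.append(total)

print(running)                          # [10, 30, 60, 100, 150]
print(running == np.cumsum(arr).tolist())  # True

# The last cumulative value is always the overall sum
print(np.cumsum(arr)[-1] == np.sum(arr))   # True
```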

Program 6: Create a dictionary with at least five keys, where each key's value is a list of at least ten values. Convert this dictionary into a pandas DataFrame and explore the data through the DataFrame as follows:

a) Apply the head() function to the pandas DataFrame

b) Perform various data selection operations on the DataFrame

(a) Apply the head() function to the pandas DataFrame

CODE:

# Import pandas library
import pandas as pd

# 1 Create a dictionary with 5 keys and 10 values each
student_data = {
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John', 'Lina', 'Raj', 'Sara', 'Tom', 'Anu'],
    'Age': [18, 19, 20, 18, 21, 22, 19, 20, 18, 21],
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'Marks_Science': [82, 79, 88, 91, 73, 85, 89, 77, 94, 80],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Delhi', 'Pune', 'Hyderabad', 'Bangalore', 'Kochi', 'Jaipur']
}

# 2 Convert dictionary to a pandas DataFrame
df = pd.DataFrame(student_data)

# 3 Display the complete DataFrame
print("Complete DataFrame:")
print(df)

# 4 Apply head() function to display first 5 rows
print("\nFirst 5 Rows using head():")
print(df.head())

OUTPUT:

Complete DataFrame:

Name Age Marks_Math Marks_Science City

0 Asha 18 78 82 Delhi

1 Ravi 19 85 79 Mumbai

2 Kiran 20 92 88 Chennai

3 Maya 18 67 91 Kolkata

4 John 21 88 73 Delhi

5 Lina 22 90 85 Pune

6 Raj 19 76 89 Hyderabad

7 Sara 20 82 77 Bangalore

8 Tom 18 95 94 Kochi

9 Anu 21 80 80 Jaipur

First 5 Rows using head():

Name Age Marks_Math Marks_Science City

0 Asha 18 78 82 Delhi

1 Ravi 19 85 79 Mumbai

2 Kiran 20 92 88 Chennai

3 Maya 18 67 91 Kolkata

4 John 21 88 73 Delhi
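head() shows 5 rows by default, but it also accepts a row count, and tail() does the same from the end of the DataFrame. The sketch below rebuilds a three-column subset of the student data so that it runs on its own. Trimming the columns is only for brevity.

```python
import pandas as pd

# Three-column subset of the student data used above
student_data = {
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John', 'Lina', 'Raj', 'Sara', 'Tom', 'Anu'],
    'Age': [18, 19, 20, 18, 21, 22, 19, 20, 18, 21],
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
}
df = pd.DataFrame(student_data)

print(df.head(3))  # first 3 rows (index 0 to 2)
print(df.tail(3))  # last 3 rows (index 7 to 9)
print(df.shape)    # (rows, columns)
```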

(b) Perform various data selection operations on the DataFrame

CODE:

# Import pandas library
import pandas as pd

# 1 Create a dictionary with 5 keys and 10 values each
student_data = {
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John', 'Lina', 'Raj', 'Sara', 'Tom', 'Anu'],
    'Age': [18, 19, 20, 18, 21, 22, 19, 20, 18, 21],
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'Marks_Science': [82, 79, 88, 91, 73, 85, 89, 77, 94, 80],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Delhi', 'Pune', 'Hyderabad', 'Bangalore', 'Kochi', 'Jaipur']
}

# 2 Convert dictionary into a pandas DataFrame
df = pd.DataFrame(student_data)

# Display the complete DataFrame
print("Complete DataFrame:")
print(df)

# 3 Explore the data
print("\nFirst 5 rows using head():")
print(df.head())

# 4 Perform various Data Selection Operations

# a) Select a single column
print("\n(a) Selecting a single column (Marks_Math):")
print(df['Marks_Math'])

# b) Select multiple columns
print("\n(b) Selecting multiple columns (Name, City, Marks_Science):")
print(df[['Name', 'City', 'Marks_Science']])

# c) Select a specific row using loc (by label)
print("\n(c) Selecting a specific row using loc (row index 2):")
print(df.loc[2])

# d) Select a specific row using iloc (by position)
print("\n(d) Selecting a specific row using iloc (row position 4):")
print(df.iloc[4])

OUTPUT:

Complete DataFrame:

Name Age Marks_Math Marks_Science City

0 Asha 18 78 82 Delhi

1 Ravi 19 85 79 Mumbai

2 Kiran 20 92 88 Chennai

3 Maya 18 67 91 Kolkata

4 John 21 88 73 Delhi

5 Lina 22 90 85 Pune

6 Raj 19 76 89 Hyderabad

7 Sara 20 82 77 Bangalore

8 Tom 18 95 94 Kochi

9 Anu 21 80 80 Jaipur

First 5 rows using head():

Name Age Marks_Math Marks_Science City

0 Asha 18 78 82 Delhi

1 Ravi 19 85 79 Mumbai

2 Kiran 20 92 88 Chennai

3 Maya 18 67 91 Kolkata

4 John 21 88 73 Delhi

(a) Selecting a single column (Marks_Math):

0 78

1 85

2 92

3 67

4 88

5 90

6 76

7 82

8 95

9 80

Name: Marks_Math, dtype: int64

(b) Selecting multiple columns (Name, City, Marks_Science):

Name City Marks_Science

0 Asha Delhi 82

1 Ravi Mumbai 79

2 Kiran Chennai 88

3 Maya Kolkata 91

4 John Delhi 73

5 Lina Pune 85

6 Raj Hyderabad 89

7 Sara Bangalore 77

8 Tom Kochi 94

9 Anu Jaipur 80

(c) Selecting a specific row using loc (row index 2):

Name Kiran

Age 20

Marks_Math 92

Marks_Science 88

City Chennai

Name: 2, dtype: object

(d) Selecting a specific row using iloc (row position 4):

Name John

Age 21

Marks_Math 88

Marks_Science 73

City Delhi

Name: 4, dtype: object
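Two further selection operations are often useful alongside (a) to (d): filtering rows with a condition, and reading a single cell with loc[row, column]. The sketch also shows why loc and iloc are described differently. With the default index 0 to 9 they return the same row, but after sorting, the labels no longer match the positions. The column subset and the sorting step are illustrative choices, not part of the original program.

```python
import pandas as pd

student_data = {
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John', 'Lina', 'Raj', 'Sara', 'Tom', 'Anu'],
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Delhi', 'Pune', 'Hyderabad', 'Bangalore', 'Kochi', 'Jaipur'],
}
df = pd.DataFrame(student_data)

# Boolean filtering: keep only rows where Marks_Math is above 85
high_math = df[df['Marks_Math'] > 85]
print(high_math[['Name', 'Marks_Math']])

# loc with a row label and a column label returns a single value
print(df.loc[2, 'City'])  # Chennai

# After sorting, row labels no longer match row positions
by_math = df.sort_values('Marks_Math')
print(by_math.loc[4, 'Name'])     # row with label 4 -> John
print(by_math.iloc[4]['Name'])    # 5th row in sorted order -> Sara
```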

Program 7: Select any two columns from the above DataFrame and observe how one attribute changes with respect to the other, using the scatter and plot operations in matplotlib

CODE:

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

# 1 Create a dictionary with sample student data
student_data = {
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John', 'Lina', 'Raj', 'Sara', 'Tom', 'Anu'],
    'Age': [18, 19, 20, 18, 21, 22, 19, 20, 18, 21],
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'Marks_Science': [82, 79, 88, 91, 73, 85, 89, 77, 94, 80],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Delhi', 'Pune', 'Hyderabad', 'Bangalore', 'Kochi', 'Jaipur']
}

# 2 Convert dictionary into a pandas DataFrame
df = pd.DataFrame(student_data)

# Display the DataFrame
print("Student DataFrame:")
print(df)

# 3 Select two columns for visualization
x = df['Marks_Math']
y = df['Marks_Science']

# 4 Create a Scatter Plot
plt.scatter(x, y, color='blue', marker='o')
plt.title("Scatter Plot: Marks in Math vs Science")
plt.xlabel("Marks in Math")
plt.ylabel("Marks in Science")
plt.grid(True)
plt.show()

# 5 Create a Line Plot (Plot Operation)
# Sort rows by Marks_Math so the line joins points from left to right
# instead of zigzagging back and forth in the original row order
sorted_df = df.sort_values('Marks_Math')
plt.plot(sorted_df['Marks_Math'], sorted_df['Marks_Science'], color='green', linestyle='--', marker='o')
plt.title("Line Plot: Marks in Math vs Science")
plt.xlabel("Marks in Math")
plt.ylabel("Marks in Science")
plt.grid(True)
plt.show()

OUTPUT:

Student DataFrame:

Name Age Marks_Math Marks_Science City

0 Asha 18 78 82 Delhi

1 Ravi 19 85 79 Mumbai

2 Kiran 20 92 88 Chennai

3 Maya 18 67 91 Kolkata

4 John 21 88 73 Delhi

5 Lina 22 90 85 Pune

6 Raj 19 76 89 Hyderabad

7 Sara 20 82 77 Bangalore

8 Tom 18 95 94 Kochi

9 Anu 21 80 80 Jaipur

Explanation:

Function	Purpose	Description
plt.scatter(x, y)	Creates a scatter plot	Shows how one variable changes with another
plt.plot(x, y)	Creates a line plot	Connects data points with lines
plt.xlabel(), plt.ylabel()	Label the axes	Give context to the chart
plt.title()	Adds a title	Describes what the plot represents
plt.grid(True)	Adds grid lines	Makes values easier to read off the chart
plt.show()	Displays the plot window	Shows the graph
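The scatter plot shows the relationship visually. The Pearson correlation coefficient expresses it as a number: +1 for a perfect rising line, -1 for a perfect falling line, and values near 0 for no linear trend. The minimal sketch below uses pandas' Series.corr(), which computes Pearson correlation by default. Only the two mark columns are rebuilt here, for brevity.

```python
import pandas as pd

# The two columns plotted above
df = pd.DataFrame({
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'Marks_Science': [82, 79, 88, 91, 73, 85, 89, 77, 94, 80],
})

# Pearson correlation between the two columns (range -1 to +1)
r = df['Marks_Math'].corr(df['Marks_Science'])
print(round(r, 3))

# Correlation matrix for all numeric columns at once
print(df.corr())
```

For these sample marks, the coefficient comes out slightly negative and close to zero (about -0.04). This matches the scattered points: in this data, a higher Math mark does not predict a higher Science mark.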

***************************END***************************


Saturday, October 18, 2025

