UNIT-V
Introduction to Data Science
Data Science
- Data Science is the study of data to extract meaningful insights for decision-making.
- It combines techniques from Statistics, Computer Science, and Domain Knowledge to analyze, visualize, and predict outcomes.
- It involves collecting, cleaning, analyzing, and interpreting data to solve real-world problems.
Importance of Data Science
- Helps organizations make data-driven decisions.
- Enables automation and predictions using Machine Learning.
- Supports business intelligence and strategic planning.
- Plays a key role in fields like healthcare, finance, e-commerce, and social media.
Components of Data Science
- Data Collection – Gathering data from various sources (databases, web, sensors, etc.)
- Data Cleaning – Removing errors and duplicates, and handling missing values.
- Data Analysis – Using statistical methods and visualization to explore data.
- Data Visualization – Representing data using graphs, charts, and dashboards.
- Machine Learning – Building models to predict or classify data outcomes.
- Communication of Results – Presenting insights to decision-makers.
Data Science Workflow
- Define the Problem
- Collect Data
- Prepare Data (Cleaning and Transformation)
- Analyze & Build Model
- Evaluate Model Performance
- Deploy & Monitor the Model
Tools and Technologies Used
| Category | Tools/Technologies |
| Programming Languages | Python, R |
| Data Handling | SQL, Pandas, NumPy |
| Visualization | Matplotlib, Seaborn, Power BI, Tableau |
| Machine Learning | Scikit-learn, TensorFlow, PyTorch |
| Big Data | Hadoop, Spark |
Applications of Data Science
- Healthcare – Disease prediction, drug discovery
- Finance – Fraud detection, stock market analysis
- E-commerce – Product recommendation systems
- Social Media – Sentiment analysis, targeted advertising
- Transportation – Route optimization, autonomous vehicles
Skills Required for Data Scientists
- Programming skills (Python/R)
- Mathematics & Statistics
- Data Visualization
- Machine Learning
- Communication & Problem-solving skills
Careers in Data Science
- Data Analyst
- Data Engineer
- Machine Learning Engineer
- Data Scientist
- Business Intelligence Analyst
Functional Programming in Python
Introduction
· Functional Programming (FP) is a programming paradigm where programs are built using functions.
· It focuses on what to solve rather than how to solve it.
· Python supports both Object-Oriented and Functional programming styles (it is a multi-paradigm language).
Key Concepts
| Concept | Description |
| Function | A block of code that performs a specific task and can be reused. |
| Pure Function | A function that always produces the same output for the same input and has no side effects. |
| Immutability | Data is not changed; instead, new data is created. |
| First-Class Functions | Functions can be assigned to variables, passed as arguments, or returned from other functions. |
| Higher-Order Functions | Functions that take other functions as arguments or return them as results. |
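The concepts in the table can be seen together in a few lines. The following is a minimal sketch; the function names (add_bonus, apply_to_all, make_multiplier) and the sample marks are made up for illustration.

```python
def add_bonus(marks, bonus=5):
    """Pure function: the result depends only on the arguments, nothing else is changed."""
    return marks + bonus


def apply_to_all(func, items):
    """Higher-order function: takes another function as an argument."""
    return [func(item) for item in items]


def make_multiplier(factor):
    """Higher-order function: returns a new function (a closure)."""
    def multiply(x):
        return x * factor
    return multiply


marks = (70, 82, 91)                        # tuple: immutable input data
with_bonus = apply_to_all(add_bonus, marks) # a new list is created; marks is unchanged
double = make_multiplier(2)                 # first-class: a function stored in a variable
doubled = apply_to_all(double, marks)

print(with_bonus)  # [75, 87, 96]
print(doubled)     # [140, 164, 182]
print(marks)       # (70, 82, 91)
```

Because add_bonus never touches anything outside its arguments, it can be tested by simply comparing inputs and outputs.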
Advantages of Functional Programming
· Easier to debug and test.
· Promotes code reusability.
· Supports parallel and distributed computing.
· Produces clean and modular code.
Functional Programming Features in Python
Functional Tools
Python provides several functional tools:
· map() (built-in function)
· filter() (built-in function)
· reduce() (available in the functools module)
· lambda (keyword for creating anonymous functions)
Lambda Functions
· Small, anonymous functions created using the lambda keyword.
· Syntax:
lambda arguments: expression
· Example:
square = lambda x: x * x
print(square(5))  # Output: 25
map() Function
· Applies a function to each item in an iterable (like a list).
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x * x, numbers))
print(squares)  # Output: [1, 4, 9, 16, 25]
filter() Function
· Filters elements from an iterable using a Boolean condition.
numbers = [1, 2, 3, 4, 5, 6]
even = list(filter(lambda x: x % 2 == 0, numbers))
print(even)  # Output: [2, 4, 6]
reduce() Function
· Used to reduce a list to a single value by repeatedly applying a function.
· It is available in the functools module.
from functools import reduce
numbers = [1, 2, 3, 4, 5]
product = reduce(lambda x, y: x * y, numbers)
print(product)  # Output: 120
Example: Combining Functional Tools
from functools import reduce
numbers = [1, 2, 3, 4, 5, 6]
# Square every number, keep the even squares, then add them up
result = reduce(lambda x, y: x + y, filter(lambda x: x % 2 == 0, map(lambda x: x * x, numbers)))
print(result)  # Output: 56 (2² + 4² + 6²)
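The same pipeline can also be written with a generator expression and a list comprehension, which many Python programmers find easier to read than nested map() and filter() calls. A minimal sketch that mirrors the example above (the intermediate variable names are illustrative):

```python
numbers = [1, 2, 3, 4, 5, 6]

# Square each number lazily, keep the even squares, then add them up
squares = (x * x for x in numbers)
even_squares = [s for s in squares if s % 2 == 0]
result = sum(even_squares)

print(even_squares)  # [4, 16, 36]
print(result)        # 56
```

Both versions avoid modifying numbers, so either style fits the functional approach.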
JSON and XML in Python
Introduction
Data is often exchanged between applications using structured formats.
Two commonly used data formats are:
- JSON (JavaScript Object Notation)
- XML (eXtensible Markup Language)
Python provides libraries to read, write, and process both easily.
JSON in Python
What is JSON?
- JSON stands for JavaScript Object Notation.
- It is a lightweight data format used to store and exchange data between systems.
- It is easy for humans to read and easy for machines to parse.
JSON Structure
JSON data is written as key–value pairs.
Example:
{
  "name": "Madhu",
  "age": 25,
  "department": "CSE",
  "skills": ["Python", "Data Science"]
}
JSON vs Python Dictionary
| JSON | Python |
| String format | Dictionary object |
| Uses double quotes | Uses single or double quotes |
| Can be stored in files | Used within programs |
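The difference between the two becomes visible when a dictionary is converted to JSON text and back. The sketch below uses a made-up record; note how True, None, and a tuple are written in JSON, and that the tuple comes back as a list.

```python
import json

record = {"name": "Madhu", "passed": True, "backlog": None, "marks": (85, 90)}

text = json.dumps(record)      # Python dict -> JSON string
restored = json.loads(text)    # JSON string -> Python dict

print(type(text).__name__)     # str
print(text)                    # {"name": "Madhu", "passed": true, "backlog": null, "marks": [85, 90]}
print(restored["marks"])       # [85, 90]
```

JSON has no tuple type, so a round trip does not always return exactly the original Python object.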
Working with JSON in Python
Python provides the built-in json module.
a) Importing JSON Module
import json
b) Converting Python Object to JSON
(Serialization – json.dumps() returns a string, json.dump() writes to a file)
import json
data = {"name": "Madhu", "age": 25, "city": "Kurnool"}
json_string = json.dumps(data)
print(json_string)  # Output: {"name": "Madhu", "age": 25, "city": "Kurnool"}
c) Converting JSON to Python Object
(Deserialization – json.loads() reads a string, json.load() reads a file)
import json
json_data = '{"name": "Madhu", "age": 25, "city": "Kurnool"}'
python_obj = json.loads(json_data)
print(python_obj["name"])  # Output: Madhu
d) Reading JSON from a File
import json
with open('data.json', 'r') as file:
    data = json.load(file)
e) Writing JSON to a File
import json
with open('data.json', 'w') as file:
    json.dump(data, file)  # data is the Python object to save
XML in Python
What is XML?
- XML (eXtensible Markup Language) is a markup language used to store and transport data.
- It uses tags (like HTML) to define elements and their structure.
Example:
<student>
<name>Madhu</name>
<age>25</age>
<department>CSE</department>
</student>
Features of XML
- Self-descriptive and hierarchical.
- Platform-independent.
- Used in many web and data exchange applications.
Parsing XML in Python
Python provides the xml.etree.ElementTree module to parse and create XML data.
a) Reading XML Data
import xml.etree.ElementTree as ET
tree = ET.parse('student.xml')
root = tree.getroot()
print(root.tag)  # Output: student
for child in root:
    print(child.tag, ":", child.text)
b) Creating XML Data
import xml.etree.ElementTree as ET
student = ET.Element('student')
name = ET.SubElement(student, 'name')
name.text = 'Madhu'
age = ET.SubElement(student, 'age')
age.text = '25'  # Element text must be a string
tree = ET.ElementTree(student)
tree.write('student.xml')
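When the XML is already in memory as a string (for example, received from another program), ET.fromstring() parses it without needing a file. A minimal sketch, assuming the student record shown earlier; the dictionary conversion is just one illustrative way to use the parsed elements.

```python
import xml.etree.ElementTree as ET

xml_text = "<student><name>Madhu</name><age>25</age><department>CSE</department></student>"

root = ET.fromstring(xml_text)                      # parse XML held in a string
record = {child.tag: child.text for child in root}  # tag -> text for each child element
record["age"] = int(record["age"])                  # XML text is always a string, so convert explicitly

print(root.tag)                  # student
print(record)                    # {'name': 'Madhu', 'age': 25, 'department': 'CSE'}
print(root.find("name").text)    # Madhu
```

Unlike JSON, XML does not distinguish numbers from text, which is why the age has to be converted by hand.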
JSON vs XML – Comparison
| Feature | JSON | XML |
| Simplicity | Simple and compact | More verbose |
| Data Types | Strings, numbers, booleans, null, arrays, and objects | All values are text unless a schema assigns types |
| Readability | Easy for humans | Harder to read for large documents |
| Parsing | Usually faster | Usually slower |
| Use Case | APIs, web applications | Documents, configurations |
- JSON and XML are formats for data storage and exchange.
- JSON is lightweight and widely used in web APIs.
- XML is more verbose but self-descriptive, and is common for documents and configuration files.
- Python provides built-in modules, json and xml.etree.ElementTree, to work with both.
NumPy with Python
Introduction to NumPy
- NumPy stands for Numerical Python.
- It is a powerful library used for numerical and scientific computing.
- It provides support for multidimensional arrays, mathematical operations, and linear algebra.
- Widely used in Data Science, Machine Learning, and Scientific Applications.
Why Use NumPy?
Python lists are slow and memory-hungry for large numerical workloads, because every element is stored as a separate Python object.
NumPy arrays are:
- Faster and more memory-efficient
- Able to use vectorized operations (no need for explicit loops)
- Integrated with many scientific and ML libraries (Pandas, Scikit-learn, TensorFlow)
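The difference between a loop over a list and a vectorized operation can be sketched as follows. The prices are made-up sample values; both approaches give the same numbers, but the NumPy version applies the operation to the whole array at once.

```python
import numpy as np

prices = [120.0, 80.0, 45.5]

# Plain Python: visit each element explicitly
discounted_list = [p * 0.5 for p in prices]

# NumPy: one vectorized expression, no explicit loop
arr = np.array(prices)
discounted_arr = arr * 0.5

print(discounted_list)           # [60.0, 40.0, 22.75]
print(discounted_arr.tolist())   # [60.0, 40.0, 22.75]
```

Note that prices * 0.5 on the list itself would raise a TypeError, since lists do not support element-wise arithmetic.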
Installing NumPy
Before using NumPy, install it using:
pip install numpy
Then import it in Python:
import numpy as np
NumPy Arrays
The core of NumPy is the ndarray (N-dimensional array) object.
Creating Arrays
import numpy as np
# From list
arr = np.array([1, 2, 3, 4, 5])
print(arr)
# Multi-dimensional array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix)
Array Attributes
| Attribute | Description | Example |
| ndim | Number of dimensions | arr.ndim |
| shape | Size of the array along each dimension, e.g. (rows, columns) | arr.shape |
| size | Total number of elements | arr.size |
| dtype | Data type of elements | arr.dtype |
Example:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim)   # 2
print(arr.shape)  # (2, 3)
print(arr.size)   # 6
Creating Arrays with Built-in Functions
| Function | Description | Example |
| np.zeros() | Creates array of zeros | np.zeros((2,3)) |
| np.ones() | Creates array of ones | np.ones((2,3)) |
| np.arange() | Creates array with range of values | np.arange(0,10,2) |
| np.linspace() | Creates evenly spaced values | np.linspace(0,1,5) |
| np.eye() | Identity matrix | np.eye(3) |
| np.random.rand() | Random values between 0 and 1 | np.random.rand(2,3) |
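A short sketch of what some of these functions return. The variable names are illustrative; np.random.default_rng() is used here instead of np.random.rand() only so that a seed can be fixed for repeatable results.

```python
import numpy as np

evens = np.arange(0, 10, 2)      # start, stop (excluded), step
points = np.linspace(0, 1, 5)    # start, stop (included), number of points
identity = np.eye(3)             # 3x3 identity matrix of floats
zeros = np.zeros((2, 3))         # the shape is passed as a tuple

rng = np.random.default_rng(seed=42)   # seeded random generator
samples = rng.random((2, 3))           # uniform values in [0, 1)

print(evens.tolist())    # [0, 2, 4, 6, 8]
print(points.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(identity.dtype, zeros.shape, samples.shape)
```

The main difference between np.arange() and np.linspace() is that the first takes a step size and excludes the stop value, while the second takes the number of points and includes it.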
Array Indexing and Slicing
You can access and modify array elements by index.
arr = np.array([10, 20, 30, 40, 50])
print(arr[0])    # First element: 10
print(arr[1:4])  # Elements at index 1 to 3: [20 30 40]
arr[2] = 100     # Modify element
print(arr)
For 2D arrays:
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix[1, 2])  # Element at 2nd row, 3rd column: 6
print(matrix[:, 1])  # All rows, 2nd column: [2 5 8]
Array Operations
NumPy supports element-wise arithmetic operations.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)  # [5 7 9]
print(a - b)  # [-3 -3 -3]
print(a * b)  # [ 4 10 18]
print(a / b)  # [0.25 0.4  0.5 ]
Also supports:
- np.sum(a) – Sum of elements
- np.mean(a) – Mean value
- np.max(a) / np.min(a) – Max/Min element
- np.sqrt(a) – Square root of each element
- np.dot(a, b) – Dot product
Array Reshaping
arr = np.arange(6)
print(arr.reshape(2, 3))  # Reshape 1D → 2D
Combining and Splitting Arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
# Vertical stacking: b is added as a new row
print(np.vstack((a, b)))
# Horizontal stacking: b.T (shape (2, 1)) is added as a new column
print(np.hstack((a, b.T)))
Broadcasting
Allows arithmetic between arrays of different but compatible shapes.
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])
print(a + b)  # b is added to every row of a
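Broadcasting works when the shapes match from the last dimension backwards, or when one of the two sizes being compared is 1. The sketch below shows a row vector, a column vector, and a shape that cannot be broadcast; the array names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])           # shape (3,)   -> stretched over both rows
col = np.array([[100], [200]])         # shape (2, 1) -> stretched over all columns

row_sum = a + row
col_sum = a + col
print(row_sum.tolist())   # [[11, 22, 33], [14, 25, 36]]
print(col_sum.tolist())   # [[101, 102, 103], [204, 205, 206]]

# Shapes (2, 3) and (2,) do not line up on the last dimension (3 vs 2)
try:
    a + np.array([1, 2])
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("Cannot broadcast:", exc)
```

No data is actually copied when the smaller array is stretched; NumPy only behaves as if it had been repeated.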
Mathematical and Statistical Functions
| Function | Description |
| np.mean(a) | Average of elements |
| np.median(a) | Median value |
| np.std(a) | Standard deviation |
| np.var(a) | Variance |
| np.sum(a) | Sum of elements |
| np.sqrt(a) | Square root |
Example Program
import numpy as np
data = np.array([[2, 4, 6], [1, 3, 5]])
print("Original Array:\n", data)
print("Mean:", np.mean(data))
print("Max:", np.max(data))
print("Sum of each column:", np.sum(data, axis=0))
Applications of NumPy
- Data Science – data manipulation and preprocessing
- Machine Learning – matrix operations
- Image Processing – pixel data manipulation
- Scientific Computing – solving mathematical equations
- Statistics & Probability – analyzing datasets
Summary
- NumPy provides high-performance multi-dimensional arrays.
- For numerical work, it replaces slow Python lists with efficient array computations.
- Essential for Data Science, Machine Learning, and AI.
Pandas in Python
Introduction
- Pandas is a powerful and popular Python library for data manipulation and analysis.
- It provides high-performance data structures and data analysis tools.
- The name “Pandas” comes from “Panel Data”, a term used in statistics.
Why Pandas?
Pandas makes it easy to:
- Handle and analyze tabular data (like Excel or CSV files).
- Perform data cleaning, filtering, grouping, and aggregation.
- Integrate seamlessly with NumPy, Matplotlib, and Scikit-learn.
- Work with large datasets efficiently.
Installing Pandas
pip install pandas
Import it in Python:
import pandas as pd
Data Structures in Pandas
Pandas provides two main data structures:
| Data Structure | Description | Example |
| Series | 1D labeled array (like a column in Excel) | pd.Series() |
| DataFrame | 2D labeled data (like a spreadsheet) | pd.DataFrame() |
Pandas Series
A Series is like a one-dimensional array with labels (index).
import pandas as pd
data = pd.Series([10, 20, 30, 40])
print(data)
Output:
0    10
1    20
2    30
3    40
dtype: int64
Custom index:
data = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
print(data['b'])  # Output: 200
Pandas DataFrame
A DataFrame is a two-dimensional table of data with rows and columns.
import pandas as pd
data = {
    'Name': ['Madhu', 'Latha', 'Ravi'],
    'Age': [22, 21, 23],
    'Dept': ['CSE', 'ECE', 'IT']
}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age Dept
0  Madhu   22  CSE
1  Latha   21  ECE
2   Ravi   23   IT
Reading and Writing Data
Pandas can read and write data in different file formats.
| File Type | Function to Read | Function to Write |
| CSV | pd.read_csv() | to_csv() |
| Excel | pd.read_excel() | to_excel() |
| JSON | pd.read_json() | to_json() |
| SQL | pd.read_sql() | to_sql() |
Example:
df = pd.read_csv('students.csv')
df.to_excel('students.xlsx', index=False)
DataFrame Operations
a) Viewing Data
df.head()      # First 5 rows
df.tail(3)     # Last 3 rows
df.info()      # Summary of DataFrame
df.describe()  # Statistical summary
df.shape       # (rows, columns)
b) Selecting Data
df['Name']           # Select single column
df[['Name', 'Age']]  # Multiple columns
df.iloc[0]           # Select first row by position
df.loc[1, 'Name']    # Select specific cell by row label and column name
Filtering and Conditional Selection
df[df['Age'] > 21]
df[(df['Dept'] == 'CSE') & (df['Age'] > 21)]
Adding and Removing Columns
df['Marks'] = [85, 90, 88]  # Add new column
df.drop('Dept', axis=1)     # Returns a copy without the column; add inplace=True to change df itself
Handling Missing Data
df.isnull()   # Check for missing values (True where a value is missing)
df.dropna()   # Return a copy without rows that contain null values
df.fillna(0)  # Return a copy with nulls replaced by 0
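These methods return new objects and leave the original DataFrame untouched unless the result is assigned back. A minimal sketch with a made-up table in which one mark is missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Madhu", "Latha", "Ravi"],
    "Marks": [85.0, np.nan, 88.0],      # np.nan marks the missing value
})

missing_per_column = df.isnull().sum()                   # count of missing values in each column
dropped = df.dropna()                                    # rows with any missing value removed
filled = df.fillna({"Marks": df["Marks"].mean()})        # fill with the column mean instead of 0

print(missing_per_column["Marks"])   # 1
print(len(dropped))                  # 2
print(filled["Marks"].tolist())      # [85.0, 86.5, 88.0]
print(len(df))                       # 3 (original unchanged)
```

Filling with the column mean is only one possible choice; the right replacement value depends on the data.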
Sorting and Grouping Data
df.sort_values(by='Age', ascending=False)
df.groupby('Dept')['Marks'].mean()
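To see what sorting and grouping return, here is a small sketch on a made-up table of marks; the column names follow the ones used in this section.

```python
import pandas as pd

df = pd.DataFrame({
    "Dept": ["CSE", "IT", "CSE", "ECE"],
    "Marks": [85, 90, 78, 92],
})

by_marks = df.sort_values(by="Marks", ascending=False)             # highest marks first
dept_mean = df.groupby("Dept")["Marks"].mean()                     # one mean per department
summary = df.groupby("Dept")["Marks"].agg(["mean", "max", "count"])  # several statistics at once

print(by_marks["Marks"].tolist())   # [92, 90, 85, 78]
print(dept_mean["CSE"])             # 81.5
print(summary)
```

groupby() returns its result indexed by the grouping column, so dept_mean["CSE"] looks up a department directly.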
Merging, Joining, and Concatenation
a) Merging
pd.merge(df1, df2, on='ID')  # Combine columns of rows that share the same ID
b) Concatenation
pd.concat([df1, df2])  # Stack the rows of df2 below df1
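The following sketch shows the difference between the two operations on two made-up tables that share an ID column. The how="left" variant is included to show what happens to rows without a match.

```python
import pandas as pd

students = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Madhu", "Latha", "Ravi"]})
marks = pd.DataFrame({"ID": [1, 2, 4], "Marks": [85, 90, 70]})

inner = pd.merge(students, marks, on="ID")              # only IDs present in both tables
left = pd.merge(students, marks, on="ID", how="left")   # keep every student; missing marks become NaN
stacked = pd.concat([students, students], ignore_index=True)  # rows placed one below the other

print(inner["ID"].tolist())           # [1, 2]
print(left["Marks"].isna().sum())     # 1
print(len(stacked))                   # 6
```

Merging matches rows by key values, while concatenation simply appends rows (or columns, with axis=1) without looking at the values.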
Statistical and Mathematical Operations
df['Age'].mean()
df['Marks'].max()
df['Marks'].sum()
df.corr(numeric_only=True)  # Correlation matrix of the numeric columns
Example Program
import pandas as pd
data = {
    'Student': ['A', 'B', 'C', 'D'],
    'Marks': [85, 90, 78, 92],
    'Department': ['CSE', 'IT', 'CSE', 'ECE']
}
df = pd.DataFrame(data)
print("Data:\n", df)
print("\nAverage Marks:", df['Marks'].mean())
print("\nCSE Students:\n", df[df['Department'] == 'CSE'])
Output:
Data:
  Student  Marks Department
0       A     85        CSE
1       B     90         IT
2       C     78        CSE
3       D     92        ECE
Average Marks: 86.25
CSE Students:
  Student  Marks Department
0       A     85        CSE
2       C     78        CSE
Applications of Pandas
- Data Cleaning and Preparation
- Statistical Analysis
- Data Visualization (with Matplotlib/Seaborn)
- Machine Learning Preprocessing
- Financial Data Analysis
Important points
- Pandas is the backbone of Data Science in Python.
- Provides easy handling of structured data.
- Supports file I/O, filtering, grouping, and analytics.
- Works well with NumPy and Matplotlib.
In short:
NumPy = Numerical computations
Pandas = Data handling and analysis
Matplotlib/Seaborn = Data visualization
Plotting Graphs Using Pandas in Python
Introduction
- Pandas provides built-in data visualization features using the Matplotlib library.
- It allows us to create different types of plots and charts directly from Series or DataFrame objects.
- Helps in understanding patterns, trends, and relationships in data visually.
Importing Required Libraries
Before plotting, import the necessary libraries:
import pandas as pd
import matplotlib.pyplot as plt
Note: If Matplotlib is not installed, install it using: pip install matplotlib
Creating a Simple DataFrame
Let’s create some data first:
import pandas as pd
data = {
    'Year': [2020, 2021, 2022, 2023, 2024],
    'Sales': [200, 250, 300, 350, 400],
    'Profit': [20, 25, 30, 28, 35]
}
df = pd.DataFrame(data)
print(df)
Output:
   Year  Sales  Profit
0  2020    200      20
1  2021    250      25
2  2022    300      30
3  2023    350      28
4  2024    400      35
Line Plot
A line plot is used to display data changes over a period of time.
df.plot(x='Year', y='Sales', kind='line', title='Yearly Sales', color='blue', marker='o')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
Explanation:
- kind='line' → line chart
- x and y define which columns to use
- marker='o' → shows data points on the line
Bar Plot
Used to compare categories or quantities.
df.plot(x='Year', y='Profit', kind='bar', title='Yearly Profit', color='orange')
plt.xlabel('Year')
plt.ylabel('Profit')
plt.show()
Explanation:
- Each bar represents a category (here, year).
- Useful for comparing profits or counts.
Multiple Line Plot
To compare two columns in one graph:
df.plot(x='Year', y=['Sales', 'Profit'], kind='line', marker='o')
plt.title('Sales vs Profit over Years')
plt.xlabel('Year')
plt.ylabel('Values')
plt.show()
Explanation:
- Plots both columns on the same graph.
- Helps to see the relationship between sales and profit.
Histogram
Used to display the frequency distribution of numerical data.
df['Sales'].plot(kind='hist', bins=5, color='green', title='Sales Distribution')
plt.xlabel('Sales')
plt.show()
Explanation:
- bins → number of intervals.
- Useful for analyzing data spread or patterns.
Pie Chart
Used to show the percentage or proportion of categories.
df['Profit'].plot(kind='pie', labels=df['Year'], autopct='%1.1f%%', startangle=90)
plt.title('Profit Share by Year')
plt.ylabel('')
plt.show()
Explanation:
- autopct → shows percentage values.
- startangle=90 → starts chart from the top.
Scatter Plot
Used to show the relationship between two numeric variables.
df.plot(kind='scatter', x='Sales', y='Profit', color='red', title='Sales vs Profit')
plt.xlabel('Sales')
plt.ylabel('Profit')
plt.show()
Explanation:
- Each point represents a (Sales, Profit) pair.
- Helps identify trends or correlations.
Box Plot
Used for statistical analysis (to check data spread and outliers).
df[['Sales', 'Profit']].plot(kind='box', title='Sales and Profit Distribution')
plt.show()
Explanation:
- Shows median, quartiles, and outliers.
- Useful for understanding data variability.
Customizing the Graphs
You can enhance the appearance using Matplotlib options:
# Pass figsize to df.plot(); a separate plt.figure() call would open an extra, empty figure
df.plot(x='Year', y='Sales', kind='line', figsize=(8, 5), color='purple', marker='o', linestyle='--')
plt.title('Customized Sales Graph')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
Example: Comparing Multiple Graphs
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
# Line plot
plt.subplot(2, 1, 1)
plt.plot(df['Year'], df['Sales'], marker='o', color='blue', label='Sales')
plt.plot(df['Year'], df['Profit'], marker='s', color='red', label='Profit')
plt.title('Sales and Profit Comparison')
plt.legend()
# Bar plot
plt.subplot(2, 1, 2)
plt.bar(df['Year'], df['Sales'], color='green')
plt.title('Sales Growth')
plt.tight_layout()
plt.show()
Types of Plots Supported by Pandas
| Plot Type | Parameter | Description |
| Line Plot | 'line' | Default plot type |
| Bar Plot | 'bar' | Vertical bars |
| Barh Plot | 'barh' | Horizontal bars |
| Histogram | 'hist' | Data distribution |
| Box Plot | 'box' | Statistical view |
| Area Plot | 'area' | Filled area under line |
| Pie Chart | 'pie' | Category proportions |
| Scatter Plot | 'scatter' | Relation between two variables |
- Pandas integrates with Matplotlib to make plotting simple and powerful.
- Useful for data visualization, trend analysis, and decision-making.
- Common plots include line, bar, scatter, pie, histogram, and box plots.
- Helps engineers and analysts visualize complex data clearly and effectively.
**************************************
WEEK 5
List of Experiments
1. Python program to check whether a JSON string contains a complex object or not.
2. Python program to demonstrate NumPy array creation using the array() function.
3. Python program to demonstrate the use of ndim, shape, size, dtype.
4. Python program to demonstrate basic slicing, integer and Boolean indexing.
5. Python program to find min, max, sum, cumulative sum of an array.
6. Create a dictionary with at least five keys, where each key's value is a list of at least ten values. Convert this dictionary to a pandas DataFrame and explore the data as follows:
a) Apply the head() function to the pandas DataFrame
b) Perform various data selection operations on the DataFrame
7. Select any two columns from the above data frame, and observe the change in one attribute with respect
Program 1: Python program to check whether a JSON string contains complex object or not.
Method I
CODE:
import json
# Sample JSON strings
json_string1 = '{"name": "Maddy", "age": 22, "marks": {"math": 90, "science": 85}}'
json_string2 = '{"name": "Rahul", "age": 20, "city": "Delhi"}'
def has_complex_object(json_str):
    try:
        # Convert JSON string to Python object (dictionary)
        data = json.loads(json_str)
        # Check for any complex (nested) structure like dict or list
        for value in data.values():
            if isinstance(value, (dict, list)):
                return True
        return False
    except json.JSONDecodeError:
        print("Invalid JSON format!")
        return None
# Test the function
print("JSON 1:", has_complex_object(json_string1))  # True → contains nested dict
print("JSON 2:", has_complex_object(json_string2))  # False → all values are simple
Output:
JSON 1: True
JSON 2: False
Explanation:
- import json → to work with JSON data in Python.
- json.loads() → converts a JSON string into a Python object (here, a dictionary).
- The program checks each top-level value in the dictionary, so it assumes the JSON text is an object ({...}):
o If any value is a list or another dictionary, it is a complex object.
- Returns:
o ✅ True → if a complex object is found
o ❌ False → if all values are simple (string, number, etc.)
o None → if the string is not valid JSON
Method II
JSON has no complex-number type, so a value such as 3+5j cannot appear directly in a JSON string; the standard json module raises an error if it does.
We can still detect whether a JSON string contains a complex-like value by:
1. Trying to parse it with json.loads().
2. If parsing fails, checking whether the data contains "j" (the imaginary unit).
3. Or, after parsing, scanning the string values to see if any look like complex numbers.
Program: Detect Complex Object in JSON String
CODE:
import json
def contains_complex(json_str):
    try:
        data = json.loads(json_str)  # Try parsing JSON
        # Recursively check if any value is a complex number
        def check_complex(obj):
            if isinstance(obj, dict):
                return any(check_complex(v) for v in obj.values())
            elif isinstance(obj, list):
                return any(check_complex(i) for i in obj)
            elif isinstance(obj, str):
                # Check if string looks like a complex number (e.g., "3+4j")
                try:
                    complex(obj)  # Attempt conversion
                except ValueError:
                    return False
                # Plain numeric strings such as "25" also convert, so require the imaginary unit
                return "j" in obj.lower()
            else:
                return False
        return check_complex(data)
    except json.JSONDecodeError:
        return False
# Example JSON strings
json1 = '{"name": "Alice", "age": 25, "number": "3+4j"}'
json2 = '{"x": 10, "y": 20}'
print("JSON 1 contains complex:", contains_complex(json1))  # True
print("JSON 2 contains complex:", contains_complex(json2))  # False
Output:
JSON 1 contains complex: True
JSON 2 contains complex: False
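Since JSON cannot hold a complex number directly, one common workaround is to store the real and imaginary parts separately and rebuild the number when reading. The sketch below is one possible way to do this with the default and object_hook parameters of the json module; the helper names and the "__complex__" marker key are assumptions chosen for illustration, not a standard.

```python
import json


def encode_complex(obj):
    """Called by json.dumps() for objects it cannot serialize on its own."""
    if isinstance(obj, complex):
        return {"__complex__": True, "real": obj.real, "imag": obj.imag}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode_complex(dct):
    """Called by json.loads() for every JSON object it decodes."""
    if dct.get("__complex__"):
        return complex(dct["real"], dct["imag"])
    return dct


text = json.dumps({"z": 3 + 4j}, default=encode_complex)
restored = json.loads(text, object_hook=decode_complex)

print(text)            # {"z": {"__complex__": true, "real": 3.0, "imag": 4.0}}
print(restored["z"])   # (3+4j)
```

With this encoding the JSON text stays valid for any other program, which simply sees an object with two numbers.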
Explanation
json.loads(json_str) → Parses JSON into a Python dict/list.
Example: '{"a": 1}' → {"a": 1} (Python dict).
Recursive function check_complex():
If object is a dict → check all values.
If object is a list → check all items.
If object is a string → try converting to complex().
If conversion succeeds → it is a complex-like value.
Returns True if any string looks like "a+bj", otherwise False.
If you really want to store complex numbers in JSON, you need custom encoding (e.g., save as {"real": 3, "imag": 4}).
Program 2: Python program to demonstrate NumPy array creation using the array() function
NumPy array() Function
What is numpy.array()?
The array() function in NumPy is used to create an ndarray (N-dimensional array) from:
Python lists
Python tuples
Nested sequences (list of lists → matrix)
Syntax:
numpy.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)
Parameters:
object → input data (list, tuple, nested list, etc.)
dtype → desired data type (int32, float64, etc.); inferred from the data if omitted
copy → if True (default), the data is always copied; copy=False avoids a copy only when the input is already a compatible array (from NumPy 2.0 it raises ValueError if a copy cannot be avoided)
order → memory layout:
'K' = keep the layout of the input array where possible (default); lists and tuples give C order
'C' = row-major (C-style)
'F' = column-major (Fortran-style)
subok → if True, subclasses are passed through
ndmin → minimum number of dimensions
# Import the NumPy library
import numpy as np
# 1. Create a 1-D array (one-dimensional)
arr1 = np.array([10, 20, 30, 40, 50])
print("1-D Array:")
print(arr1)
# 2. Create a 2-D array (two-dimensional)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2-D Array:")
print(arr2)
# 3. Create a 3-D array (three-dimensional)
arr3 = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])
print("\n3-D Array:")
print(arr3)
# 4. Check type and dimension of arrays
print("\nType of arr1:", type(arr1))
print("Dimension of arr1:", arr1.ndim)
print("Dimension of arr2:", arr2.ndim)
print("Dimension of arr3:", arr3.ndim)
OUTPUT:
1-D Array:
[10 20 30 40 50]
2-D Array:
[[1 2 3]
 [4 5 6]]
3-D Array:
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
Type of arr1: <class 'numpy.ndarray'>
Dimension of arr1: 1
Dimension of arr2: 2
Dimension of arr3: 3
Python Program: Demonstrating numpy.array()
CODE:
import numpy as np
# 1. Creating 1D array from list
arr1 = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr1)
# 2. Creating 2D array (Matrix) from nested list
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:\n", arr2)
# 3. Creating array from tuple
arr3 = np.array((10, 20, 30))
print("\nArray from Tuple:", arr3)
# 4. Specifying dtype
arr4 = np.array([1, 2, 3], dtype=float)
print("\nArray with dtype float:", arr4)
# 5. Using ndmin (minimum dimensions)
arr5 = np.array([1, 2, 3, 4], ndmin=3)
print("\nArray with ndmin=3:\n", arr5)
print("Shape of arr5:", arr5.shape)
# 6. Copy behaviour
# A Python list never shares memory with an ndarray, so its data is always copied.
# np.array(list_data, copy=False) raises ValueError in NumPy 2.0+ for this reason,
# so np.asarray() is used here; it copies only when needed.
list_data = [1, 2, 3]
arr6 = np.asarray(list_data)
print("\nOriginal List:", list_data)
print("NumPy Array (from list):", arr6)
# Modify list and check array
list_data[0] = 99
print("Modified List:", list_data)
print("NumPy Array after modifying list:", arr6)  # Will it change?
OUTPUT:
1D Array: [1 2 3 4 5]
2D Array:
 [[1 2 3]
 [4 5 6]]
Array from Tuple: [10 20 30]
Array with dtype float: [1. 2. 3.]
Array with ndmin=3:
 [[[1 2 3 4]]]
Shape of arr5: (1, 1, 4)
Original List: [1, 2, 3]
NumPy Array (from list): [1 2 3]
Modified List: [99, 2, 3]
NumPy Array after modifying list: [1 2 3]
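Notice:
arr6 did not change when the list was modified. A Python list cannot share memory with an ndarray, so NumPy always copies list data into its own compact buffer. For the same reason, np.array(list_data, copy=False) raises a ValueError in NumPy 2.0 and later; use np.asarray(list_data) or copy=None when a copy is acceptable.
np.array() converts Python lists/tuples into NumPy ndarrays.
Supports dtype conversion, multi-dimensional arrays, and custom memory layouts.
For numerical data, arrays use less memory and are faster than Python lists.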
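The copy behaviour only matters when the input is already a NumPy array. The sketch below contrasts np.array(), which copies by default, with np.asarray(), which reuses an existing array of the same dtype; the variable names are illustrative.

```python
import numpy as np

source = np.array([1, 2, 3])
copied = np.array(source)      # copy=True by default: independent data
viewed = np.asarray(source)    # existing ndarray of the same dtype: no copy is made

source[0] = 99                 # modify the original array

print(copied.tolist())                     # [1, 2, 3]
print(viewed.tolist())                     # [99, 2, 3]
print(np.shares_memory(source, viewed))    # True
print(np.shares_memory(source, copied))    # False
```

np.shares_memory() is a convenient way to check whether two arrays refer to the same underlying data.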
Program 3: Python program to demonstrate use of ndim, shape, size, dtype.
ndim → Number of dimensions (axes) of the array.
shape → Tuple of array dimensions (rows, cols, etc.).
size → Total number of elements in the array.
dtype → Data type of array elements (int32, float64, etc.).
Method I
CODE:
# Import NumPy library
import numpy as np
# Create a 2D NumPy array
arr = np.array([[10, 20, 30], [40, 50, 60]])
# Display the array
print("Array:")
print(arr)
# 1. Number of dimensions
print("\nNumber of Dimensions (ndim):", arr.ndim)
# 2. Shape of the array (rows, columns)
print("Shape of Array (shape):", arr.shape)
# 3. Total number of elements in the array
print("Size of Array (size):", arr.size)
# 4. Data type of elements stored in array
print("Data Type of Elements (dtype):", arr.dtype)
OUTPUT:
Array:
[[10 20 30]
 [40 50 60]]
Number of Dimensions (ndim): 2
Shape of Array (shape): (2, 3)
Size of Array (size): 6
Data Type of Elements (dtype): int64
Explanation
| Attribute | Meaning | Example Output |
| ndim | Number of dimensions of array | 2 (since it’s 2D) |
| shape | Tuple showing rows and columns | (2, 3) → 2 rows, 3 columns |
| size | Total number of elements | 6 |
| dtype | Data type of array elements | int64 or int32 (depends on your system) |
Method II
Python Program: Demonstrating ndim, shape, size, dtype
import numpy as np
# 1D Array
arr1 = np.array([10, 20, 30, 40])
print("Array 1:", arr1)
print("ndim:", arr1.ndim)    # number of dimensions
print("shape:", arr1.shape)  # (4,) → one axis with 4 elements
print("size:", arr1.size)    # total elements
print("dtype:", arr1.dtype)  # data type
print("-" * 50)
# 2D Array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("Array 2:\n", arr2)
print("ndim:", arr2.ndim)    # 2D (matrix)
print("shape:", arr2.shape)  # (2,3) → 2 rows, 3 columns
print("size:", arr2.size)    # 6 elements
print("dtype:", arr2.dtype)
print("-" * 50)
# 3D Array
arr3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("Array 3:\n", arr3)
print("ndim:", arr3.ndim)    # 3D array
print("shape:", arr3.shape)  # (2,2,2) → 2 blocks, 2 rows, 2 cols
print("size:", arr3.size)    # 8 elements
print("dtype:", arr3.dtype)
OUTPUT:
Array 1: [10 20 30 40]
ndim: 1
shape: (4,)
size: 4
dtype: int64
--------------------------------------------------
Array 2:
 [[1 2 3]
 [4 5 6]]
ndim: 2
shape: (2, 3)
size: 6
dtype: int64
--------------------------------------------------
Array 3:
 [[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
ndim: 3
shape: (2, 2, 2)
size: 8
dtype: int64
Explanation
ndim tells us if it’s 1D, 2D, or 3D.
arr1 → 1D
arr2 → 2D (matrix)
arr3 → 3D (cube/block).
shape gives the size along each dimension:
arr1 → (4,) (4 elements along a single axis).
arr2 → (2,3) (2 rows × 3 columns).
arr3 → (2,2,2) (2 blocks × 2 rows × 2 columns).
size = total number of elements (product of the shape values).
dtype = inferred by NumPy from the input data (e.g., int64 for integers on most platforms, float64 for floats).
Program 4: Python program to demonstrate basic slicing, integer and Boolean indexing.
CODE:
# Import NumPy library
import numpy as np
# Create a 1D NumPy array
arr = np.array([10, 20, 30, 40, 50, 60, 70])
print("Original Array:")
print(arr)
# 1. Basic Slicing
print("\n1. Basic Slicing Examples:")
print("Elements from index 1 to 4:", arr[1:5])    # 20 to 50
print("Elements from start to 3:", arr[:4])       # 10 to 40
print("Elements from index 3 to end:", arr[3:])   # 40 to 70
print("Every second element:", arr[::2])          # 10, 30, 50, 70
# 2. Integer Indexing
print("\n2. Integer Indexing Examples:")
indices = [0, 2, 5]
print("Elements at positions 0, 2, 5:", arr[indices])  # 10, 30, 60
# 3. Boolean Indexing
print("\n3. Boolean Indexing Examples:")
bool_mask = arr > 40
print("Boolean Mask (arr > 40):", bool_mask)
print("Elements greater than 40:", arr[bool_mask])
OUTPUT:
Original Array:
[10 20 30 40 50 60 70]
1. Basic Slicing Examples:
Elements from index 1 to 4: [20 30 40 50]
Elements from start to 3: [10 20 30 40]
Elements from index 3 to end: [40 50 60 70]
Every second element: [10 30 50 70]
2. Integer Indexing Examples:
Elements at positions 0, 2, 5: [10 30 60]
3. Boolean Indexing Examples:
Boolean Mask (arr > 40): [False False False False  True  True  True]
Elements greater than 40: [50 60 70]
| Concept | Description | Example |
| Basic Slicing | Selects continuous elements using start:end:step | arr[1:5] → 20 30 40 50 |
| Integer Indexing | Selects elements at specific positions | arr[[0, 2, 5]] → 10 30 60 |
| Boolean Indexing | Uses True/False array to filter elements | arr[arr > 40] → 50 60 70 |
import numpy as np
# Create a NumPy array
arr = np.array([10, 20, 30, 40, 50, 60, 70])
print("Original Array:")
print(arr)
# ---------------------------
# 1. Basic Slicing
# ---------------------------
# Get elements from index 2 to 5 (5 excluded)
slice1 = arr[2:5]
print("\nBasic Slicing arr[2:5]:", slice1)
# Get every 2nd element
slice2 = arr[::2]
print("Basic Slicing arr[::2] (every 2nd element):", slice2)
# ---------------------------
# 2. Integer Indexing
# ---------------------------
# Access multiple elements using a list of indices
indices = [1, 3, 5]
int_indexed = arr[indices]
print("\nInteger Indexing arr[[1,3,5]]:", int_indexed)
# ---------------------------
# 3. Boolean Indexing
# ---------------------------
# Create a Boolean condition
bool_indexed = arr[arr > 30]  # all elements greater than 30
print("\nBoolean Indexing arr[arr > 30]:", bool_indexed)
OUTPUT:
Original Array:
[10 20 30 40 50 60 70]
Basic Slicing arr[2:5]: [30 40 50]
Basic Slicing arr[::2] (every 2nd element): [10 30 50 70]
Integer Indexing arr[[1,3,5]]: [20 40 60]
Boolean Indexing arr[arr > 30]: [40 50 60 70]
Explanation
1. Basic Slicing
   - arr[start:end] → selects elements from start to end-1.
   - arr[start:end:step] → selects elements with a step size.
2. Integer Indexing
   - You can pass a list of indices to access multiple elements at once.
   - Example: arr[[1,3,5]] selects the 2nd, 4th, and 6th elements.
3. Boolean Indexing
   - You can create a condition that returns a Boolean array, and use it to filter elements.
   - Example: arr[arr > 30] selects all elements greater than 30.
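The three indexing styles above can be pushed a little further. The following is a minimal sketch, not part of the original program: it reuses the same array, and the variable names (reversed_arr, last_three, in_range) are chosen here only for illustration. It shows a negative step, a negative start index, and two Boolean conditions combined with the element-wise & operator.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])

# A negative step walks the array from the end to the start
reversed_arr = arr[::-1]

# A negative start counts from the end: the last three elements
last_three = arr[-3:]

# Combine two conditions with & (element-wise AND).
# Each condition needs its own parentheses.
in_range = arr[(arr > 20) & (arr < 60)]

print("Reversed:", reversed_arr)              # [70 60 50 40 30 20 10]
print("Last three:", last_three)              # [50 60 70]
print("Between 20 and 60:", in_range)         # [30 40 50]
```

Python's keyword and does not work on whole arrays here, which is why the sketch uses & between the parenthesized conditions.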
Program 5: Python program to find min, max, sum, cumulative sum of array
CODE:
import numpy as np
# Create a NumPy array
arr = np.array([10, 20, 30, 40, 50])
print("Original Array:")
print(arr)
# Minimum value
min_val = np.min(arr)
print("\nMinimum value:", min_val)
# Maximum value
max_val = np.max(arr)
print("Maximum value:", max_val)
# Sum of all elements
sum_val = np.sum(arr)
print("Sum of elements:", sum_val)
# Cumulative sum
cum_sum = np.cumsum(arr)
print("Cumulative sum:", cum_sum)
OUTPUT:
Original Array:
[10 20 30 40 50]
Minimum value: 10
Maximum value: 50
Sum of elements: 150
Cumulative sum: [ 10  30  60 100 150]
Explanation
- np.min(arr) → Returns the smallest element in the array.
- np.max(arr) → Returns the largest element in the array.
- np.sum(arr) → Returns the sum of all elements.
- np.cumsum(arr) → Returns the cumulative sum, i.e., the running total of the elements.
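The same functions also accept an axis argument when the array has more than one dimension. The sketch below is an illustration only: the 2-D array marks and the result names are assumptions made up for this example and do not appear in the program above.

```python
import numpy as np

# A small 2-D array (2 rows, 3 columns), made up for this illustration
marks = np.array([[10, 20, 30],
                  [40, 50, 60]])

col_sum = np.sum(marks, axis=0)        # sum down each column -> [50 70 90]
row_sum = np.sum(marks, axis=1)        # sum across each row  -> [ 60 150]
row_max = np.max(marks, axis=1)        # largest value in each row -> [30 60]
row_cumsum = np.cumsum(marks, axis=1)  # running total within each row
flat_cumsum = np.cumsum(marks)         # no axis: array is flattened first

print("Column sums:", col_sum)
print("Row sums:", row_sum)
print("Row maxima:", row_max)
print("Row-wise cumulative sum:")
print(row_cumsum)
print("Flattened cumulative sum:", flat_cumsum)
```

With axis=0 the operation runs down the rows (one result per column), and with axis=1 it runs across the columns (one result per row).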
Program 6: Create a dictionary with at least five keys, where each key maps to a list of at least ten values. Convert this dictionary to a pandas DataFrame and explore the data as follows:
a) Apply the head() function to the pandas DataFrame
b) Perform various data selection operations on the DataFrame
(a) Apply the head() function to the pandas DataFrame
CODE:
# Import pandas library
import pandas as pd
# 1. Create a dictionary with 5 keys and 10 values each
student_data = {
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John', 'Lina', 'Raj', 'Sara', 'Tom', 'Anu'],
    'Age': [18, 19, 20, 18, 21, 22, 19, 20, 18, 21],
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'Marks_Science': [82, 79, 88, 91, 73, 85, 89, 77, 94, 80],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Delhi',
             'Pune', 'Hyderabad', 'Bangalore', 'Kochi', 'Jaipur']
}
# 2. Convert dictionary to a pandas DataFrame
df = pd.DataFrame(student_data)
# 3. Display the complete DataFrame
print("Complete DataFrame:")
print(df)
# 4. Apply head() function to display first 5 rows
print("\nFirst 5 Rows using head():")
print(df.head())
OUTPUT:
Complete DataFrame:
    Name  Age  Marks_Math  Marks_Science       City
0   Asha   18          78             82      Delhi
1   Ravi   19          85             79     Mumbai
2  Kiran   20          92             88    Chennai
3   Maya   18          67             91    Kolkata
4   John   21          88             73      Delhi
5   Lina   22          90             85       Pune
6    Raj   19          76             89  Hyderabad
7   Sara   20          82             77  Bangalore
8    Tom   18          95             94      Kochi
9    Anu   21          80             80     Jaipur
First 5 Rows using head():
    Name  Age  Marks_Math  Marks_Science     City
0   Asha   18          78             82    Delhi
1   Ravi   19          85             79   Mumbai
2  Kiran   20          92             88  Chennai
3   Maya   18          67             91  Kolkata
4   John   21          88             73    Delhi
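head() returns the first five rows when called without an argument. As a small sketch, assuming a trimmed two-column version of the student data (cut down here only to keep the example short), head(n) and its counterpart tail(n) accept an explicit row count:

```python
import pandas as pd

# Trimmed version of the student data, used only for this illustration
df = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John',
             'Lina', 'Raj', 'Sara', 'Tom', 'Anu'],
    'Age': [18, 19, 20, 18, 21, 22, 19, 20, 18, 21],
})

first_three = df.head(3)   # first 3 rows instead of the default 5
last_two = df.tail(2)      # last 2 rows

print("First 3 rows:")
print(first_three)
print("\nLast 2 rows:")
print(last_two)
print("\nShape (rows, columns):", df.shape)   # (10, 2)
```

Checking df.shape alongside head() is a quick way to confirm that all ten rows were loaded, since head() shows only part of the data.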
(b) Perform various data selection operations on the DataFrame
CODE:
# Import pandas library
import pandas as pd
# 1. Create a dictionary with 5 keys and 10 values each
student_data = {
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John', 'Lina',
             'Raj', 'Sara', 'Tom', 'Anu'],
    'Age': [18, 19, 20, 18, 21, 22, 19, 20, 18, 21],
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'Marks_Science': [82, 79, 88, 91, 73, 85, 89, 77, 94, 80],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Delhi',
             'Pune', 'Hyderabad', 'Bangalore', 'Kochi', 'Jaipur']
}
# 2. Convert dictionary into a pandas DataFrame
df = pd.DataFrame(student_data)
# Display the complete DataFrame
print("Complete DataFrame:")
print(df)
# 3. Explore the data
print("\nFirst 5 rows using head():")
print(df.head())
# 4. Perform various Data Selection Operations
# a) Select a single column
print("\n(a) Selecting a single column (Marks_Math):")
print(df['Marks_Math'])
# b) Select multiple columns
print("\n(b) Selecting multiple columns (Name, City, Marks_Science):")
print(df[['Name', 'City', 'Marks_Science']])
# c) Select a specific row using loc (by label)
print("\n(c) Selecting a specific row using loc (row index 2):")
print(df.loc[2])
# d) Select a specific row using iloc (by position)
print("\n(d) Selecting a specific row using iloc (row position 4):")
print(df.iloc[4])
OUTPUT:
Complete DataFrame:
    Name  Age  Marks_Math  Marks_Science       City
0   Asha   18          78             82      Delhi
1   Ravi   19          85             79     Mumbai
2  Kiran   20          92             88    Chennai
3   Maya   18          67             91    Kolkata
4   John   21          88             73      Delhi
5   Lina   22          90             85       Pune
6    Raj   19          76             89  Hyderabad
7   Sara   20          82             77  Bangalore
8    Tom   18          95             94      Kochi
9    Anu   21          80             80     Jaipur
First 5 rows using head():
    Name  Age  Marks_Math  Marks_Science     City
0   Asha   18          78             82    Delhi
1   Ravi   19          85             79   Mumbai
2  Kiran   20          92             88  Chennai
3   Maya   18          67             91  Kolkata
4   John   21          88             73    Delhi
(a) Selecting a single column (Marks_Math):
0    78
1    85
2    92
3    67
4    88
5    90
6    76
7    82
8    95
9    80
Name: Marks_Math, dtype: int64
(b) Selecting multiple columns (Name, City, Marks_Science):
    Name       City  Marks_Science
0   Asha      Delhi             82
1   Ravi     Mumbai             79
2  Kiran    Chennai             88
3   Maya    Kolkata             91
4   John      Delhi             73
5   Lina       Pune             85
6    Raj  Hyderabad             89
7   Sara  Bangalore             77
8    Tom      Kochi             94
9    Anu     Jaipur             80
(c) Selecting a specific row using loc (row index 2):
Name               Kiran
Age                   20
Marks_Math            92
Marks_Science         88
City             Chennai
Name: 2, dtype: object
(d) Selecting a specific row using iloc (row position 4):
Name              John
Age                 21
Marks_Math          88
Marks_Science       73
City             Delhi
Name: 4, dtype: object
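Selection can also be driven by a condition, or by rows and columns together. The following is a minimal sketch under stated assumptions: it uses a three-column subset of the student data, and the result names (high_math, subset, block, cell) are chosen here for illustration only.

```python
import pandas as pd

# Three-column subset of the student data, used only for this illustration
df = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John',
             'Lina', 'Raj', 'Sara', 'Tom', 'Anu'],
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Delhi',
             'Pune', 'Hyderabad', 'Bangalore', 'Kochi', 'Jaipur'],
})

# Conditional selection: rows where Marks_Math is above 85
high_math = df[df['Marks_Math'] > 85]

# loc with a label range and a column list (the end label 3 is included)
subset = df.loc[1:3, ['Name', 'City']]

# iloc with position ranges (the end position 2 is excluded)
block = df.iloc[0:2, 0:2]

# A single cell by row label and column name
cell = df.at[2, 'City']

print(high_math)
print(subset)
print(block)
print("City at row 2:", cell)   # Chennai
```

Note the difference in the ranges: loc[1:3] includes the row labelled 3, while iloc[0:2] stops before position 2, following the usual Python slicing rule.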
in one attribute with respect to other attribute with scatter and plot operations in matplotlib
CODE:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# 1. Create a dictionary with sample student data
student_data = {
    'Name': ['Asha', 'Ravi', 'Kiran', 'Maya', 'John', 'Lina', 'Raj', 'Sara', 'Tom', 'Anu'],
    'Age': [18, 19, 20, 18, 21, 22, 19, 20, 18, 21],
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'Marks_Science': [82, 79, 88, 91, 73, 85, 89, 77, 94, 80],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Delhi',
             'Pune', 'Hyderabad', 'Bangalore', 'Kochi', 'Jaipur']
}
# 2. Convert dictionary into a pandas DataFrame
df = pd.DataFrame(student_data)
# Display the DataFrame
print("Student DataFrame:")
print(df)
# 3. Select two columns for visualization
x = df['Marks_Math']
y = df['Marks_Science']
# 4. Create a Scatter Plot
plt.scatter(x, y, color='blue', marker='o')
plt.title("Scatter Plot: Marks in Math vs Science")
plt.xlabel("Marks in Math")
plt.ylabel("Marks in Science")
plt.grid(True)
plt.show()
# 5. Create a Line Plot (Plot Operation)
plt.plot(x, y, color='green', linestyle='--', marker='o')
plt.title("Line Plot: Marks in Math vs Science")
plt.xlabel("Marks in Math")
plt.ylabel("Marks in Science")
plt.grid(True)
plt.show()
OUTPUT:
Student DataFrame:
    Name  Age  Marks_Math  Marks_Science       City
0   Asha   18          78             82      Delhi
1   Ravi   19          85             79     Mumbai
2  Kiran   20          92             88    Chennai
3   Maya   18          67             91    Kolkata
4   John   21          88             73      Delhi
5   Lina   22          90             85       Pune
6    Raj   19          76             89  Hyderabad
7   Sara   20          82             77  Bangalore
8    Tom   18          95             94      Kochi
9    Anu   21          80             80     Jaipur
Explanation:
| Function | Purpose | Description |
| plt.scatter(x, y) | Creates a scatter plot | Shows how one variable changes with another |
| plt.plot(x, y) | Creates a line plot | Connects data points with lines |
| plt.xlabel(), plt.ylabel() | Label axes | Gives context to the chart |
| plt.title() | Adds a title | Describes what the plot represents |
| plt.show() | Displays the plot window | Shows the graph |
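plt.plot joins the points in the order in which they appear in the data, so when the x values are not sorted the line doubles back on itself. One possible refinement, sketched here as an assumption rather than as part of the original program, is to sort the rows by the x column before drawing the line. The Agg backend and the output file name are choices made only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: the figure is saved, not shown
import matplotlib.pyplot as plt
import pandas as pd

# The two mark columns from the student data
df = pd.DataFrame({
    'Marks_Math': [78, 85, 92, 67, 88, 90, 76, 82, 95, 80],
    'Marks_Science': [82, 79, 88, 91, 73, 85, 89, 77, 94, 80],
})

# Sort rows by the x-axis column so the line moves left to right
df_sorted = df.sort_values('Marks_Math')

fig, ax = plt.subplots()
(line,) = ax.plot(df_sorted['Marks_Math'], df_sorted['Marks_Science'],
                  color='green', linestyle='--', marker='o')
ax.set_title("Line Plot (sorted by Math marks): Math vs Science")
ax.set_xlabel("Marks in Math")
ax.set_ylabel("Marks in Science")
ax.grid(True)

# Hypothetical output file name, chosen for this illustration
fig.savefig("math_vs_science_sorted.png")
plt.close(fig)
```

In an interactive session, the matplotlib.use("Agg") line can be dropped and fig.savefig(...) replaced with plt.show() to open the plot window instead.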