General Python Questions
1. What are Python's key features?
Interpreted and dynamically typed
High-level and easy to read
Extensive standard library
Supports multiple programming paradigms (OOP, Functional, Procedural)
Cross-platform compatibility
Strong community support
2. Explain the difference between deepcopy() and copy().
copy.copy() creates a shallow copy: a new outer object whose nested mutable objects are still shared, so changes to those nested objects show up in both copies.
copy.deepcopy() creates a deep copy: every nested object is recursively copied, so the copy is fully independent of the original.
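A small sketch illustrating the difference:

```python
import copy

original = {"ids": [1, 2, 3]}

shallow = copy.copy(original)   # new dict, but the inner list is shared
deep = copy.deepcopy(original)  # the inner list is recursively copied too

original["ids"].append(4)

print(shallow["ids"])  # [1, 2, 3, 4] -- shared list reflects the change
print(deep["ids"])     # [1, 2, 3]    -- deep copy is unaffected
```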
3. How does Python manage memory?
Python uses automatic memory management with reference counting and garbage collection.
The garbage collector removes objects that are no longer referenced.
Memory is allocated in a private heap managed by the interpreter.
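A quick sketch of both mechanisms, reference counting and the cyclic garbage collector:

```python
import gc
import sys

data = []
alias = [data]
# getrefcount counts its own argument too, so expect at least 3 here:
print(sys.getrefcount(data))

gc.collect()           # start from a clean slate
node = []
node.append(node)      # a reference cycle: refcounting alone can't free it
del node
freed = gc.collect()   # the cyclic garbage collector reclaims it
print(freed >= 1)      # True
```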
4. What is the difference between is and ==?
is checks object identity (i.e., whether two variables point to the same memory location).
== checks value equality (i.e., whether two variables have the same value).
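A minimal example of the distinction:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)  # True  -- same value
print(a is b)  # False -- distinct objects in memory
print(a is c)  # True  -- c is another name for the same object
```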
5. How do Python lists and tuples differ?
Lists are mutable, meaning elements can be modified after creation.
Tuples are immutable, meaning elements cannot be changed.
Lists have more methods and consume slightly more memory than tuples.
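The mutability and memory differences are easy to demonstrate:

```python
import sys

items_list = [1, 2, 3]
items_tuple = (1, 2, 3)

items_list[0] = 99          # fine: lists are mutable

try:
    items_tuple[0] = 99     # tuples reject item assignment
except TypeError as exc:
    print(exc)

# On CPython, the list carries extra overhead for growth
print(sys.getsizeof(items_list) > sys.getsizeof(items_tuple))  # True
```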
---
Data Engineering-Specific Questions
6. How do you handle large datasets efficiently in Python?
Use Dask or Vaex instead of Pandas for parallel and out-of-core computing.
Use chunking when reading large CSV files.
Leverage generators instead of lists to save memory.
Store large datasets in Parquet format instead of CSV for efficiency.
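The generator point can be sketched in a few lines; here `io.StringIO` stands in for a large file on disk:

```python
import io


def read_records(lines):
    """Yield parsed records one at a time instead of building a full list."""
    for line in lines:
        yield int(line)


fake_file = io.StringIO("1\n2\n3\n4\n")
total = sum(read_records(fake_file))  # processes one record at a time
print(total)  # 10
```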
7. What is the difference between Pandas and Dask?
Pandas: Best for small-to-medium datasets; operates in memory.
Dask: Supports parallel processing; can handle large datasets by breaking them into smaller parts.
8. How would you optimize reading a large CSV file in Pandas?
Use chunksize to read the file in smaller parts.
Specify data types (dtype parameter) to reduce memory usage.
Use PyArrow or Vaex for faster I/O operations.
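A minimal sketch of chunked reading with explicit dtypes, using an in-memory CSV (the column names are made up for illustration):

```python
import io

import pandas as pd

csv_data = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n")

totals = 0
# chunksize yields DataFrames of at most 2 rows each; dtype skips
# costly type inference and keeps memory use predictable
for chunk in pd.read_csv(csv_data, chunksize=2,
                         dtype={"id": "int32", "value": "int32"}):
    totals += chunk["value"].sum()

print(totals)  # 100
```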
9. How do you use Python for ETL (Extract, Transform, Load) pipelines?
Extract: Read data from sources like APIs, databases, or files (Pandas, SQLAlchemy).
Transform: Clean, filter, and reshape data (Pandas, Dask, PySpark).
Load: Write transformed data to storage (SQL, S3, Azure Blob, Kafka).
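The three stages can be sketched end to end; here an in-memory CSV stands in for the source and an in-memory SQLite database stands in for the warehouse (table and column names are illustrative):

```python
import io
import sqlite3

import pandas as pd

# Extract: read raw data from a source
raw = io.StringIO("name,amount\nalice,10\nbob,-5\ncarol,7\n")
df = pd.read_csv(raw)

# Transform: filter out invalid rows and add a derived column
df = df[df["amount"] > 0].copy()
df["amount_doubled"] = df["amount"] * 2

# Load: write the result to a SQL table
conn = sqlite3.connect(":memory:")
df.to_sql("payments", conn, index=False)
rows = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(rows)  # 2
```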
10. How would you handle missing data in Pandas?
Use .fillna() to replace missing values.
Use .dropna() to remove rows/columns with missing values.
Use interpolation or statistical methods like mean/median imputation.
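All three approaches on a toy DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, 2.0]})

filled = df.fillna(0)           # replace NaN with a constant
dropped = df.dropna()           # drop rows containing any NaN
imputed = df.fillna(df.mean())  # mean imputation per column

print(len(dropped))          # 1 (only the last row is complete)
print(imputed.loc[1, "a"])   # 2.0 (mean of 1.0 and 3.0)
```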
---
Azure Cloud & Python Questions
11. How do you use Python to interact with Azure Key Vault?
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

key_vault_url = "https://<your-keyvault-name>.vault.azure.net/"

# DefaultAzureCredential tries environment variables, managed identity,
# and developer logins (e.g. Azure CLI) in order
credential = DefaultAzureCredential()
client = SecretClient(vault_url=key_vault_url, credential=credential)

secret = client.get_secret("my-secret-name")
print(secret.value)
12. How would you securely store and retrieve secrets in an Azure environment?
Use Azure Key Vault for secret management.
Authenticate using Managed Identity or Service Principal with RBAC.
Avoid storing secrets in environment variables or code.
13. How do you use Azure Blob Storage with Python?
from azure.storage.blob import BlobServiceClient

connection_string = "your_connection_string"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
container_client = blob_service_client.get_container_client("my-container")

for blob in container_client.list_blobs():
    print(blob.name)
14. What is the role of azure-identity and azure-keyvault-secrets in authentication?
azure-identity: Provides authentication mechanisms like DefaultAzureCredential, Managed Identity, and Service Principal.
azure-keyvault-secrets: Provides secure access to Azure Key Vault secrets.
15. How do you use Python to query Azure SQL Database efficiently?
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<your-server>.database.windows.net;"
    "DATABASE=mydb;"
    "UID=myuser;"
    "PWD=mypassword"  # in practice, pull credentials from Azure Key Vault
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM mytable")
rows = cursor.fetchall()
for row in rows:
    print(row)
---
Splunk & Python
16. How do you query Splunk using Python?
import requests

url = "https://splunk-server:8089/services/search/jobs"
headers = {"Authorization": "Bearer YOUR_SPLUNK_TOKEN"}
data = {
    "search": "search index=main | head 10",
    "exec_mode": "oneshot",   # run synchronously and return results directly
    "output_mode": "json",    # Splunk defaults to XML; JSON enables .json()
}

response = requests.post(url, headers=headers, data=data)
print(response.json())
17. What is the difference between using REST API vs. SDK for querying Splunk?
REST API: Gives raw access to Splunk services via HTTP requests.
Splunk SDK (splunk-sdk-python): Provides Python-friendly functions and better integration with applications.
18. How do you authenticate Python scripts with Splunk securely?
Use authentication (bearer) tokens instead of embedding usernames and passwords in code.
Implement environment variables or Azure Key Vault for secret management.
Use role-based access control (RBAC) in Splunk.
19. What are some common use cases for integrating Splunk with Python?
Log analysis: Automate log searching and filtering.
Alerting & monitoring: Trigger alerts based on log patterns.
Security & anomaly detection: Detect security incidents.
Data visualization: Export data to Pandas for analysis.
20. How do you filter and process Splunk logs using Pandas?
import pandas as pd
# Sample Splunk JSON response
logs = [
    {"timestamp": "2025-02-19T10:00:00Z", "status": 200, "message": "OK"},
    {"timestamp": "2025-02-19T10:05:00Z", "status": 500, "message": "Internal Server Error"},
]
df = pd.DataFrame(logs)
df["timestamp"] = pd.to_datetime(df["timestamp"])
errors = df[df["status"] >= 500]
print(errors)
---