Techno Ladder: 2025

Monday, May 12, 2025

Create Splunk dashboard with data from excel file

1. Can Saved Searches Update an Excel File with Predefined Columns?

Not directly. Splunk saved searches export results in formats like:

CSV

JSON

XML

But you can:

Schedule a saved search to export as a CSV.

Use a script (Python, PowerShell, etc.) outside of Splunk to:

Watch the exported file.

Insert data into a predefined Excel template.

Save it as .xlsx.

Example:

Use the outputcsv command in your saved search:

your search query

| outputcsv my_output.csv

Then use Python (with pandas and openpyxl) to map this CSV into a predefined Excel template.

---

2. Can You Use Excel Files to Create Splunk Dashboards?

No, not directly. Splunk dashboards rely on:

Indexed data in Splunk

Real-time or scheduled searches

Lookups (CSV-based), not .xlsx

But you can:

Convert your Excel file to CSV

Upload it as a lookup table in Splunk

Use it in searches like:

| inputlookup my_excel_data.csv

From there, you can create dashboards.

Monday, May 5, 2025

How to optimise Splunk Queries

Optimizing Splunk queries is key to getting faster results, reducing system load, and improving the efficiency of your dashboards and alerts. Here's a practical guide to help you optimize your Splunk searches:

---

1. Filter Early, Filter Often

Narrow your time range as much as possible.

Use index=, sourcetype=, and source= as early as possible in the query:

index=security sourcetype=syslog error

---

2. Avoid Wildcards at the Beginning

Bad:

index=*security*

Good:

index=security

---

3. Use fields to Limit Output Columns

If you only need certain fields, extract them early:

index=web | fields host, status, uri_path

---

4. Use where Instead of search After the Pipe

where is more efficient for numerical or conditional filtering:

| where status=500 AND duration > 1000

---

5. Avoid Expensive Commands Early (like join, stats, lookup)

Push expensive commands as late as possible. Consider using stats instead of join when possible.

---

6. Replace join with stats

Instead of:

index=a | join user_id [ search index=b | fields user_id, role ]

Use:

(index=a OR index=b) | stats values(role) as role by user_id

---

7. Use tstats for Data Models

| tstats is faster than raw searches for data models:

| tstats count from datamodel=Web.Web by Web.src, Web.dest

---

8. Schedule Reports and Use Summary Indexes

For repeated heavy queries, schedule them and store the results in a summary index to reduce runtime in dashboards.

---

9. Avoid Subsearches with Large Output

Subsearches should return <50,000 results; otherwise, they can slow down or fail.

---

10. Use eventstats Instead of stats if You Need to Keep Raw Events

| eventstats avg(duration) as avg_duration

Monday, April 21, 2025

Advanced Python Questions

1. What are Python decorators and how do they work?

Answer: Decorators are functions that modify the behavior of other functions or methods. They are used with @decorator_name syntax.

def decorator(func):

def wrapper():

print("Before function call")

func()

print("After function call")

return wrapper

@decorator

def say_hello():

print("Hello!")

say_hello()

---

2. What is the difference between is and == in Python?

Answer:

is checks object identity (whether two variables point to the same object in memory).

== checks value equality (whether the values are the same).

a = [1, 2]

b = [1, 2]

print(a == b) # True

print(a is b) # False

---

3. What is a generator? How is it different from a list?

Answer:

Generators yield items one at a time using yield. They are memory-efficient and lazy-evaluated.

def gen():

yield 1

yield 2

g = gen()

print(next(g)) # 1

Generators don’t store the whole list in memory, unlike a normal list.

---

4. Explain Python's GIL (Global Interpreter Lock).

Answer:

The GIL is a mutex in CPython that allows only one thread to execute at a time, even on multi-core systems. This affects multi-threaded CPU-bound programs but not I/O-bound ones.

---

5. What are metaclasses in Python?

Answer:

Metaclasses are classes of classes — they define how classes behave. You can control class creation using metaclasses.

class Meta(type):

def __new__(cls, name, bases, dct):

print(f"Creating class {name}")

return super().__new__(cls, name, bases, dct)

class MyClass(metaclass=Meta):

pass

---

6. How does *args and **kwargs work?

Answer:

*args captures positional arguments as a tuple.

**kwargs captures keyword arguments as a dict.

def func(*args, **kwargs):

print(args)

print(kwargs)

func(1, 2, a=3, b=4)

---

7. What are closures in Python?

Answer:

Closures are functions that remember the values from their enclosing scope even if the outer function has finished executing.

def outer(x):

def inner():

print(x)

return inner

closure = outer(10)

closure() #

---

8. What is monkey patching in Python?

Answer:

Changing or extending code at runtime, typically used in testing.

import math

math.sqrt = lambda x: "patched"

print(math.sqrt(9)) # patched

9. How does Python’s memory management work?

Answer:

Python uses reference counting and a garbage collector for cyclic references.

Memory is managed in private heaps.

10. Difference between shallow and deep copy?

Answer:

Shallow copy: copies only references for nested objects.

Deep copy: copies all levels recursively.

import copy

a = [[1, 2]]

shallow = copy.copy(a)

deep = copy.deepcopy(a)

Would you like these as a downloadable PDF or want more questions (e.g., multithreading, asyncio, design patterns, etc.)?

Monday, March 31, 2025

Dynatrace Important Concepts

Dynatrace is an advanced observability and application performance monitoring (APM) platform that provides deep insights into cloud, hybrid, and on-premise environments. Here are the most important concepts in Dynatrace:

1. OneAgent

A lightweight agent that collects performance and dependency data from applications, hosts, and infrastructure.

Installed on monitored systems to provide full-stack observability.

2. Smartscape Topology

A real-time dependency map that shows relationships between applications, services, processes, and hosts.

Helps visualize how components interact within your environment.

3. Davis AI (Anomaly Detection)

An AI-powered engine that automatically detects anomalies, root causes, and performance issues.

Reduces alert noise by correlating multiple issues into meaningful incidents.

4. PurePath (Distributed Tracing)

Provides deep transaction-level insights by capturing end-to-end traces of requests across distributed systems.

Helps diagnose slow transactions and code-level issues.

5. Session Replay

Captures user interactions on web and mobile applications for performance analysis and UX improvement.

Useful for debugging frontend issues and enhancing user experience.

6. Dynatrace Managed vs. SaaS

Dynatrace SaaS: Cloud-based solution managed by Dynatrace.

Dynatrace Managed: Self-hosted version for organizations that need full control over data and security.

7. Real User Monitoring (RUM)

Tracks real user behavior and experience across web and mobile applications.

Measures performance metrics like page load times, user actions, and conversion rates.

8. Synthetic Monitoring

Simulates user interactions with applications to detect availability and performance issues before they impact users.

Useful for proactive monitoring of APIs, web applications, and third-party dependencies.

9. Log Monitoring

Collects, indexes, and analyzes logs for real-time troubleshooting and anomaly detection.

Helps correlate log data with application and infrastructure performance.

10. Infrastructure Monitoring

Monitors servers, containers, cloud services, and network resources.

Provides deep insights into CPU, memory, disk, and network usage.

11. Kubernetes & Cloud Monitoring

Monitors Kubernetes clusters, pods, and microservices in cloud-native environments.

Integrates with AWS, Azure, and Google Cloud for full cloud observability.

12. Service Level Objectives (SLOs)

Allows setting and tracking of performance and reliability goals.

Helps organizations meet business SLAs (Service Level Agreements).

13. Business Analytics (BizOps)

Combines performance monitoring with business metrics to provide insights into revenue impact.

Helps optimize digital business operations.

14. Security Monitoring (Application Security)

Detects vulnerabilities and security threats in real time.

Integrates with DevSecOps workflows to ensure secure deployments.

15. API & Custom Metrics

Allows integration with third-party tools via REST APIs.

Enables custom metric ingestion for tailored observability.

Tuesday, March 18, 2025

Advanced Operations with pd.Series in Pandas

1. Filtering Values in a Series

You can filter values based on conditions.

# Get values greater than 20

print(data[data > 20])

Output:

c 30

d 40

dtype: int64

---

2. Performing Mathematical Operations

You can apply mathematical operations on a Series.

# Multiply all values by 2

print(data * 2)

Output:

a 20

b 40

c 60

d 80

dtype: int64

---

3. Applying Functions Using apply()

You can apply custom functions to modify values.

print(data.apply(lambda x: x ** 2)) # Square each value

Output:

a 100

b 400

c 900

d 1600

dtype: int64

---

4. Checking for Missing (NaN) Values

data_with_nan = pd.Series([10, 20, None, 40], index=['a', 'b', 'c', 'd'])

# Check for missing values

print(data_with_nan.isna())

Output:

a False

b False

c True

d False

dtype: bool

To fill missing values:

print(data_with_nan.fillna(0)) # Replace NaN with 0

---

5. Using map() for Element-wise Mapping

# Convert values to strings with a prefix

print(data.map(lambda x: f"Value: {x}"))

Output:

a Value: 10

b Value: 20

c Value: 30

d Value: 40

dtype: object

---

6. Vectorized Operations (Element-wise)

You can perform vectorized operations efficiently.

# Log transform (requires numpy)

import numpy as np

print(np.log(data))

---

7. Sorting a Series

# Sort by values

print(data.sort_values(ascending=False))

# Sort by index

print(data.sort_index())

---

8. Checking for Membership

print('b' in data) # Output: True

---

9. Converting Series to Other Data Types

# Convert to a list

print(data.tolist())

# Convert to a dictionary

print(data.to_dict())

Tuesday, March 4, 2025

Scraping Xymon for Timestamp & Specific Keywords and Sending Logs to Splunk

1. Install Required Python Libraries

pip install requests beautifulsoup4 splunk-sdk

2. Python Script to Extract Timestamp & File Names and Send to Splunk

import requests

from bs4 import BeautifulSoup

import json

import splunklib.client as client

import splunklib.helpers as helpers

import logging

import re

# Setup logging

logging.basicConfig(filename="xymon_scraper.log", level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

# Xymon Web Interface URL

XYMON_URL = "http://<xymon-server>/xymon-cgi/bb-hostsvc.sh?HOST=all"

# Xymon Authentication (if required)

XYMON_USERNAME = "your_xymon_user"

XYMON_PASSWORD = "your_xymon_password"

# Splunk Credentials

SPLUNK_HOST = "your-splunk-server"

SPLUNK_PORT = 8089

SPLUNK_USERNAME = "admin"

SPLUNK_PASSWORD = "yourpassword"

SPLUNK_INDEX = "xymon_logs"

# Define filename pattern to extract (modify as needed)

FILENAME_PATTERN = r"([a-zA-Z0-9_-]+\.log)"

# Function to fetch Xymon data

def fetch_xymon_data():

try:

session = requests.Session()

auth = (XYMON_USERNAME, XYMON_PASSWORD) if XYMON_USERNAME else None

response = session.get(XYMON_URL, auth=auth, timeout=10)

if response.status_code == 200:

logging.info("Successfully fetched Xymon data")

return response.text, response.headers.get("Date")

else:

logging.error(f"Failed to fetch Xymon data. Status code: {response.status_code}")

return None, None

except Exception as e:

logging.error(f"Error fetching Xymon data: {str(e)}")

return None, None

# Function to extract timestamps & specific filenames from Xymon

def parse_xymon_data(html_data, timestamp):

soup = BeautifulSoup(html_data, "html.parser")

logs = []

for link in soup.find_all("a"):

service_text = link.text.strip()

# Extract file names based on pattern

filename_match = re.search(FILENAME_PATTERN, service_text)

if filename_match:

log_entry = {

"filename": filename_match.group(0),

"timestamp": timestamp

}

logs.append(log_entry)

logging.info(f"Extracted {len(logs)} logs with filenames from Xymon")

return logs

# Function to send logs to Splunk

def send_to_splunk(logs):

try:

service = client.connect(

host=SPLUNK_HOST,

port=SPLUNK_PORT,

username=SPLUNK_USERNAME,

password=SPLUNK_PASSWORD

)

for log in logs:

event = json.dumps(log)

helpers.send_data(service, event, host=SPLUNK_HOST, index=SPLUNK_INDEX)

logging.info(f"Successfully sent {len(logs)} logs to Splunk")

except Exception as e:

logging.error(f"Error sending logs to Splunk: {str(e)}")

# Main function

def main():

html_data, timestamp = fetch_xymon_data()

if html_data and timestamp:

logs = parse_xymon_data(html_data, timestamp)

if logs:

send_to_splunk(logs)

else:

logging.warning("No relevant logs extracted from Xymon")

else:

logging.warning("No data fetched from Xymon")

if __name__ == "__main__":

main()

Big Brother Server in the context of xymon

In the context of the Xymon monitoring tool, the Big Brother Server refers to the central monitoring server that collects and displays status updates from monitored systems. Xymon itself is a fork of the Big Brother monitoring system, which was one of the earliest network and system monitoring tools.

Role of the Big Brother Server in Xymon

1. Data Collection:

The server receives health and performance data from Xymon clients installed on different machines.

2. Status Processing:

It processes incoming status messages and logs events for alerting or reporting.

3. Web-Based Dashboard:

It provides a web-based interface displaying real-time system statuses using color-coded indicators (green, yellow, red, etc.).

4. Alerting System:

It can send notifications via email, SMS, or other methods when a system is experiencing issues.

5. Historical Data & Trends:

The Big Brother server stores historical data to analyze trends and detect anomalies.

Connection to Big Brother (BB) Tool

Xymon evolved from the Big Brother monitoring system, which had a similar architecture. The term Big Brother Server was used in Big Brother and carried over to Xymon to describe the central server managing the monitoring process.

Xymon Monitoring System - Overview & Setup Guide

1. Understanding Xymon Architecture

Xymon consists of three main components:

1. Xymon Server (Big Brother Server) – The central monitoring system that collects and displays data.

2. Xymon Clients – Agents installed on monitored machines to send health and performance data.

3. Web Interface – A dashboard that provides a color-coded status overview.

2. Setting Up Xymon

A. Install Xymon Server (on Linux)

1. Update System Packages

sudo apt update && sudo apt upgrade -y # For Debian/Ubuntu

sudo yum update -y # For RHEL/CentOS

2. Install Required Dependencies

sudo apt install -y xymon apache2 rrdtool librrd-dev libpcre3-dev libssl-dev

3. Download and Install Xymon

wget https://sourceforge.net/projects/xymon/files/latest/download -O xymon.tar.gz

tar -xzf xymon.tar.gz

cd xymon-*

./configure --prefix=/opt/xymon

make && sudo make install

4. Start Xymon Server

sudo /opt/xymon/server/bin/xymon.sh start

5. Access Web Dashboard

Open a browser and go to http://<server-ip>/xymon

B. Install Xymon Client (on Monitored Machines)

1. Install Required Packages

sudo apt install -y xymon-client

2. Configure Client to Send Data to Xymon Server

Edit the configuration file:

sudo nano /etc/default/xymon-client

Set the XYMONSERVERS variable to point to the Xymon server’s IP:

XYMONSERVERS="192.168.1.100"

3. Restart the Client

sudo systemctl restart xymon-client

3. Understanding the Web Dashboard

Green = OK

Yellow = Warning

Red = Critical

Blue = Test disabled

Purple = No report received

Wednesday, February 26, 2025

How to do blue green deployment in aro cluster?

Blue-green deployment in an Azure Red Hat OpenShift (ARO) cluster involves deploying two versions of your application in parallel and switching traffic between them, similar to the concept of blue-green deployments in other environments. Here's how you can implement a blue-green deployment in an ARO cluster:

Steps to Implement Blue-Green Deployment in ARO:

1. Set Up Two Application Environments (Blue and Green)

Blue Environment: This is the currently running production environment.

Green Environment: This will host the new version of the application.

In OpenShift, these environments can be represented by separate namespaces, separate deployment configurations, or different services within the same namespace.

Deploy the current (blue) version of your app using a DeploymentConfig or Deployment object.

Deploy the new (green) version in parallel with a separate configuration.

Example of deploying an app version:

apiVersion: apps/v1

kind: Deployment

metadata:

spec:

replicas: 2

selector:

matchLabels:

app: my-app

template:

metadata:

labels:

app: my-app

version: blue

spec:

containers:

- name: my-app

image: <blue-app-image>

ports:

- containerPort: 8080

For the green environment:

apiVersion: apps/v1

kind: Deployment

metadata:

spec:

replicas: 2

selector:

matchLabels:

app: my-app

template:

metadata:

labels:

app: my-app

version: green

spec:

containers:

- name: my-app

image: <green-app-image>

ports:

- containerPort: 8080

2. Expose the Services for Blue and Green

Create separate Service objects for both the blue and green deployments so that they can be independently accessed.

Example of services:

apiVersion: v1

kind: Service

metadata:

spec:

selector:

app: my-app

version: blue

ports:

- protocol: TCP

port: 80

targetPort: 8080

apiVersion: v1

kind: Service

metadata:

spec:

selector:

app: my-app

version: green

ports:

- protocol: TCP

port: 80

targetPort: 8080

3. Set Up a Route or Load Balancer

OpenShift uses Routes to expose services externally. In a blue-green setup, you'll create a route to point to the active environment.

Initially, the route will point to the blue-service.

Example:

apiVersion: route.openshift.io/v1

kind: Route

metadata:

spec:

host: my-app.example.com

to:

kind: Service

port:

targetPort: 8080

4. Testing the Green Environment

Before switching production traffic to the green environment, thoroughly test it. You can expose the green environment temporarily for testing by creating a separate route or using internal tools.

5. Switching Traffic to Green (Cutover)

Once the green environment is fully tested and validated, you can update the Route to direct traffic to the green-service. This will route all new traffic to the green deployment.

You can either modify the existing route or create a new one, as shown below:

apiVersion: route.openshift.io/v1

kind: Route

metadata:

spec:

host: my-app.example.com

to:

kind: Service

port:

targetPort: 8080

Now, traffic will be routed to the green environment.

6. Monitoring and Rollback

After switching traffic, closely monitor the application to ensure the green version is stable. If any issues arise, you can quickly rollback by switching the route back to the blue-service.

Example rollback:

apiVersion: route.openshift.io/v1

kind: Route

metadata:

spec:

host: my-app.example.com

to:

kind: Service

port:

targetPort: 8080

7. Decommission the Blue Environment

Once you're confident that the green environment is stable, you can scale down or remove the blue environment to save resources.

Example:

oc scale deployment app-blue --replicas=0

Additional Tools for Automation:

OpenShift Pipelines (based on Tekton) can automate the blue-green deployment process.

CI/CD tools like Jenkins or GitHub Actions integrated with OpenShift can streamline deployments and rollbacks.

Key Considerations:

Traffic Splitting: If you want to gradually route traffic between blue and green environments, you can use an advanced traffic management tool like Istio or an external load balancer like Azure Traffic Manager.

Monitoring: Use built-in OpenShift monitoring (Prometheus, Grafana) or Azure Monitor for observing application performance.

This approach ensures minimal downtime during deployment, and quick rollback capabilities.

Does blue green deployment need two regions in aro?

No, blue-green deployment in Azure Red Hat OpenShift (ARO) does not require two separate regions. It typically involves having two separate environments (or versions) within the same cluster or namespace. Blue-green deployments work by maintaining two versions of an application—one being the active (blue) and the other being the idle or testing version (green).

In the context of ARO, this can be achieved using separate namespaces, deployments, or even routes within the same cluster and region. Here’s how blue-green deployments work in ARO:

1. Single Cluster, Single Namespace: You can deploy both the blue (current) and green (new) versions of the application within the same namespace. This is done using separate deployments and services for each version.

2. Single Cluster, Separate Namespaces: Use two separate namespaces for each version of the application within the same ARO cluster. This keeps resources more isolated.

3. Traffic Routing and Switching: With OpenShift's native routing features (like OpenShift Routes), you can control traffic between the blue and green environments. When the green version is ready, you switch the traffic to it, making it the new active version.

If you wanted to distribute your blue-green deployments across different regions for high availability or disaster recovery, that would involve setting up multiple clusters in different regions. However, for a standard blue-green deployment in ARO, two regions are not a requirement.

Saturday, February 22, 2025

Is creating deep copy takes more money and cause performance issues?

Creating a deep copy can be slower and consume more memory, which may impact performance, especially for large or complex data structures. However, the exact impact depends on the size and depth of the dictionary.

Performance and Cost Considerations of deepcopy()

1. More Memory Usage

deepcopy() creates new independent objects for everything, increasing memory consumption.

If the dictionary is large, this can be expensive in terms of RAM.

2. Slower Execution

Since deepcopy() recursively copies every element, it takes more CPU time compared to a shallow copy (copy()).

The deeper and larger the structure, the more time it takes.

3. Can Cause Performance Issues in Large-Scale Applications

If you're working with big data, high-frequency operations, or real-time systems, deep copying might slow things down.

Example: Copying a dictionary with millions of nested objects can significantly affect performance.

Alternatives to Improve Performance

Use Shallow Copy (dict.copy()) if you don’t need to modify nested structures.

Manually Copy Only Needed Parts to avoid unnecessary duplication.

Use Immutable Data Structures (like frozenset or dataclasses) to prevent unwanted changes instead of deep copying.

Optimize Data Storage by using references wisely instead of making full copies.

When to Use Deep Copy?

-- If you need a fully independent copy of a dictionary with nested structures.

-- Avoid it for large datasets unless necessary—try shallow copy or restructuring data instead.

Friday, February 21, 2025

Which is better, a Python dictionary or a custom python class from a performance perspective?

From a performance perspective, choosing between a Python dictionary and a custom class depends on several factors, such as memory efficiency, lookup speed, and ease of use. Here’s a breakdown:

1. Using a Dictionary (dict)

Pros:

Fast lookups: Dictionary lookups are O(1) on average due to hash table implementation.

More dynamic: Can easily add/remove keys without modifying code structure.

Memory-efficient for small datasets: Since it only holds keys and values, it can be efficient in some cases.

Cons:

Consumes more memory than simple lists or tuples due to hashing overhead.

Less structured: No strict schema, which can lead to errors when accessing non-existent keys.

Example:

filtered_data = {

"ids": df["id"].tolist(),

"names": df["name"].tolist(),

"values": df["value"].tolist(),

}

2. Using a Custom Class

Pros:

Provides better data encapsulation and type safety.

Improves readability and maintainability when dealing with complex data structures.

Can have methods for data processing, reducing redundant code.

Cons:

Slightly slower lookups compared to dicts (attribute access is O(1), but may involve extra function calls).

Uses more memory due to object overhead.

Example:

class FilteredData:

def __init__(self, df):

self.ids = df["id"].tolist()

self.names = df["name"].tolist()

self.values = df["value"].tolist()

def get_summary(self):

return f"Total records: {len(self.ids)}"

filtered_data = FilteredData(df)

print(filtered_data.get_summary()) # Example method call

Performance Considerations

Conclusion

Use a dict if you need fast, dynamic key-value storage without strict structure.

Use a class if you need structured data representation with encapsulated logic.

If performance is critical, and you're dealing with large datasets, consider using NumPy arrays or Pandas itself, since they are more optimized than Python lists and dictionaries.

How does azure function app with python code updates azure app insights

Azure Application Insights helps monitor logs, exceptions, performance metrics, and telemetry data for Azure Functions. To integrate Python-based Azure Functions with Application Insights, follow these steps:

--------

1. Prerequisites

Azure Function App (Python)

Azure Application Insights Resource

Instrumentation Key or Connection String

-------

2. Enable Application Insights for Azure Function App

Option 1: Enable via Azure CLI

az monitor app-insights component create --app <APP_INSIGHTS_NAME> --resource-group <RESOURCE_GROUP> --location eastus

az functionapp config appsettings set --name <FUNCTION_APP_NAME> --resource-group <RESOURCE_GROUP> \

--settings "APPINSIGHTS_INSTRUMENTATIONKEY=<YOUR_INSTRUMENTATION_KEY>"

OR use the Connection String (recommended for newer versions):

az functionapp config appsettings set --name <FUNCTION_APP_NAME> --resource-group <RESOURCE_GROUP> \

--settings "APPLICATIONINSIGHTS_CONNECTION_STRING=<YOUR_CONNECTION_STRING>"

------

3. Install & Configure Application Insights in Python

Install the Azure Monitor OpenTelemetry SDK:

pip install opentelemetry-sdk opentelemetry-exporter-azure-monitor

Modify your __init__.py to include Application Insights logging:

import logging

import azure.functions as func

from opentelemetry.sdk.trace import TracerProvider

from opentelemetry.sdk.trace.export import SimpleSpanProcessor

from opentelemetry.exporter.azuremonitor import AzureMonitorSpanExporter

# Setup Application Insights Telemetry

instrumentation_key = "<YOUR_INSTRUMENTATION_KEY>" # Or fetch from env variable

tracer = TracerProvider()

tracer.add_span_processor(SimpleSpanProcessor(AzureMonitorSpanExporter.from_connection_string(instrumentation_key)))

def main(req: func.HttpRequest) -> func.HttpResponse:

logging.info("Processing request...")

return func.HttpResponse("Hello from Azure Function with App Insights!")

---

4. Verify Logs in Application Insights

1. Go to Azure Portal → Application Insights

2. Navigate to Logs → Run the following Kusto Query:

traces

| where timestamp > ago(10m)

| order by timestamp desc

3. Check if your function's logs appear.

How to integrate Azure Key Vault for secret management during the GitLab CI/CD process

To integrate Azure Key Vault for secret management during the GitLab CI/CD process, follow these steps:

------

1. Prerequisites

Azure Key Vault Created

Azure CLI installed

Service Principal or Managed Identity with Key Vault access

Secrets stored in Azure Key Vault

------

2. Grant Access to Key Vault

Grant your GitLab Service Principal access to Key Vault secrets:

az keyvault set-policy --name <KEYVAULT_NAME> \

--spn "$AZURE_APP_ID" \

--secret-permissions get list

This allows the Service Principal to read secrets from Key Vault.

------

3. Store Secrets in Key Vault

Store sensitive values in Azure Key Vault:

az keyvault secret set --vault-name <KEYVAULT_NAME> --name "MY_SECRET" --value "my-sensitive-value"

-------

4. Modify .gitlab-ci.yml to Fetch Secrets from Key Vault

Update your GitLab CI/CD pipeline to retrieve secrets securely from Azure Key Vault.

stages:

- deploy

variables:

AZURE_RESOURCE_GROUP: "my-resource-group"

FUNCTION_APP_NAME: "$AZURE_FUNCTIONAPP_NAME"

KEYVAULT_NAME: "<Your-KeyVault-Name>"

deploy_to_azure:

image: mcr.microsoft.com/azure-cli

stage: deploy

script:

- echo "Logging into Azure..."

- az login --service-principal -u "$AZURE_APP_ID" -p "$AZURE_PASSWORD" --tenant "$AZURE_TENANT_ID"

- az account set --subscription "$AZURE_SUBSCRIPTION_ID"

- echo "Fetching secrets from Key Vault..."

- MY_SECRET=$(az keyvault secret show --name "MY_SECRET" --vault-name "$KEYVAULT_NAME" --query "value" -o tsv)

- echo "Setting environment variables for deployment..."

- echo "MY_SECRET=$MY_SECRET" >> .env # If using environment variables

- echo "Deploying Function App..."

- func azure functionapp publish $FUNCTION_APP_NAME

only:

- main

--------

5. Secure Secrets in the Function App

Instead of exposing secrets in GitLab, you can store them in Azure App Configuration dynamically:

az functionapp config appsettings set --name $FUNCTION_APP_NAME --resource-group $AZURE_RESOURCE_GROUP \

--settings "MY_SECRET=@Microsoft.KeyVault(SecretUri=https://$KEYVAULT_NAME.vault.azure.net/secrets/MY_SECRET/)"

This allows Azure Functions to fetch secrets securely at runtime.

-------

6. Verify the Secret in the Function App

After deployment, verify if the secret is correctly injected:

az functionapp config appsettings list --name $FUNCTION_APP_NAME --resource-group $AZURE_RESOURCE_GROUP

How to Deploy a Python Azure Function from GitLab using a CI/CD pipeline

To deploy a Python Azure Function from GitLab using a CI/CD pipeline, follow these steps:

-------

1. Prerequisites

-Azure Subscription and an Azure Function App created

-Azure CLI installed

-GitLab repository with Python function code

-Deployment credentials (Service Principal or Publish Profile)

---

2. Setup Deployment Credentials in GitLab

Option 1: Use Azure Service Principal (Recommended)

1. Create a Service Principal in Azure:

az ad sp create-for-rbac --name "gitlab-deploy" --role contributor --scopes /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>

Output:

{

"appId": "xxxx-xxxx-xxxx-xxxx",

"displayName": "gitlab-deploy",

"password": "xxxx-xxxx-xxxx-xxxx",

"tenant": "xxxx-xxxx-xxxx-xxxx"

}

2. Store these credentials in GitLab CI/CD Variables:

AZURE_APP_ID: xxxx-xxxx-xxxx-xxxx

AZURE_PASSWORD: xxxx-xxxx-xxxx-xxxx

AZURE_TENANT_ID: xxxx-xxxx-xxxx-xxxx

AZURE_SUBSCRIPTION_ID: <Your Azure Subscription ID>

AZURE_FUNCTIONAPP_NAME: <Your Function App Name>

---

3. Create .gitlab-ci.yml for CI/CD Pipeline

Add the following .gitlab-ci.yml file to the root of your repository:

stages:

- deploy

variables:

AZURE_RESOURCE_GROUP: "my-resource-group"

FUNCTION_APP_NAME: "$AZURE_FUNCTIONAPP_NAME"

deploy_to_azure:

image: mcr.microsoft.com/azure-cli

stage: deploy

script:

- echo "Logging into Azure..."

- az login --service-principal -u "$AZURE_APP_ID" -p "$AZURE_PASSWORD" --tenant "$AZURE_TENANT_ID"

- az account set --subscription "$AZURE_SUBSCRIPTION_ID"

- echo "Deploying Function App..."

- func azure functionapp publish $FUNCTION_APP_NAME

only:

- main

This script:

Logs into Azure using Service Principal

Sets the subscription

Deploys the function app from the GitLab repository

---------

4. Enable Git Deployment in Azure

Ensure Git-based deployment is enabled on the Azure Function App:

az functionapp deployment source config --name <YOUR_FUNCTION_APP_NAME> --resource-group <YOUR_RESOURCE_GROUP> --repo-url <YOUR_GITLAB_REPO_URL> --branch main

-------------

5. Commit and Push Changes

git add .gitlab-ci.yml

git commit -m "Added GitLab CI/CD for Azure Function"

git push origin main

This will trigger the pipeline and deploy your function to Azure.

---------

6. Verify Deployment

After deployment, check logs with:

az functionapp log tail --name $FUNCTION_APP_NAME --resource-group $AZURE_RESOURCE_GROUP

Or visit the Azure Portal -> Function App -> "Deployment Center" to verify the latest deployment.

Tuesday, February 18, 2025

Python Important Interview questions for data science

General Python Questions

1. What are Python's key features?

Interpreted and dynamically typed

High-level and easy to read

Extensive standard library

Supports multiple programming paradigms (OOP, Functional, Procedural)

Cross-platform compatibility

Strong community support

2. Explain the difference between deepcopy() and copy().

copy.copy() creates a shallow copy, meaning changes to mutable objects inside the copied object will reflect in the original.

copy.deepcopy() creates a deep copy, meaning all objects are recursively copied, preventing unintended modifications.

3. How does Python manage memory?

Python uses automatic memory management with reference counting and garbage collection.

The garbage collector removes objects that are no longer referenced.

Memory is allocated in private heaps that are managed by the interpreter.

4. What is the difference between is and ==?

is checks object identity (i.e., whether two variables point to the same memory location).

== checks value equality (i.e., whether two variables have the same value).

5. How do Python lists and tuples differ?

Lists are mutable, meaning elements can be modified after creation.

Tuples are immutable, meaning elements cannot be changed.

Lists have more methods and consume slightly more memory than tuples.

---

Data Engineering-Specific Questions

6. How do you handle large datasets efficiently in Python?

Use Dask or Vaex instead of Pandas for parallel computing.

Use chunking when reading large CSV files.

Leverage generators instead of lists to save memory.

Store large datasets in Parquet format instead of CSV for efficiency.

7. What is the difference between Pandas and Dask?

Pandas: Best for small-to-medium datasets; operates in memory.

Dask: Supports parallel processing; can handle large datasets by breaking them into smaller parts.

8. How would you optimize reading a large CSV file in Pandas?

Use chunksize to read the file in smaller parts.

Specify data types (dtype parameter) to reduce memory usage.

Use PyArrow or Vaex for faster I/O operations.

9. How do you use Python for ETL (Extract, Transform, Load) pipelines?

Extract: Read data from sources like APIs, databases, or files (Pandas, SQLAlchemy).

Transform: Clean, filter, and reshape data (Pandas, Dask, PySpark).

Load: Write transformed data to storage (SQL, S3, Azure Blob, Kafka).

10. How would you handle missing data in Pandas?

Use .fillna() to replace missing values.

Use .dropna() to remove rows/columns with missing values.

Use interpolation or statistical methods like mean/median imputation.

---

Azure Cloud & Python Questions

11. How do you use Python to interact with Azure Key Vault?

from azure.identity import DefaultAzureCredential

from azure.keyvault.secrets import SecretClient

key_vault_url = "https://<your-keyvault-name>.vault.azure.net/"

credential = DefaultAzureCredential()

client = SecretClient(vault_url=key_vault_url, credential=credential)

secret = client.get_secret("my-secret-name")

print(secret.value)

12. How would you securely store and retrieve secrets in an Azure environment?

Use Azure Key Vault for secret management.

Authenticate using Managed Identity or Service Principal with RBAC.

Avoid storing secrets in environment variables or code.

13. How do you use Azure Blob Storage with Python?

from azure.storage.blob import BlobServiceClient

connection_string = "your_connection_string"

blob_service_client = BlobServiceClient.from_connection_string(connection_string)

container_client = blob_service_client.get_container_client("my-container")

for blob in container_client.list_blobs():

print(blob.name)

14. What is the role of azure-identity and azure-keyvault-secrets in authentication?

azure-identity: Provides authentication mechanisms like DefaultAzureCredential, Managed Identity, and Service Principal.

azure-keyvault-secrets: Provides secure access to Azure Key Vault secrets.

15. How do you use Python to query Azure SQL Database efficiently?

import pyodbc

conn = pyodbc.connect(

"DRIVER={ODBC Driver 17 for SQL Server};"

"SERVER=tcp:<your-server>.database.windows.net;"

"DATABASE=mydb;"

"UID=myuser;"

"PWD=mypassword"

)

cursor = conn.cursor()

cursor.execute("SELECT * FROM mytable")

rows = cursor.fetchall()

for row in rows:

print(row)

---

Splunk & Python

16. How do you query Splunk using Python?

import requests

url = "https://splunk-server:8089/services/search/jobs"

headers = {"Authorization": "Bearer YOUR_SPLUNK_TOKEN"}

data = {"search": "search index=main | head 10"}

response = requests.post(url, headers=headers, data=data)

print(response.json())

17. What is the difference between using REST API vs. SDK for querying Splunk?

REST API: Gives raw access to Splunk services via HTTP requests.

Splunk SDK (splunk-sdk-python): Provides Python-friendly functions and better integration with applications.

18. How do you authenticate Python scripts with Splunk securely?

Use OAuth tokens instead of storing credentials in code.

Implement environment variables or Azure Key Vault for secret management.

Use role-based access control (RBAC) in Splunk.

19. What are some common use cases for integrating Splunk with Python?

Log analysis: Automate log searching and filtering.

Alerting & monitoring: Trigger alerts based on log patterns.

Security & anomaly detection: Detect security incidents.

Data visualization: Export data to Pandas for analysis.

20. How do you filter and process Splunk logs using Pandas?

import pandas as pd

# Sample Splunk JSON response

logs = [

{"timestamp": "2025-02-19T10:00:00Z", "status": 200, "message": "OK"},

{"timestamp": "2025-02-19T10:05:00Z", "status": 500, "message": "Internal Server Error"},

]

df = pd.DataFrame(logs)

df["timestamp"] = pd.to_datetime(df["timestamp"])

errors = df[df["status"] >= 500]

print(errors)

---