Saturday, February 22, 2025

Does creating a deep copy take more memory and cause performance issues?

Creating a deep copy can be slower and consume more memory, which may impact performance, especially for large or complex data structures. However, the exact impact depends on the size and depth of the dictionary.

Performance and Cost Considerations of deepcopy()

1. More Memory Usage

deepcopy() creates new independent objects for everything, increasing memory consumption.

If the dictionary is large, this can be expensive in terms of RAM.

2. Slower Execution

Since deepcopy() recursively copies every element, it takes more CPU time compared to a shallow copy (copy()).

The deeper and larger the structure, the more time it takes.

3. Can Cause Performance Issues in Large-Scale Applications

If you're working with big data, high-frequency operations, or real-time systems, deep copying might slow things down.

Example: Copying a dictionary with millions of nested objects can significantly affect performance.
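A quick way to see the gap on your own machine; the sizes below are arbitrary and absolute timings will vary:

import copy
import time

# Build a dictionary with many nested objects
nested = {i: {"vals": list(range(10))} for i in range(100_000)}

t0 = time.perf_counter()
shallow = nested.copy()       # copies only the top-level key/value references
t1 = time.perf_counter()
deep = copy.deepcopy(nested)  # recursively copies every nested dict and list
t2 = time.perf_counter()

print(f"shallow copy: {t1 - t0:.4f}s")
print(f"deep copy:    {t2 - t1:.4f}s")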

Alternatives to Improve Performance

Use Shallow Copy (dict.copy()) if you don’t need to modify nested structures.

Manually Copy Only Needed Parts to avoid unnecessary duplication.

Use Immutable Data Structures (like tuples, frozenset, or dataclasses with frozen=True) to prevent unwanted changes instead of deep copying.

Optimize Data Storage by using references wisely instead of making full copies.

When to Use Deep Copy?

If you need a fully independent copy of a dictionary with nested structures, use deepcopy().

Avoid it for large datasets unless necessary; try a shallow copy or restructuring the data instead.

Friday, February 21, 2025

Which is better from a performance perspective: a Python dictionary or a custom Python class?

From a performance perspective, choosing between a Python dictionary and a custom class depends on several factors, such as memory efficiency, lookup speed, and ease of use. Here’s a breakdown:

1. Using a Dictionary (dict)

Pros:

Fast lookups: Dictionary lookups are O(1) on average due to hash table implementation.

More dynamic: Can easily add/remove keys without modifying code structure.

Memory-efficient for small datasets: it stores only keys and values, without per-instance class overhead.

Cons:

Consumes more memory than simple lists or tuples due to hashing overhead.

Less structured: No strict schema, which can lead to errors when accessing non-existent keys.

Example:

# Assumes df is an existing Pandas DataFrame with id, name, and value columns
filtered_data = {
    "ids": df["id"].tolist(),
    "names": df["name"].tolist(),
    "values": df["value"].tolist(),
}

2. Using a Custom Class

Pros:

Provides better data encapsulation and type safety.

Improves readability and maintainability when dealing with complex data structures.

Can have methods for data processing, reducing redundant code.


Cons:

Slightly slower lookups compared to dicts (attribute access is O(1), but may involve extra function calls).

Uses more memory due to object overhead.

Example:

class FilteredData:
    def __init__(self, df):
        self.ids = df["id"].tolist()
        self.names = df["name"].tolist()
        self.values = df["value"].tolist()
    
    def get_summary(self):
        return f"Total records: {len(self.ids)}"

filtered_data = FilteredData(df)
print(filtered_data.get_summary())  # Example method call

Performance Considerations
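
Both dictionary key lookup and attribute access are O(1) on average; if the difference matters for your workload, measure it directly. A minimal sketch with timeit (results vary by Python version and machine):

import timeit

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
d = {"x": 1, "y": 2}

# One million accesses each; absolute numbers depend on the interpreter
print("attribute:", timeit.timeit(lambda: p.x, number=1_000_000))
print("dict key: ", timeit.timeit(lambda: d["x"], number=1_000_000))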

Conclusion

Use a dict if you need fast, dynamic key-value storage without strict structure.

Use a class if you need structured data representation with encapsulated logic.


If performance is critical, and you're dealing with large datasets, consider using NumPy arrays or Pandas itself, since they are more optimized than Python lists and dictionaries.
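A rough illustration of that gap; absolute timings vary by machine:

import timeit
import numpy as np

data = list(range(1_000_000))
arr = np.arange(1_000_000)

# Vectorized NumPy operations avoid per-element Python interpreter overhead
print("list sum: ", timeit.timeit(lambda: sum(data), number=10))
print("numpy sum:", timeit.timeit(lambda: arr.sum(), number=10))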

How does an Azure Function App with Python code update Azure Application Insights?

Azure Application Insights helps monitor logs, exceptions, performance metrics, and telemetry data for Azure Functions. To integrate Python-based Azure Functions with Application Insights, follow these steps:
--------

1. Prerequisites

Azure Function App (Python)

Azure Application Insights Resource

Instrumentation Key or Connection String

-------

2. Enable Application Insights for Azure Function App

Option 1: Enable via Azure CLI

az monitor app-insights component create --app <APP_INSIGHTS_NAME> --resource-group <RESOURCE_GROUP> --location eastus
az functionapp config appsettings set --name <FUNCTION_APP_NAME> --resource-group <RESOURCE_GROUP> \
    --settings "APPINSIGHTS_INSTRUMENTATIONKEY=<YOUR_INSTRUMENTATION_KEY>"

OR use the Connection String (recommended for newer versions):

az functionapp config appsettings set --name <FUNCTION_APP_NAME> --resource-group <RESOURCE_GROUP> \
    --settings "APPLICATIONINSIGHTS_CONNECTION_STRING=<YOUR_CONNECTION_STRING>"

------

3. Install & Configure Application Insights in Python

Install the OpenTelemetry SDK and the Azure Monitor trace exporter:

pip install opentelemetry-sdk azure-monitor-opentelemetry-exporter

Modify your __init__.py to send telemetry to Application Insights (the connection string comes from the app setting configured in step 2):

import logging
import os

import azure.functions as func
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter

# Set up Application Insights telemetry
connection_string = os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
exporter = AzureMonitorTraceExporter.from_connection_string(connection_string)
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("Processing request...")
    return func.HttpResponse("Hello from Azure Function with App Insights!")
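
Beyond plain logging, you can emit custom spans through the tracer configured above. A hedged sketch continuing in the same module (the function and span names are illustrative):

tracer = trace.get_tracer(__name__)

def process_order(order_id: str) -> None:
    # Each span is exported to Application Insights by the processor above
    with tracer.start_as_current_span("process_order"):
        logging.info("Processing order %s", order_id)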

---

4. Verify Logs in Application Insights

1. Go to Azure Portal → Application Insights


2. Navigate to Logs → Run the following Kusto Query:

traces
| where timestamp > ago(10m)
| order by timestamp desc


3. Check if your function's logs appear.

How to integrate Azure Key Vault for secret management during the GitLab CI/CD process

To integrate Azure Key Vault for secret management during the GitLab CI/CD process, follow these steps:

------

1. Prerequisites

Azure Key Vault Created

Azure CLI installed

Service Principal or Managed Identity with Key Vault access

Secrets stored in Azure Key Vault

------

2. Grant Access to Key Vault

Grant your GitLab Service Principal access to Key Vault secrets:

az keyvault set-policy --name <KEYVAULT_NAME> \
    --spn "$AZURE_APP_ID" \
    --secret-permissions get list

This allows the Service Principal to read secrets from Key Vault.

------

3. Store Secrets in Key Vault

Store sensitive values in Azure Key Vault:

az keyvault secret set --vault-name <KEYVAULT_NAME> --name "MY_SECRET" --value "my-sensitive-value"

-------

4. Modify .gitlab-ci.yml to Fetch Secrets from Key Vault

Update your GitLab CI/CD pipeline to retrieve secrets securely from Azure Key Vault.

stages:
  - deploy

variables:
  AZURE_RESOURCE_GROUP: "my-resource-group"
  FUNCTION_APP_NAME: "$AZURE_FUNCTIONAPP_NAME"
  KEYVAULT_NAME: "<Your-KeyVault-Name>"

deploy_to_azure:
  image: mcr.microsoft.com/azure-cli
  stage: deploy
  script:
    - echo "Logging into Azure..."
    - az login --service-principal -u "$AZURE_APP_ID" -p "$AZURE_PASSWORD" --tenant "$AZURE_TENANT_ID"
    - az account set --subscription "$AZURE_SUBSCRIPTION_ID"

    - echo "Fetching secrets from Key Vault..."
    - MY_SECRET=$(az keyvault secret show --name "MY_SECRET" --vault-name "$KEYVAULT_NAME" --query "value" -o tsv)

    - echo "Setting environment variables for deployment..."
    - echo "MY_SECRET=$MY_SECRET" >> .env  # If using environment variables

    - echo "Deploying Function App..."
    - func azure functionapp publish $FUNCTION_APP_NAME
  only:
    - main

--------

5. Secure Secrets in the Function App

Instead of exposing secrets in GitLab, you can store them as Function App application settings that reference Key Vault directly:

az functionapp config appsettings set --name $FUNCTION_APP_NAME --resource-group $AZURE_RESOURCE_GROUP \
    --settings "MY_SECRET=@Microsoft.KeyVault(SecretUri=https://$KEYVAULT_NAME.vault.azure.net/secrets/MY_SECRET/)"

This allows Azure Functions to fetch secrets securely at runtime.
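
Inside the function code, the Key Vault reference resolves like any other application setting, so a minimal sketch of reading it (reusing the MY_SECRET name from above) looks like:

import os

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # The Key Vault reference is resolved by the Functions host, so the
    # secret arrives as an ordinary application setting / environment variable
    secret = os.environ["MY_SECRET"]
    return func.HttpResponse(f"Secret loaded ({len(secret)} characters)")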

-------

6. Verify the Secret in the Function App

After deployment, verify if the secret is correctly injected:

az functionapp config appsettings list --name $FUNCTION_APP_NAME --resource-group $AZURE_RESOURCE_GROUP

How to Deploy a Python Azure Function from GitLab using a CI/CD pipeline

To deploy a Python Azure Function from GitLab using a CI/CD pipeline, follow these steps:
-------

1. Prerequisites

Azure Subscription and an Azure Function App created

Azure CLI installed

GitLab repository with Python function code

Deployment credentials (Service Principal or Publish Profile)

---

2. Setup Deployment Credentials in GitLab

Option 1: Use Azure Service Principal (Recommended)

1. Create a Service Principal in Azure:

az ad sp create-for-rbac --name "gitlab-deploy" --role contributor --scopes /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>

Output:

{
    "appId": "xxxx-xxxx-xxxx-xxxx",
    "displayName": "gitlab-deploy",
    "password": "xxxx-xxxx-xxxx-xxxx",
    "tenant": "xxxx-xxxx-xxxx-xxxx"
}


2. Store these credentials in GitLab CI/CD Variables:

AZURE_APP_ID: xxxx-xxxx-xxxx-xxxx

AZURE_PASSWORD: xxxx-xxxx-xxxx-xxxx

AZURE_TENANT_ID: xxxx-xxxx-xxxx-xxxx

AZURE_SUBSCRIPTION_ID: <Your Azure Subscription ID>

AZURE_FUNCTIONAPP_NAME: <Your Function App Name>

---

3. Create .gitlab-ci.yml for CI/CD Pipeline

Add the following .gitlab-ci.yml file to the root of your repository:

stages:
  - deploy

variables:
  AZURE_RESOURCE_GROUP: "my-resource-group"
  FUNCTION_APP_NAME: "$AZURE_FUNCTIONAPP_NAME"

deploy_to_azure:
  image: mcr.microsoft.com/azure-cli
  stage: deploy
  script:
    - echo "Logging into Azure..."
    - az login --service-principal -u "$AZURE_APP_ID" -p "$AZURE_PASSWORD" --tenant "$AZURE_TENANT_ID"
    - az account set --subscription "$AZURE_SUBSCRIPTION_ID"
    
    - echo "Deploying Function App..."
    - func azure functionapp publish $FUNCTION_APP_NAME
  only:
    - main

This script:

Logs into Azure using the Service Principal

Sets the subscription

Deploys the function app from the GitLab repository

Note: the mcr.microsoft.com/azure-cli image does not ship with Azure Functions Core Tools, so the func command must be installed in the job (or use an image that bundles both the CLI and Core Tools).

---------

4. Enable Git Deployment in Azure

Ensure Git-based deployment is enabled on the Azure Function App:

az functionapp deployment source config --name <YOUR_FUNCTION_APP_NAME> --resource-group <YOUR_RESOURCE_GROUP> --repo-url <YOUR_GITLAB_REPO_URL> --branch main

-------------

5. Commit and Push Changes

git add .gitlab-ci.yml
git commit -m "Added GitLab CI/CD for Azure Function"
git push origin main

This will trigger the pipeline and deploy your function to Azure.

---------

6. Verify Deployment

After deployment, check logs with:

az functionapp log tail --name $FUNCTION_APP_NAME --resource-group $AZURE_RESOURCE_GROUP

Or visit the Azure Portal → Function App → "Deployment Center" to verify the latest deployment.

Tuesday, February 18, 2025

Important Python Interview Questions for Data Science

General Python Questions

1. What are Python's key features?

Interpreted and dynamically typed

High-level and easy to read

Extensive standard library

Supports multiple programming paradigms (OOP, Functional, Procedural)

Cross-platform compatibility

Strong community support



2. Explain the difference between deepcopy() and copy().

copy.copy() creates a shallow copy, meaning changes to mutable objects inside the copied object will reflect in the original.

copy.deepcopy() creates a deep copy, meaning all objects are recursively copied, preventing unintended modifications.
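
A short demonstration:

import copy

original = {"nums": [1, 2, 3]}
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original["nums"].append(4)
print(shallow["nums"])  # [1, 2, 3, 4] -- shares the inner list with original
print(deep["nums"])     # [1, 2, 3]    -- fully independent copy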



3. How does Python manage memory?

Python uses automatic memory management with reference counting and garbage collection.

The garbage collector removes objects that are no longer referenced.

Memory is allocated in private heaps that are managed by the interpreter.
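
For example:

import gc
import sys

a = []
b = a                       # a second reference to the same list
print(sys.getrefcount(a))   # includes the temporary reference from the call itself
del b
gc.collect()                # force a collection pass (normally automatic)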



4. What is the difference between is and ==?

is checks object identity (i.e., whether two variables point to the same memory location).

== checks value equality (i.e., whether two variables have the same value).
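
For example:

a = [1, 2, 3]
b = a
c = [1, 2, 3]

print(a == c)  # True  (same values)
print(a is c)  # False (different objects in memory)
print(a is b)  # True  (same object)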



5. How do Python lists and tuples differ?

Lists are mutable, meaning elements can be modified after creation.

Tuples are immutable, meaning elements cannot be changed.

Lists have more methods and consume slightly more memory than tuples.
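
For example:

import sys

nums_list = [1, 2, 3]
nums_tuple = (1, 2, 3)

nums_list[0] = 99        # fine: lists are mutable
# nums_tuple[0] = 99     # TypeError: tuples are immutable

print(sys.getsizeof(nums_list), sys.getsizeof(nums_tuple))  # the tuple is smaller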





---

Data Engineering-Specific Questions

6. How do you handle large datasets efficiently in Python?

Use Dask or Vaex instead of Pandas for parallel computing.

Use chunking when reading large CSV files.

Leverage generators instead of lists to save memory (see the sketch after this list).

Store large datasets in Parquet format instead of CSV for efficiency.
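
The generator point is easy to demonstrate (the other items are mostly library choices):

# A list comprehension materializes all ten million squares at once...
squares_list = [x * x for x in range(10_000_000)]

# ...while a generator expression produces them one at a time
squares_gen = (x * x for x in range(10_000_000))
total = sum(squares_gen)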



7. What is the difference between Pandas and Dask?

Pandas: Best for small-to-medium datasets; operates in memory.

Dask: Supports parallel processing; can handle large datasets by breaking them into smaller parts.
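
A minimal Dask sketch for comparison (the file and column names are placeholders; requires pip install "dask[dataframe]"):

import dask.dataframe as dd

# Dask splits the file into partitions and computes lazily in parallel
ddf = dd.read_csv("big_file.csv")
result = ddf.groupby("id")["value"].mean().compute()
print(result)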



8. How would you optimize reading a large CSV file in Pandas?

Use chunksize to read the file in smaller parts.

Specify data types (dtype parameter) to reduce memory usage.

Use PyArrow or Vaex for faster I/O operations.
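
Combining the first two points (file and column names are placeholders):

import pandas as pd

# Explicit dtypes shrink per-column memory use
dtypes = {"id": "int32", "value": "float32"}

total = 0.0
for chunk in pd.read_csv("big_file.csv", dtype=dtypes, chunksize=100_000):
    total += chunk["value"].sum()
print(total)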



9. How do you use Python for ETL (Extract, Transform, Load) pipelines?

Extract: Read data from sources like APIs, databases, or files (Pandas, SQLAlchemy).

Transform: Clean, filter, and reshape data (Pandas, Dask, PySpark).

Load: Write transformed data to storage (SQL, S3, Azure Blob, Kafka).
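
A compact end-to-end sketch (the source file, column names, and the SQLite target are all placeholders; a real pipeline would load into a warehouse):

import pandas as pd
from sqlalchemy import create_engine

# Extract
df = pd.read_csv("source_data.csv")

# Transform: basic cleaning
df = df.dropna(subset=["id"])
df["value"] = df["value"].astype("float32")

# Load
engine = create_engine("sqlite:///warehouse.db")
df.to_sql("clean_data", engine, if_exists="replace", index=False)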



10. How would you handle missing data in Pandas?



Use .fillna() to replace missing values.

Use .dropna() to remove rows/columns with missing values.

Use interpolation or statistical methods like mean/median imputation.
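
For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, 3.0]})

filled = df.fillna(df.mean())   # mean imputation, column by column
dropped = df.dropna()           # drop any row containing a NaN
interp = df.interpolate()       # linear interpolation between known values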



---

Azure Cloud & Python Questions

11. How do you use Python to interact with Azure Key Vault?



from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

key_vault_url = "https://<your-keyvault-name>.vault.azure.net/"
credential = DefaultAzureCredential()
client = SecretClient(vault_url=key_vault_url, credential=credential)

secret = client.get_secret("my-secret-name")
print(secret.value)

12. How would you securely store and retrieve secrets in an Azure environment?



Use Azure Key Vault for secret management.

Authenticate using Managed Identity or Service Principal with RBAC.

Avoid storing secrets in environment variables or code.


13. How do you use Azure Blob Storage with Python?



from azure.storage.blob import BlobServiceClient

connection_string = "your_connection_string"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
container_client = blob_service_client.get_container_client("my-container")

for blob in container_client.list_blobs():
    print(blob.name)

14. What is the role of azure-identity and azure-keyvault-secrets in authentication?



azure-identity: Provides authentication mechanisms like DefaultAzureCredential, Managed Identity, and Service Principal.

azure-keyvault-secrets: Provides secure access to Azure Key Vault secrets.


15. How do you use Python to query Azure SQL Database efficiently?



import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<your-server>.database.windows.net;"
    "DATABASE=mydb;"
    "UID=myuser;"
    "PWD=mypassword"
)

cursor = conn.cursor()
cursor.execute("SELECT * FROM mytable")
rows = cursor.fetchall()
for row in rows:
    print(row)


---

Splunk & Python

16. How do you query Splunk using Python?



import requests

# Placeholder host and token; port 8089 is Splunk's management port
url = "https://splunk-server:8089/services/search/jobs"
headers = {"Authorization": "Bearer YOUR_SPLUNK_TOKEN"}
data = {
    "search": "search index=main | head 10",
    "output_mode": "json",  # the default response format is XML
}
response = requests.post(url, headers=headers, data=data)
print(response.json())  # contains the search job ID (sid) for polling results

17. What is the difference between using REST API vs. SDK for querying Splunk?



REST API: Gives raw access to Splunk services via HTTP requests.

Splunk SDK (splunk-sdk-python): Provides Python-friendly functions and better integration with applications.
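
A minimal SDK sketch for comparison (connection details are placeholders, and this assumes the splunk-sdk package is installed):

import splunklib.client as client

service = client.connect(
    host="splunk-server",
    port=8089,
    username="admin",
    password="changeme",
)
# A one-shot search blocks until results are ready
rr = service.jobs.oneshot("search index=main | head 10", output_mode="json")
print(rr.read())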


18. How do you authenticate Python scripts with Splunk securely?



Use OAuth tokens instead of storing credentials in code.

Implement environment variables or Azure Key Vault for secret management.

Use role-based access control (RBAC) in Splunk.


19. What are some common use cases for integrating Splunk with Python?



Log analysis: Automate log searching and filtering.

Alerting & monitoring: Trigger alerts based on log patterns.

Security & anomaly detection: Detect security incidents.

Data visualization: Export data to Pandas for analysis.


20. How do you filter and process Splunk logs using Pandas?



import pandas as pd

# Sample Splunk JSON response
logs = [
    {"timestamp": "2025-02-19T10:00:00Z", "status": 200, "message": "OK"},
    {"timestamp": "2025-02-19T10:05:00Z", "status": 500, "message": "Internal Server Error"},
]

df = pd.DataFrame(logs)
df["timestamp"] = pd.to_datetime(df["timestamp"])
errors = df[df["status"] >= 500]
print(errors)


---


