Scenario
A CSV file contains customer records that need to be imported into a database. Some records may already exist in the database and should not be duplicated.
Task
Write a Python script at /home/interview/insert_new_records.py that reads /home/interview/new_customers.csv, checks if each record already exists in the customers table of /home/interview/customers.db (based on the id field), and inserts only the new records.
Step 1: Examine the database and CSV
sqlite3 /home/interview/customers.db "SELECT COUNT(*) FROM customers;"
wc -l /home/interview/new_customers.csv
Shows 200 records in database and 301 lines in CSV (including header).
Step 2: Create the Python script
nano /home/interview/insert_new_records.py
Write a script that checks for existing records and inserts only new ones:
import sqlite3
import pandas as pd
# Connect to database
conn = sqlite3.connect('/home/interview/customers.db')
cursor = conn.cursor()
# Read CSV file
df = pd.read_csv('/home/interview/new_customers.csv')
# Get existing IDs from database
cursor.execute('SELECT id FROM customers')
existing_ids = set(row[0] for row in cursor.fetchall())
# Filter for new records only
new_records = df[~df['id'].isin(existing_ids)]
# Insert new records
for _, row in new_records.iterrows():
cursor.execute('''
INSERT INTO customers (id, first_name, last_name, email, phone, city, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
''', (row['id'], row['first_name'], row['last_name'],
row['email'], row['phone'], row['city'], row['created_at']))
conn.commit()
conn.close()
print(f"Inserted {len(new_records)} new records")
The script fetches existing IDs, filters the CSV to exclude them, and inserts only new records.
Step 3: Run the script
python3 /home/interview/insert_new_records.py
Step 4: Verify the database
sqlite3 /home/interview/customers.db "SELECT COUNT(*) FROM customers;"
Should show 300 records (200 original + 100 new).