Raw data is generated daily. From it, we need to compute the success rate for the current month on a daily basis, and the success rate of each cell at the end of each month, from January through the current month.
The raw data has the following columns:
| Column name | Example value |
|---|---|
| BackupClient | an4lsbk0304.an.ppp.com |
| BackupDriver | NetBackup |
| BackupMaster | an4lsbk0300.an.ppp.com |
| BackupPolicyID | an.72699 |
| BackupPolicyName | ancfsv02a_aggr1_ancfs02n02b_L_an_vlw_atom_001 |
| Cell | an |
| LastFullResult | Failure |
| LastFullStartTime | 4-10-19 |
| PolicyType | NFS |
| Status | Active |
| LastFullExitCode | 96 |
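
Because the example start time 4-10-19 is ambiguous, it helps to parse “LastFullStartTime” explicitly before any month comparison. A minimal sketch, assuming the format is day-month-two-digit-year (so 4-10-19 is 4 October 2019, which fits an October report) and that blank cells mean no full backup has run; `parse_start_times` is a hypothetical helper name:

```python
import pandas as pd

def parse_start_times(raw: pd.Series) -> pd.Series:
    """Convert LastFullStartTime strings such as '4-10-19' to timestamps.

    Assumption: day-month-year with a two-digit year. Empty or
    unparseable values become NaT so they can be treated as "not run".
    """
    return pd.to_datetime(raw, format="%d-%m-%y", errors="coerce")

sample = pd.Series(["4-10-19", "", None, "28-9-19"])
parsed = parse_start_times(sample)
print(parsed)
```

If the export turns out to be month-day-year instead, only the `format` string needs to change.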
For each distinct value of the “Cell” column, I need to compute the following result columns:
1. Oct full ran, failed & not completed yet
2. Oct full not ran
3. Oct full ran successful
4. Grand total
5. Success rate for full ran in Oct
6. Success rate of full backup
7. Percentage of backup coverage
The result columns are defined as follows (October is the current month in this example):
1. Oct full ran, failed & not completed yet: “LastFullStartTime” is non-empty and falls in the current month, “LastFullResult” is “Failure”, and the status code (“LastFullExitCode”) is greater than 1.
2. Oct full not ran: “LastFullStartTime” is empty or holds a date earlier than the current month.
3. Oct full ran successful: “LastFullStartTime” falls in the current month and “LastFullResult” is “Success”.
4. Grand total: the count of BackupPolicyID values for each distinct cell. Ideally this equals the sum of the three columns above (column 1 + column 2 + column 3 = column 4).
5. Success rate for full ran in Oct: column 3 / (column 1 + column 3), as a percentage.
6. Success rate of full backup: column 3 / (column 1 + column 2 + column 3), as a percentage.
7. Percentage of backup coverage: (column 1 + column 3) / column 4, as a percentage.
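The three percentage columns are plain arithmetic on the counts and the grand total. A sketch with made-up counts for one cell (1 failed, 1 not run, 2 successful, grand total 4); the short names `failed`, `not_ran`, `success` and `grand_total` are hypothetical stand-ins for columns 1 to 4, and both success rates use the successful count as the numerator:

```python
import pandas as pd

def add_rates(counts: pd.DataFrame) -> pd.DataFrame:
    """Append the three percentage columns to per-cell counts."""
    out = counts.copy()
    ran = out["failed"] + out["success"]  # full backups that ran this month
    out["success_rate_ran_pct"] = 100 * out["success"] / ran
    out["success_rate_full_pct"] = 100 * out["success"] / (ran + out["not_ran"])
    out["coverage_pct"] = 100 * ran / out["grand_total"]
    return out

# Made-up counts for a single cell
counts = pd.DataFrame(
    {"failed": [1], "not_ran": [1], "success": [2], "grand_total": [4]},
    index=pd.Index(["an"], name="Cell"),
)
rates = add_rates(counts)
print(rates.round(1))
```

A cell with no runs at all this month gets NaN (0/0) in `success_rate_ran_pct`, which you may want to fill or flag before reporting.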
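Code so far:
import pandas as pd

# Load the daily raw export
RD = pd.read_csv("C:/Users/acharbha/Desktop/fullbackup_success/python/raw_Data_success_Rate.csv")

# DataFrame.info() prints its summary itself and returns None,
# so wrapping it in print() would add a stray "None" line
RD.info()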
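One way to build the whole per-cell table is with boolean masks and a single groupby. This is a sketch, not a tested production script, and it assumes: dates are day-month-two-digit-year, the result values are literally `Failure` and `Success`, the status code is `LastFullExitCode`, and the report month is the month containing `today`. The sample rows, the second cell `eu`, the function name `summarize`, and the short output column names are all made up for illustration; with the real file you would call `summarize(RD, pd.Timestamp.today())`.

```python
import pandas as pd

def summarize(df: pd.DataFrame, today: pd.Timestamp) -> pd.DataFrame:
    """Per-cell full-backup summary for the month containing `today`."""
    # Assumed date format: day-month-two-digit-year; blanks become NaT
    start = pd.to_datetime(df["LastFullStartTime"], format="%d-%m-%y", errors="coerce")
    month_start = pd.Timestamp(today.year, today.month, 1)
    in_month = (start.dt.year == today.year) & (start.dt.month == today.month)
    exit_code = pd.to_numeric(df["LastFullExitCode"], errors="coerce")

    # One boolean column per bucket, evaluated row by row
    flags = pd.DataFrame({
        "Cell": df["Cell"],
        "failed": in_month & df["LastFullResult"].eq("Failure") & (exit_code > 1),
        "not_ran": start.isna() | (start < month_start),
        "success": in_month & df["LastFullResult"].eq("Success"),
        "policy": df["BackupPolicyID"],
    })

    # Summing booleans counts the True rows in each cell
    out = flags.groupby("Cell").agg(
        failed=("failed", "sum"),
        not_ran=("not_ran", "sum"),
        success=("success", "sum"),
        grand_total=("policy", "count"),
    )
    ran = out["failed"] + out["success"]
    out["success_rate_ran_pct"] = 100 * out["success"] / ran
    out["success_rate_full_pct"] = 100 * out["success"] / (ran + out["not_ran"])
    out["coverage_pct"] = 100 * ran / out["grand_total"]
    return out

# Made-up sample rows in the same shape as the raw export
sample = pd.DataFrame({
    "Cell": ["an", "an", "an", "an", "eu"],
    "BackupPolicyID": ["an.72699", "an.00001", "an.00002", "an.00003", "eu.00001"],
    "LastFullStartTime": ["4-10-19", "5-10-19", "6-10-19", "30-9-19", "7-10-19"],
    "LastFullResult": ["Failure", "Success", "Success", "Success", "Success"],
    "LastFullExitCode": [96, 0, 0, 0, 0],
})
print(summarize(sample, pd.Timestamp(2019, 10, 31)).round(1))
```

Rows that fall into none of the three buckets (for example a current-month failure with exit code 1) are still counted in `grand_total`, which is why columns 1 to 3 only ideally add up to column 4. For the January-to-current-month history, one option is to keep the last daily export of each month, run `summarize` on it with `today` set to that month's last day, and combine the results with `pd.concat(..., keys=...)`.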