I want to convert a set of YAML files in a folder into an xlsx file. I thought I'd start by converting a single YAML file into an xlsx file. All the YAML files in the folder have the format given below:

info:
    city: Bangalore
    competition: IPL
    dates:
        - 2008-04-18
    gender: male
    match_type: T20
    outcome:
        by:
            runs: 140
        winner: Kolkata Knight Riders
    overs: 20
    player_of_match:
        - BB McCullum
    teams:
        - Royal Challengers Bangalore
        - Kolkata Knight Riders
    toss:
        decision: field
    winner: Royal Challengers Bangalore
    umpires:
        - Asad Rauf
        - RE Koertzen
    venue: M Chinnaswamy Stadium
    innings:
        - 1st innings:
            team: Kolkata Knight Riders
            deliveries:
                - 0.1:
                    batsman: SC Ganguly
                    bowler: P Kumar
                    extras:
                        legbyes: 1
                    non_striker: BB McCullum
                    runs:
                        batsman: 0
                        extras: 1
                    total: 1

The data continues in the same way for each ball of the first innings (0.2, 0.3, 0.4 ... 20.0), and the second innings then follows in the same format.
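To make the ball-by-ball layout described above concrete, here is a minimal sketch of how the parsed innings data might be turned into one flat row per delivery. It assumes the YAML has already been loaded into Python dicts and lists, and that each innings entry has the shape shown in the sample; the function names `flatten_mapping` and `deliveries_to_rows` and the dotted column names are hypothetical choices for illustration, not part of any library.

```python
def flatten_mapping(nested, parent_key="", sep="."):
    """Flatten nested dicts, e.g. {'runs': {'batsman': 0}} -> {'runs.batsman': 0}."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_mapping(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def deliveries_to_rows(innings):
    """Return one flat dict per delivery, tagged with its innings and team."""
    rows = []
    for innings_entry in innings:
        # Each entry is a one-key mapping such as {"1st innings": {...}}.
        for innings_name, details in innings_entry.items():
            for delivery in details.get("deliveries", []):
                # Each delivery is a one-key mapping such as {0.1: {...}}.
                for ball, ball_data in delivery.items():
                    row = {
                        "innings": innings_name,
                        "team": details.get("team"),
                        "ball": ball,
                    }
                    row.update(flatten_mapping(ball_data))
                    rows.append(row)
    return rows


# Hand-written sample mirroring the single delivery shown in the question.
sample_innings = [
    {
        "1st innings": {
            "team": "Kolkata Knight Riders",
            "deliveries": [
                {
                    0.1: {
                        "batsman": "SC Ganguly",
                        "bowler": "P Kumar",
                        "extras": {"legbyes": 1},
                        "non_striker": "BB McCullum",
                        "runs": {"batsman": 0, "extras": 1},
                        "total": 1,
                    }
                }
            ],
        }
    }
]

rows = deliveries_to_rows(sample_innings)
```

Because every row ends up as a flat dict of scalar values, a list like `rows` should be usable with `pd.DataFrame(rows)`; deliveries without an `extras` entry would simply get empty cells in those columns.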

My attempt at converting one of these YAML files into an xlsx file:

import pandas as pd
import yaml as ya
with open(r"location of folder") as f:
    data = ya.load(f, Loader=ya.FullLoader)
df1 = pd.DataFrame(data['info'])  # this line raises the ValueError shown below
df1.to_excel(r"location of folder\output.xlsx")

However, after running the above code, I got the following errors:

File "c:\Users\kosal\hello\prj.py", line 8, in <module>
    df1=pd.DataFrame(data['info'])
  File "C:\Users\kosal\anaconda3\lib\site-packages\pandas\core\frame.py", line 529, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "C:\Users\kosal\anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 287, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "C:\Users\kosal\anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 80, in arrays_to_mgr
    index = extract_index(arrays)
  File "C:\Users\kosal\anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 401, in extract_index
    raise ValueError("arrays must all be same length")

I do understand why this error comes up, but I have no idea how to fix it.
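One possible way around the length mismatch, sketched here under some assumptions: the values under `info` are lists of different lengths (one date, two teams, two umpires) mixed with nested mappings, so they cannot be used directly as equal-length columns. The hypothetical helper `info_to_row` below collapses the whole mapping into a single flat row by joining list items into one string and giving nested keys dotted names. The helper name, the `", "` separator and the dotted column names are illustrative choices, and the sketch assumes the ball-by-ball `innings` data is handled separately.

```python
def info_to_row(info, prefix=""):
    """Collapse a nested 'info' mapping into one flat dict of scalar values."""
    row = {}
    for key, value in info.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested mapping: recurse and prefix the child keys, e.g. 'outcome.by.runs'.
            row.update(info_to_row(value, prefix=f"{column}."))
        elif isinstance(value, list):
            # List of any length: join the items into a single cell value.
            row[column] = ", ".join(str(item) for item in value)
        else:
            row[column] = value
    return row


# Hand-written sample mirroring part of the 'info' block in the question.
sample_info = {
    "city": "Bangalore",
    "dates": ["2008-04-18"],
    "outcome": {"by": {"runs": 140}, "winner": "Kolkata Knight Riders"},
    "teams": ["Royal Challengers Bangalore", "Kolkata Knight Riders"],
    "umpires": ["Asad Rauf", "RE Koertzen"],
}

row = info_to_row(sample_info)
```

A one-row table could then be built with `pd.DataFrame([row])` and written out with `to_excel`, since a list containing a single flat dict gives pandas exactly one value per column. Repeating this for every file in the folder and collecting the rows in one list would give one spreadsheet row per match.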

P.S. I can't find an appropriate tag for this question and hence have used the 'python' tag.

Seshank K